From msuinfo!agate!howland.reston.ans.net!europa.eng.gtefsd.com!usenet Fri Sep 17 20:13:20 1993
Path: msuinfo!agate!howland.reston.ans.net!europa.eng.gtefsd.com!usenet
From: Sig@Seuss.Vantage.GTE.COM
Newsgroups: comp.ai,comp.ai.nat-lang,comp.compression,comp.compression.research,sci.crypt
Subject: American language standardized dictionary for text compression
Date: 17 Sep 1993 13:58:26 GMT
Organization: GTE
Lines: 62
Message-ID: <27cfq2$nhj@europa.eng.gtefsd.com>
NNTP-Posting-Host: seuss.vantage.gte.com
Xref: msuinfo comp.ai:18911 comp.ai.nat-lang:706 comp.compression:8845 comp.compression.research:1087 sci.crypt:19365

As an aid to those involved in natural language parsing, dictionary compression,
or textual encryption, I have been collecting and compiling a lengthy list of
words.  I expect a comprehensive standardized dictionary to result
eventually.  This dictionary should contain most common American words,
abbreviations, hyphenations, and even incorrect spellings.

An anonymous ftp server has been set up on wocket.vantage.gte.com.  It contains
the following files in the pub/standard_dictionary directory:

-r--r--r--  1 ftp      ftp       1269760 Aug 16 08:36 dic-0893.tar
-r--r--r--  1 ftp      ftp        523393 Aug 16 08:43 dic-0893.tar.Z
-r--r--r--  1 ftp      ftp        421239 Aug 16 08:39 dic-0893.zip
-r--r--r--  1 ftp      ftp       3186688 Sep 17 08:26 dic-0993.tar
-r--r--r--  1 ftp      ftp       1503561 Sep 17 09:27 dic-0993.tar.Z
-r--r--r--  1 ftp      ftp          3052 Sep 17 08:26 length02.txt
-r--r--r--  1 ftp      ftp         37805 Sep 17 08:26 length03.txt
-r--r--r--  1 ftp      ftp         99996 Sep 17 08:26 length04.txt
-r--r--r--  1 ftp      ftp        212723 Sep 17 08:26 length05.txt
-r--r--r--  1 ftp      ftp        361496 Sep 17 08:26 length06.txt
-r--r--r--  1 ftp      ftp        456741 Sep 17 08:26 length07.txt
-r--r--r--  1 ftp      ftp        609880 Sep 17 08:26 length08.txt
-r--r--r--  1 ftp      ftp        388586 Sep 17 08:26 length09.txt
-r--r--r--  1 ftp      ftp        305936 Sep 17 08:26 length10.txt
-r--r--r--  1 ftp      ftp        228787 Sep 17 08:26 length11.txt
-r--r--r--  1 ftp      ftp        170744 Sep 17 08:26 length12.txt
-r--r--r--  1 ftp      ftp        108060 Sep 17 08:26 length13.txt
-r--r--r--  1 ftp      ftp         70864 Sep 17 08:26 length14.txt
-r--r--r--  1 ftp      ftp         43384 Sep 17 08:26 length15.txt
-r--r--r--  1 ftp      ftp         26478 Sep 17 08:26 length16.txt
-r--r--r--  1 ftp      ftp         14953 Sep 17 08:26 length17.txt
-r--r--r--  1 ftp      ftp          7980 Sep 17 08:26 length18.txt
-r--r--r--  1 ftp      ftp          5397 Sep 17 08:26 length19.txt
-r--r--r--  1 ftp      ftp          2948 Sep 17 08:26 length20.txt
-r--r--r--  1 ftp      ftp          1978 Sep 17 08:26 length21.txt
-r--r--r--  1 ftp      ftp          1440 Sep 17 08:26 length22.txt
-r--r--r--  1 ftp      ftp           825 Sep 17 08:26 length23.txt
-r--r--r--  1 ftp      ftp           650 Sep 17 08:26 length24.txt
-r--r--r--  1 ftp      ftp           297 Sep 17 08:26 length25.txt
-r--r--r--  1 ftp      ftp           140 Sep 17 08:26 length26.txt
-r--r--r--  1 ftp      ftp           116 Sep 17 08:26 length27.txt
-r--r--r--  1 ftp      ftp            30 Sep 17 08:26 length28.txt
-r--r--r--  1 ftp      ftp             0 Sep 17 08:26 length29.txt
-r--r--r--  1 ftp      ftp             0 Sep 17 08:26 length30.txt
-r--r--r--  1 ftp      ftp             0 Sep 17 08:26 length31.txt
-r--r--r--  1 ftp      ftp            34 Sep 17 08:26 length32.txt
-r--r--r--  1 ftp      ftp         11521 Aug 13 16:35 tarread.com

The most recent compilation, dic-0993.tar, is composed of the 31 text files
listed above and may be restored on an MS-DOS computer using the tarread.com
utility program.
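
For anyone who wants to arrange their own word list the same way, here is a
minimal sketch in Python.  It assumes, going only by the file names above, that
each lengthNN.txt holds the words that are NN characters long, one word per
line.  The function name bucket_by_length is made up for this illustration and
is not part of the distribution.

```python
from collections import defaultdict


def bucket_by_length(words):
    """Group words under lengthNN.txt-style names (assumed naming scheme)."""
    buckets = defaultdict(set)
    for raw in words:
        word = raw.strip()
        if word:  # skip blank lines
            buckets[f"length{len(word):02d}.txt"].add(word)
    # Sort names and words so the result is stable from run to run.
    return {name: sorted(group) for name, group in sorted(buckets.items())}


if __name__ == "__main__":
    sample = ["at", "cat", "dog", "it", "zebra", "cat"]
    for name, group in bucket_by_length(sample).items():
        print(name, group)
```

Duplicates are dropped because each bucket is a set.  Writing each bucket out
to a file of the same name is left to the reader.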

Any words for inclusion in future dictionaries should be submitted to my E-Mail
address directly or placed in the /pub/incoming directory.  Please compare your
dictionaries with standard Unix 'words' and submit only the differences.  Many
thanks to those who have submitted the 200,000 words during the last month.
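
One way to find the differences is sketched below in Python.  It assumes both
lists hold one word per line and compares them exactly, so "Apple" and "apple"
count as different words.  The function name new_words is hypothetical, and the
location of the Unix 'words' file varies from system to system (often
/usr/dict/words or /usr/share/dict/words).

```python
def new_words(candidate_lines, reference_lines):
    """Return the sorted words found in the candidates but not the reference."""
    reference = {line.strip() for line in reference_lines}
    candidates = {line.strip() for line in candidate_lines}
    candidates.discard("")  # ignore blank lines
    return sorted(candidates - reference)


if __name__ == "__main__":
    # Small in-memory lists stand in for the two files here.  With real
    # files, pass the open file objects instead, since they yield lines.
    mine = ["apple\n", "wocket\n", "\n", "zebra\n"]
    unix_words = ["apple\n", "zebra\n"]
    for word in new_words(mine, unix_words):
        print(word)
```

Only the words printed by such a comparison would need to be submitted.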

Take care.

         - Sig

Sigurd P. Crossland
Advanced Technology Lab                   Telephone: (703) 818-8504
GTE                                       Facsimile: (703) 802-3110
15000 Conference Center Drive             Internet: sig@seuss.vantage.gte.com
Chantilly, VA   22021                     Home: (703) 818-8942

