From ptownson@massis.lcs.mit.edu Wed Mar 19 09:10:57 1997 Return-Path: Received: by massis.lcs.mit.edu (8.7.4/NSCS-1.0S) id JAA04371; Wed, 19 Mar 1997 09:10:57 -0500 (EST) Date: Wed, 19 Mar 1997 09:10:57 -0500 (EST) From: ptownson@massis.lcs.mit.edu (TELECOM Digest Editor) Message-Id: <199703191410.JAA04371@massis.lcs.mit.edu> To: ptownson Subject: Word/Number Program Here is an interesting file which came my way that I thought the rest of you would enjoy as well. Play with it awhile and see what interesting combinations you can create. PAT Subject: Number/Word Program Date: Tue, 18 Mar 97 12:52:09 PDT From: rweller@h-e.com Organization: Hammett & Edison, Inc. Pat, I am re-sending separately the number/word program in binhex form. In case you have problems with it, I am sending the source code in two parts as plain text. The first part appears below. The second part is appended to it in this version going out to the Digest. Regards, Bob >>>>>>>>> /*------------------------------------------------------------------- number2word.c Program to find word equivalents to telephone numbers, or convert letter strings to numbers. In number-to-word mode, a search is made for valid words based on a list read from a dictionary file. Optionally, the word search may be turned off to print out all possible letter iterations. Command-line options: -r Reverse mode, convert letter strings to numbers (numbers to words is the default mode). -a In number to word mode, suppress the dictionary check for valid words and print out all possible letter combinations. -d file Use the named file as the word dictionary, it must contain a list of words one per line in alphabetical order. If not specified, "words" in the current directory is the default. -s Rudimentary pluralization of words, if the last letter of a word being checked is 's' and there is no match for the whole word, drop the 's' and try again. Normally the word must match a dictionary word exactly. -q Include 'q' and 'z' in the conversions, with 'q' on the 7 key and 'z' on the 9. Normally these letters are not used when converting numbers to words, and are illegal when converting strings to numbers. -0 0-o and 1-i mode. With this option, 'o' is removed from its normal position on the 6 and made equivalent to 0, similarily 'i' is moved from the 4 to the 1. Without this, 0 and 1 are illegal when converting numbers to words, and are not used when converting strings to numbers. Following any options on the command line are numbers or strings to be converted. Number arguments may contain letters in number to word mode, and strings may contain numbers in reverse mode. In the first case, any letters will first be converted back to numbers, so all iterations of other letters on the same number key will be checked. In the second case, the numbers are simply passed through unchanged. In either mode, the arguments may contain separator characters, anything that is not [0-9A-Za-z] is a separator. Separators are passed through unchanged, but in either mode a separator cannot be the first or last character of an argument, and no two separators may be sequential within the argument. As indicated above, arguments for number to word conversion must not contain digits 0 or 1 unless the option "-0" appears, and arguments for reverse conversion must not contain 'q' or 'z' unless "-q" appears. In number to word mode, the arguments must not contain more than a maximum number of digits, not counting separators. The limit is defined below as MAXNUM. The number of letter iterations to check will increase geometrically with the number of digits, so setting MAXNUM very large might make this take a very long time. 10 is the recommended value. In number to word mode, separators are viewed as breaking the number into sub-strings. During the word search, two approaches are used. First a search is made for a word match to the entire number, ignoring separators, then a search is made for words that match each sub-string. For example, "223-6636" will yield both "abdomen" and "bad-omen". "Release notes": The word search is only as good as the dictionary file, a very basic one with words up to 10 characters ought to arrive along with this source file, but it can undoubtedly be improved. This is left as an exercise for the interested student. This was written for a Unix environment, and tested under a couple of flavors (ULTRIX 3.1, Linux 2.0.28), but that's as far as it goes. I did this as a favor for a friend and not primarily for distribution, and I don't have the time to deal with support for other platforms. You're welcome to do whatever you want with it to make it work on your favorite OS/machine, but frankly I don't want to hear about it, hence the lack of anything here like my name/e-mail address. Good luck! */ /*-------------------------------------------------------------------- Header files. */ #include #include #include /*-------------------------------------------------------------------- Constants and parameters. */ #define MAXNUM 10 /* Max number of digits in a number for word conversion. */ #define DICTNAME "words" /* Default dictionary file name. */ #define WORDALLOC 2500 /* Allocation increment for word storage. */ /*-------------------------------------------------------------------- Function prototypes. */ void loadwords(FILE *, int *); void tonumbers(char *); void toletters(char *, int); void findwords(char *); int checkword(char *, int); /*------------------------------------------------------------------- Globals. */ struct { /* Dictionary word lists, one for each word length. */ int nword[26]; /* Number of words for each first letter. */ char *index[26]; /* Pointers to first word for each letter. */ char *block; /* Word storage. */ } Words[MAXNUM]; int Mode = 1; /* True for numbers to words, false for letters to numbers. */ int Nosearch = 0; /* True to suppress word search and show all iterations. */ int Plurals = 0; /* True to allow 's' at the end of any matching word. */ int Qmode = 0; /* True to include 'q'<->7 and 'z'<->9. */ int Zeromode = 0; /* True to make 'o'<->0 and 'i'<->1. */ int Nsub; /* Substring info - number, length, and start of all */ int Slen[MAXNUM]; /* substrings in a number being converted to words, */ char *Subs[MAXNUM]; /* global to avoid duplication in recursion. */ /*-------------------------------------------------------------------- Main routine. */ main( int argc, char **argv) { FILE *dict; int n, i, s, iopt, istart, subl, lenflag[MAXNUM]; char *p, *dname, str[(2 * MAXNUM) + 1]; /* Set the default dictionary file name. */ dname = DICTNAME; /* Begin parsing command line, look for options. This supports "-a -b -c" or "-abc" forms, or any mixture. */ iopt = 1; while (iopt < argc) { p = argv[iopt]; if (*p != '-') { break; } for (p++; *p; p++) { switch (*p) { case 'r': /* Switch to letter to number mode. */ Mode = 0; break; case 'a': /* Turn off word-search mode. */ Nosearch = 1; break; case 'd': /* Next arg has name for dictionary file. */ if (++iopt >= argc) { break; } dname = argv[iopt]; break; case 's': /* Set flag to check for pluralized words. */ Plurals = 1; break; case 'q': /* Include 'q' and 'z'. */ Qmode = 1; break; case '0': /* Make 0 be 'o' and 1 be 'i'. */ Zeromode = 1; break; default: /* What? */ iopt = argc; break; } if (iopt >= argc) { break; } } iopt++; } /* Done with options, there must be some more arguments to process. Also iopt >= argc if an error occurred above. */ if (iopt >= argc) { fprintf(stderr, "usage: %s [-radsq0] arg ...\n", argv[0]); fputs(" -r convert strings to numbers\n", stderr); fputs(" -a print all letter iterations\n", stderr); fputs(" -d use next arg as name of dictionary file\n", stderr); fputs(" -s append 's' to every word in dictionary\n", stderr); fputs(" -q include 'q' <-> 7 and 'z' <-> 9\n", stderr); fputs(" -0 make 'o' <-> 0 and 'i' <-> 1\n", stderr); exit(1); } /* Scan the arguments and check for errors. In number to word mode, check the number of digits against the max, check for 0 and 1 unless Zeromode is true, also track the possible sub-string lengths so that words of all required lengths can be loaded into the dictionary later. In letter to number mode, check for 'q' and 'z' unless Qmode is true. In either mode, check for a separator as the first/last character or two separators in a row. Also, all the arguments get converted to lower-case in place. */ istart = iopt; for (n = 0; n < MAXNUM; n++) { lenflag[n] = 0; } for (; iopt < argc; iopt++) { n = 0; subl = 0; s = -1; for (p = argv[iopt]; *p; p++) { *p = tolower(*p); if (isalnum(*p)) { s = 0; if (Mode) { if (!Zeromode) { if ((*p == '0') || (*p == '1')) { fprintf(stderr, "%s: in '%s': '0' or '1' not allowed\n", argv[0], argv[iopt]); exit(1); } } n++; subl++; } else { if (!Qmode) { if ((*p == 'q') || (*p == 'z')) { fprintf(stderr, "%s: in '%s': 'q' or 'z' not allowed\n", argv[0], argv[iopt]); exit(1); } } } } else { if (s) { if (s < 0) { fprintf(stderr, "%s: in '%s': illegal leading separator\n", argv[0], argv[iopt]); } else { fprintf(stderr, "%s: in '%s': illegal repeated separator\n", argv[0], argv[iopt]); } exit(1); } s = 1; if (Mode) { lenflag[subl - 1] = 1; if (Plurals && (subl > 1)) { lenflag[subl - 2] = 1; } subl = 0; } } } if (s) { fprintf(stderr, "%s: in '%s': illegal trailing separator\n", argv[0], argv[iopt]); exit(1); } if (Mode) { if ((n < 1) || (n > MAXNUM)) { fprintf(stderr, "%s: in '%s': illegal digit count\n", argv[0], argv[iopt]); exit(1); } lenflag[subl - 1] = 1; if (Plurals && (subl > 1)) { lenflag[subl - 2] = 1; } lenflag[n - 1] = 1; if (Plurals && (n > 1)) { lenflag[n - 2] = 1; } } } /* Load the dictionary file, if needed. Only words of lengths equal to possible word lengths in the actual arguments are loaded. */ if (Mode && !Nosearch) { if ((dict = fopen(dname, "r")) == NULL) { fprintf(stderr, "%s: unable to open dictionary file '%s'\n", argv[0], dname); exit(1); } loadwords(dict, lenflag); fclose(dict); } /* Main loop over arguments. First make a copy of the argument. In number to word mode, run the arg through tonumbers() in case it contains any letters, then print it out. Then build the substring locations in globals so this doesn't need to be done repeatedly on every letter iteration (the separators are not going to move around or change). Finally call toletters(), which recursively handles number to letter conversions and calls the routine findwords() at the bottom to search for word matches. In letter to number mode, simply echo the argument, call tonumbers(), and print the result. */ for (iopt = istart; iopt < argc; iopt++) { strcpy(str, argv[iopt]); if (Mode) { tonumbers(str); putchar('\n'); puts(str); Nsub = 0; Slen[0] = 0; Subs[0] = str; for (p = str; *p; p++) { if (isdigit(*p)) { Slen[Nsub]++; } else { Slen[++Nsub] = 0; Subs[Nsub] = p + 1; } } Nsub++; toletters(str, 0); } else { putchar('\n'); puts(str); tonumbers(str); puts(str); } } putchar('\n'); exit(0); } /*--------------------------------------------------------------------- Routine reads in the word dictionary. */ void loadwords( FILE *in, /* Input stream. */ int *lenflg) /* Flags indicate which word lengths to load. */ { int i, n, ilet[MAXNUM], count[MAXNUM], max[MAXNUM]; char *p, word[50], letter[MAXNUM], *nextword[MAXNUM]; /* Initialize, zero out all the word lists. */ for (n = 0; n < MAXNUM; n++) { letter[n] = 'a' - (char)1; ilet[n] = -1; count[n] = 0; max[n] = 0; Words[n].block = NULL; for (i = 0; i < 26; i++) { Words[n].nword[i] = 0; Words[n].index[i] = NULL; } } /* Read and store words. A word with any non-letter in it is ignored, as are words more than MAXNUM letters long. All words are converted to lower- case also. This assumes the input list is already in alphabetical order. The words are stored without null termination, since the length of all the words in one structure is the same. Allocation is in chunks of WORDALLOC words for all lengths. */ while (fgets(word, 50, in)) { n = 0; for (p = word; *p; p++) { if (*p == '\n') { *p = '\0'; break; } if (!isalpha(*p)) { n = 0; break; } *p = tolower(*p); n++; } if ((n < 1) || (n > MAXNUM)) { continue; } if (!lenflg[--n]) { continue; } if (max[n] == 0) { Words[n].block = (char *)malloc(WORDALLOC * (n + 1)); nextword[n] = Words[n].block; max[n] = WORDALLOC; } else { if (count[n] == max[n]) { Words[n].block = (char *)realloc(Words[n].block, (max[n] + WORDALLOC) * (n + 1)); nextword[n] = Words[n].block + (max[n] * (n + 1)); max[n] += WORDALLOC; } } if (*word != letter[n]) { if (*word < letter[n]) { fputs("**dictionary file is not sorted, exiting\n", stderr); exit(1); } ilet[n] += (int)(*word - letter[n]); letter[n] = *word; } strncpy(nextword[n], word, (n + 1)); Words[n].nword[ilet[n]]++; count[n]++; nextword[n] += n + 1; } /* Compute all the word index pointers in the structures, these couldn't be done on the fly because they all break if realloc() moves the block. */ for (n = 0; n < MAXNUM; n++) { if (count[n] == 0) { continue; } p = Words[n].block; for (i = 0; i < 26; i++) { if (Words[n].nword[i] == 0) { continue; } Words[n].index[i] = p; p += Words[n].nword[i] * (n + 1); } } } /*------------------------------------------------------------------------ Routine converts letters to numbers in string. */ void tonumbers( char *str) /* String to convert. */ { char *p; /* Do it. */ for (p = str; *p; p++) { switch (*p) { case 'a': case 'b': case 'c': *p = '2'; break; case 'd': case 'e': case 'f': *p = '3'; break; case 'g': case 'h': *p = '4'; break; case 'i': if (Zeromode) { *p = '1'; } else { *p = '4'; } break; case 'j': case 'k': case 'l': *p = '5'; break; case 'm': case 'n': *p = '6'; break; case 'o': if (Zeromode) { *p = '0'; } else { *p = '6'; } break; case 'p': case 'q': case 'r': case 's': *p = '7'; break; case 't': case 'u': case 'v': *p = '8'; break; case 'w': case 'x': case 'y': case 'z': *p = '9'; break; } } } /*--------------------------------------------------------------------- Routine to convert numbers to letters. This is recursive, the main call is for the first digit, each recursive call for a subsequent digit. Digits are replaced by letters in place in the passed string. At the deepest level, findwords() is called to check the converted string for words. Each digit is restored for the next recursive pass. */ void toletters( char *str, /* String being converted. */ int p) /* Digit to work on. */ { char *l, savenum; int q; /* Skip past a separator, earlier checks make it safe to assume there are never two separators in a row nor a separator at the very end. */ if (!isalnum(str[p])) { p++; } /* Set possible letter string for the digit. If '0' or '1' appear it is save to assume Zeromode is true due to earlier checks. */ switch (str[p]) { case '0': l = "o"; break; case '1': l = "i"; break; case '2': l = "abc"; break; case '3': l = "def"; break; case '4': if (Zeromode) { l = "gh"; } else { l = "ghi"; } break; case '5': l = "jkl"; break; case '6': if (Zeromode) { l = "mn"; } else { l = "mno"; } break; case '7': if (Qmode) { l = "pqrs"; } else { l = "prs"; } break; case '8': l = "tuv"; break; case '9': if (Qmode) { l = "wxyz"; } else { l = "wxy"; } break; } /* Loop over letters, recurse. If at the bottom, if Nosearch is true just print out the string, otherwise call findwords(), it will print out the string if a match is found. */ q = p + 1; savenum = str[p]; while (*l) { str[p] = *l; if (str[q]) { toletters(str, q); } else { if (Nosearch) { puts(str); } else { findwords(str); } } l++; } str[p] = savenum; } /*---------------------------------------------------------------------- Routine to take a number converted to a letter string and search it for words that match the dictionary list. */ void findwords( char *str) /* Converted string to search for words. */ { int i, wlen; char *p, whole[MAXNUM + 1]; /* Make a copy of the string with separators stripped and check for a whole string word match. Skip this if there is only one substring, in that case the substring is the whole string and this is redundant. */ if (Nsub > 1) { wlen = 0; for (i = 0; i < Nsub; i++) { strncpy((whole + wlen), Subs[i], Slen[i]); wlen += Slen[i]; } whole[wlen] = '\0'; if (checkword(whole, wlen)) { puts(whole); } } /* Check for substring matches, all must match for success. */ for (i = 0; i < Nsub; i++) { if (!checkword(Subs[i], Slen[i])) { return; } } puts(str); } /*---------------------------------------------------------------------- Routine to compare a string of known length to dictionary, return true or false for match. */ int checkword( char *w, /* Word string, is *not* null terminated. */ int n) /* Length. */ { int i, j, l; char *c; /* Do it. */ j = (int)(*w - 'a'); if ((l = Words[n - 1].nword[j]) > 0) { for (i = 0, c = Words[n - 1].index[j]; i < l; i++, c += n) { if (strncmp(w, c, n) == 0) { return(1); } } } /* No match, if Plurals flag is true and the last letter is an 's', check again for words one letter shorter. */ if (Plurals && (n > 1) && (w[n - 1] == 's')) { n--; if ((l = Words[n - 1].nword[j]) > 0) { for (i = 0, c = Words[n - 1].index[j]; i < l; i++, c += n) { if (strncmp(w, c, n) == 0) { return(1); } } } } return(0); } --------- cut here -----------