Anagram Solver

Calvin wants to solve anagrams. There are lots of great programs out there, but not much source. “Gtoal.com”:http://www.gtoal.com/wordgames/anagrams.html has a good list of source code that is out there, but its hard to find a simple program.

“Gtanag.mai”:http://www.gtoal.com/wordgames/anagrams/gtanag.mai is perhaps the simplest program that is a C program. I’m sure you could use it in any language. What it does is pretty brute force. It takes any dictionary like say a Google search for “unix dictionary list”:google which gets you to “words.txt”:http://linkage.rockefeller.edu/chaynes/words.txt that has a 25,000 word dictionary or the Unix V7 “/usr/dict/words”:http://unix-tree.huihoo.org/V7/usr/dict/words.html and finds all anagrams by first making each entry canonical that is, it takes all the letters and alpha sorts them, so “rich” becomes “chir” and “calvin” becomes “acilnv” and then it can compare them by sorting all the words. If “acilnv” matches, then you have all the words that are anagrams. That is, use the same letters. Quite clever really.

Then there is one that is quite elegant using primes. Basically instead of the canonical form being the sorted order, it assigns a prime number to each letter. Then when you multiply, that number will be unique since it is a product of primes, if the numbers are the same then the word has to be the same. Wow, people like “falbdablet”:http://www.gtoal.com/wordgames/anagrams/flabdablet/anagram.c are sure smart!

So the usage if you like Unix (or Mac OS X which is the same is):

bq. canonize < dictionary | sort | gather The first command, takes an entire text file and turns into the canonical form that looks like "rich=chir" and then sort sorts the canonical alphabetically. Then gather just finds all the identical words. You could also use fgrep to find for instance all the canonical matches. We worked on a new program "search.c" that opens a dictionary that is canonical and sorted and looks for all anagrams. The Unix V7 English list is 24,000 words, but to find some really great word lists, "net-comber.com":http://www.net-comber.com/wordurls.html has a good word list. The main problem is that we really just want an uncluttered list. So "outpost9.com":http://www.outpost9.com/files/WordLists.html has a dictionary and also things that include proper names and abbreviations. The "crosswordman.com":http://www.crosswordman.com/wordlist.html is what is allowed by Scrabble or simple "zip":http://personal.riverusers.com/~thegrendel/ format. The only small problem is that this is a DOS file format, so you need a small tool like "tofrodos":http://www.thefreecountry.com/tofrodos/ to get rid of the CRLF that is in DOS files vs. just the CR that is Unix files.

2 Replies to “Anagram Solver”

Comments are closed.