Calvin wants to solve anagrams. There are lots of great programs out there, but not much source code. Gtoal.com has a good list of the source code that is available, but it's hard to find a simple program.
Gtanag.mai is perhaps the simplest of them. It is a C program, but I’m sure you could use the same approach in any language. What it does is pretty brute force. It takes any dictionary, say the words.txt with a 25,000-word dictionary that a Google search for "unix dictionary list" gets you to, or the Unix V7 /usr/dict/words, and finds all the anagrams. It first makes each entry canonical, that is, it takes all the letters and sorts them alphabetically, so “rich” becomes “chir” and “calvin” becomes “acilnv”. Then it sorts all the words by canonical form so that matching entries end up next to each other. Every word whose canonical form is “acilnv” is an anagram of “calvin”, that is, it uses the same letters. Quite clever really.
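To make the canonical form concrete, here is a minimal sketch in C. It is not the code from Gtanag.mai. The names canonical_form, is_anagram and MAX_WORD are made up for illustration, and lowercasing the letters first is my own assumption.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define MAX_WORD 64 /* assumed upper bound on word length, including the NUL */

/* qsort comparator: orders two characters by byte value. */
int compare_chars(const void *a, const void *b)
{
    unsigned char ca = *(const unsigned char *)a;
    unsigned char cb = *(const unsigned char *)b;
    return (ca > cb) - (ca < cb);
}

/* Hypothetical helper: writes the lowercased, letter-sorted form of word
 * into out. out must have room for strlen(word) + 1 bytes. */
void canonical_form(const char *word, char *out)
{
    size_t len, i;

    assert(word != NULL && out != NULL);
    len = strlen(word);
    for (i = 0; i < len; i++)
        out[i] = (char)tolower((unsigned char)word[i]);
    out[len] = '\0';
    qsort(out, len, sizeof out[0], compare_chars);
}

/* Returns 1 when the two words have the same canonical form, else 0.
 * Words that do not fit in MAX_WORD bytes are reported as non-matching. */
int is_anagram(const char *a, const char *b)
{
    char ca[MAX_WORD], cb[MAX_WORD];

    if (strlen(a) >= MAX_WORD || strlen(b) >= MAX_WORD)
        return 0;
    canonical_form(a, ca);
    canonical_form(b, cb);
    return strcmp(ca, cb) == 0;
}
```

Calling canonical_form("rich", buf) leaves "chir" in buf, and is_anagram("listen", "silent") returns 1.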
Then there is one that is quite elegant and uses primes. Instead of the canonical form being the sorted letters, it assigns a prime number to each letter and multiplies together the primes for all the letters in a word. Since every number has only one factorization into primes, two words get the same product only if they use exactly the same letters, which means they have to be anagrams. Wow, people like falbdablet are sure smart!
So the usage, if you like Unix (or Mac OS X, which is the same), is:
canonize < dictionary | sort | gather
The first command takes an entire text file and turns each word into the canonical form, which looks like “rich=chir”, and then sort orders the lines alphabetically by canonical form. Then gather just finds all the words whose canonical forms are identical. You could also use fgrep to find, for instance, all the matches for one canonical form.
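The three steps can also be sketched in one C function that works on a list of words in memory instead of a pipe. This is only an illustration under my own assumptions: gather_anagrams, struct entry and MAX_WORD are made-up names, and the output format (canonical form, a colon, then the words) is not the format the real gather prints.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_WORD 64 /* assumed upper bound on word length, including the NUL */

struct entry {
    char key[MAX_WORD]; /* canonical form: the word's letters, sorted */
    const char *word;   /* the original word */
};

int cmp_char(const void *a, const void *b)
{
    unsigned char ca = *(const unsigned char *)a;
    unsigned char cb = *(const unsigned char *)b;
    return (ca > cb) - (ca < cb);
}

/* Orders entries by canonical form, then by word so output is repeatable. */
int cmp_entry(const void *a, const void *b)
{
    const struct entry *ea = a, *eb = b;
    int r = strcmp(ea->key, eb->key);
    return r != 0 ? r : strcmp(ea->word, eb->word);
}

/* Prints one line per set of two or more anagrams and returns the number
 * of such sets, or -1 if memory runs out. Overlong words are skipped. */
int gather_anagrams(const char **words, size_t count, FILE *out)
{
    struct entry *entries;
    size_t i, n = 0, start;
    int groups = 0;

    assert(words != NULL && out != NULL);
    if (count == 0)
        return 0;
    entries = malloc(count * sizeof *entries);
    if (entries == NULL)
        return -1;

    for (i = 0; i < count; i++) {           /* the "canonize" step */
        size_t len = strlen(words[i]);
        if (len >= MAX_WORD)
            continue;
        memcpy(entries[n].key, words[i], len + 1);
        qsort(entries[n].key, len, 1, cmp_char);
        entries[n].word = words[i];
        n++;
    }
    qsort(entries, n, sizeof *entries, cmp_entry);  /* the "sort" step */

    for (start = 0; start < n; ) {          /* the "gather" step */
        size_t end = start + 1;
        while (end < n && strcmp(entries[end].key, entries[start].key) == 0)
            end++;
        if (end - start > 1) {
            fprintf(out, "%s:", entries[start].key);
            for (i = start; i < end; i++)
                fprintf(out, " %s", entries[i].word);
            fputc('\n', out);
            groups++;
        }
        start = end;
    }
    free(entries);
    return groups;
}
```

Given the six words rich, listen, chir, silent, calvin and enlist, it finds two sets of anagrams.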
We worked on a new program, “search.c”, that opens a dictionary that is already canonical and sorted and looks for all the anagrams. The Unix V7 English list is 24,000 words, but if you want some really great word lists, net-comber.com has a good one. The main problem is that we really just want an uncluttered list. Outpost9.com has a dictionary and also lists that include proper names and abbreviations. Crosswordman.com has the list of what is allowed by Scrabble, in a simple zip format.
The only small problem is that this is a DOS file format, so you need a small tool like tofrodos to get rid of the CRLF line endings that DOS files use, versus just the LF that Unix files use.
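If you are curious what that conversion amounts to, here is a small sketch in C that does it to a string in memory. This is not how tofrodos itself is written. The function name strip_crlf is made up, and leaving a CR alone when it is not followed by an LF is my own choice.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: removes every '\r' that is immediately followed by
 * '\n', editing text in place. Returns the new length of the string. */
size_t strip_crlf(char *text)
{
    size_t read_pos = 0, write_pos = 0;

    assert(text != NULL);
    while (text[read_pos] != '\0') {
        if (text[read_pos] == '\r' && text[read_pos + 1] == '\n') {
            read_pos++;     /* skip the CR; the LF is copied next time round */
            continue;
        }
        text[write_pos++] = text[read_pos++];
    }
    text[write_pos] = '\0';
    return write_pos;
}
```

So the DOS text "rich\r\ncalvin\r\n" becomes the Unix text "rich\ncalvin\n".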
2 responses so far ↓
1 Ray // Apr 10, 2008 at 16:35
Maybe try something from http://www.phrogram.com. It could be just perfect for your needs.
2 Rich // Apr 17, 2008 at 21:34
http://phrogram.com is great. Calvin also loves http://alice.org. Amazingly, he had no problem writing generic C. It's like a 30-year round trip. 30 years ago I learned C myself and now he is learning it. Kinda cool!