Organizing over 5000 four-letter words.

Discussion in 'Mac Programming' started by helpmeman, Mar 5, 2011.

  1. helpmeman, Mar 5, 2011
    Last edited: Mar 6, 2011

    helpmeman macrumors newbie

    Joined:
    Feb 20, 2011
    #1
    I am trying to figure out a way to organize a list of over 5000 words in the following manner (Note: AppleScript is the recommended method, however any other method is acceptable):

    - They must be organized such that each word is different from the word before it by exactly one letter. For example, they can go from RATE to RAKE but not from RATE to BONE.

    - Words may not be used more than once.

    - Each change must be in a different position from the three preceding changes except when a word cycle has been completed. For example, if I start with RAKE, it can then go to RAtE, RAts, bAts, bots (lower case letters indicate a letter in that position has already been changed). Notice that BOTS, is completely different from RAKE in that it contains none of the same letters. After a word cycle has been complete, you may change the a letter in the same position as the last one (BOTS can become BITS).

    - The first word must be chosen/entered by the user.

    Attached is the list of words to be organized.
     

    Attached Files:

  2. SidBala macrumors 6502a

    Joined:
    Jun 27, 2010
    #2
    The problem could be modeled as a hamiltonian path problem.

    1) You construct a graph where each node is linked to another node with only one letter changed.

    2) Then find a list of hamiltonian paths. - There are NP complete algorithms that do this.

    3) Now iterate over each of these lists looking for cases where the letters change in the same position. If there is such a case, mark that path as bad and move to the next one.
    Alternatively, you could remove that word from the path and just go to the next word - since you said that you can have words left out.

    Now you should have an ordered list with the constraints you specified.

    The hard part may be getting the list of hamiltonian paths. There exist NP complete algorithms, but I have no idea how long they actually take. They could take a few seconds or literally ages. Maybe a computer scientist has an idea.

    Sounds like an interesting problem. I might actually give this a try just for my own interest. Upload the text file if you can.
     
  3. SidBala macrumors 6502a

    Joined:
    Jun 27, 2010
    #3
    This is part of a game just so you "canz win money" ??

    Man I thought it was some sort of real logic problem. Now I feel like my post was a waste.

    :(
     
  4. helpmeman thread starter macrumors newbie

    Joined:
    Feb 20, 2011
    #4
    Done.
     
  5. ulbador macrumors 68000

    ulbador

    Joined:
    Feb 11, 2010
    #5
    Ha, I found this to be an interesting problem and downloaded the word file as well.

    I have no idea what the source of this data is, but I laughed when I noticed that the last words in the file (obviously added after the fact, probably by the OP), are all "four letter" words describing various body parts and functions.

    Here it is again as the OP seems to like to vandalize his/her own posts.
     

    Attached Files:

  6. balamw Moderator

    balamw

    Staff Member

    Joined:
    Aug 16, 2005
    Location:
    New England
    #6
    Sounds a bit like Scrabble Slam! quite a fun card game. (Except for the requirements on different letters and no repeats).

    Won't you need to repeat words in your ordered list in separate chains? Or are you just looking for the longest chain. e.g. RATE -> RAKE or RATE -> RALE or RATE -> FATE or ...

    The file also seems to have been a bit mangled, there are some combined 9 letter words where a comma was replaced with a letter, some 5 letter words and some missing spaces around delimiters.

    B
     
  7. helpmeman, Mar 6, 2011
    Last edited: Mar 6, 2011

    helpmeman thread starter macrumors newbie

    Joined:
    Feb 20, 2011
    #7
    I am looking for the longest possible chain and sorry, I will take a look and fix any typos. EDIT: I believe, I've fixed all of the errors, thanks for bringing that to my attention and let me know if you notice anymore.
     
  8. MasConejos, Mar 6, 2011
    Last edited: Mar 6, 2011

    MasConejos macrumors regular

    MasConejos

    Joined:
    Jun 25, 2007
    Location:
    Houston, TX
    #8
    Question: the way you have this phrased, is it intentional that the changing letter index will repeat? I.e for "RAKE RATE FATE FATS", the changing letter index is 3,1,4,2. If the previous three letter indexes are not to be reused, that means that the next word in the sequence must have the 3rd letter change (e.g. FATS->FANS), which means that the next letter index must be 1, leading to a final pattern of 3142-3142-3142-etc.

    Is this the way you intend, or does the pattern reset after 4 character changes (e.g. 3142-2134-4321)?

    I've currently written this up as an unoptimized (quick and dirty) depth first search in ruby. I am using the repeating pattern as above (WXYZ-WXYZ-WXYZ-etc) and I only use words that fit the pattern (ignore possible solutions that don't match the pattern).

    My best sequence is 2305 words long (so far, still running).

    If you can clarify the rules for word selection, that would be helpful.

    Edit (beware, random comp sci ramblings herein):
    By preferring a pattern-matching solution, rather than a random or pattern-matching-with-exceptions solution, you've essentially increased the difficulty of finding a solution by turning the solution graph into a weighted solution graph as, for two solutions that use all the words, the solution that matches the pattern is preferential to the solution that uses all the words but doesn't always match the pattern (see traveling salesman problem, which is exactly what this has become). Basically you need to do an exhaustive search to find the best solution, which cannot be done in polynomial time; it requires exponential time: basically 5000^5000 solutions need to be computed and compared. If all solutions have equal weight (use all the words, don't care about pattern matching), then something like the depth first search I am using can find a solution (possibly one of many, if any) in less time than an exhaustive search. If there is no solution then an exhaustive search is required. You may want to rethink you're requirements if possible
     
  9. helpmeman thread starter macrumors newbie

    Joined:
    Feb 20, 2011
    #9
    The pattern can reset after 4 character changes as you mentioned. My apologies for the ambiguity.

    Also, if this pattern requirement can for some reason, not be met (due to the single use only rule), it may be ignored but should be followed whenever possible.
     
  10. MasConejos macrumors regular

    MasConejos

    Joined:
    Jun 25, 2007
    Location:
    Houston, TX
    #10
    Solution Posted

    I have posted my solution at http://www.roguepenguin.net/four_letter_words/

    The script is word_chain.rb, you are welcome to modify it for your own needs. I have added some comments, and there is some error checking, but not a whole lot.

    I do not know if it finds the best solution as it self terminates after it stops making progress for a while. I think it does find a "good enough" solution, though.

    In pure matching mode it finds a chain of 2282 words (see pattern_results.txt).
    In best matching mode it finds a chain of 4749 words (see best_results.txt).

    You can look at the "wildchars" section of each result file to see a listing of which character indexes have changed (i.e. 3,1,4,2,3,1,4,2, etc)to get a feel for how closely the result matches the letter pattern you defined.

    The only way to find the absolute best solution is to do an exhaustive search, which this can do if you set the cutoff really high in the script, and you have a LOT of time (best_result.txt took 13 minutes on my imac). The search time could be improved if you made the search multithreaded.

    If you have any question, let me know.
     
  11. helpmeman, Mar 6, 2011
    Last edited: Mar 7, 2011

    helpmeman thread starter macrumors newbie

    Joined:
    Feb 20, 2011
    #11
    THANK YOU. That is awesome:). Is there anyway you can make it so that it asks you for the first word? Also, I was wrong... I need it to not ignore the third requirement even if it cannot be met... and your code will only allow me to use a pattern or no pattern.

    I want all four letter changes to add up to 6 and then restart (i.e. 3+1+0+2, 2+1+3+0. 2+0+1+3, 0+3+1+2, 3+2+0+1 etc.)
     
  12. Mac_Max macrumors 6502

    Joined:
    Mar 8, 2004
    #12
    Here's another submission. This is a very quick and dirty terminal program I wrote in C++. I could possibly add a GUI or directory support if people find some use for it. I used the C++ standard sort function along with a command to convert all letters to upper case so you can enter any comma delineated list and it will be happy.

    To use it open terminal and cd to the directory the text file is located in. Then drag the program into the terminal window and hit enter. Then follow the prompt. The program doesn't have directory support so it'll use whatever the current directory of the terminal is; i.e. if you enter: "cd /Users/<your user name/Desktop" it'll look for the file on the desktop even if the program is on another hard drive and entering /Volumes/SomeVol/text.txt won't work either.

    I added rudimentary error checking but if you try hard you'll probably kill it. Also make sure you pick a file name that's not being used.

    Either way, have fun.
     

    Attached Files:

  13. balamw Moderator

    balamw

    Staff Member

    Joined:
    Aug 16, 2005
    Location:
    New England
    #13
    You can examine the code provided and try to figure that one out for yourself.

    B
     
  14. helpmeman thread starter macrumors newbie

    Joined:
    Feb 20, 2011
    #14
    Nay! Such fanciful sorcery is beyond my realm of my knowledge.

    Actually, it's not really my main concern at the moment, I am having difficulty with the "word cycle" requirement. I know I said it could be ignored, but I was wrong. Further, the PURE_PATTERN_MATCHING line will only give me a pattern (3,1,2,4) which I don't want, or disregard it completely (also not what I want).


    Also, Mac Max, I tried yours but all it did was put them in alphabetical order... but maybe I'm doing it wrong.
     
  15. MasConejos macrumors regular

    MasConejos

    Joined:
    Jun 25, 2007
    Location:
    Houston, TX
    #15

    Thats because I wrote it that way before you amended your original post.
    I have posted a new version, word_chain2.rb, that will prompt for the starting word (or you can supply it at the top of the file) and it also will look at all letters every fourth word, possibly starting a new pattern, rather than continuing to follow the original pattern continuously.
     
  16. helpmeman, Mar 7, 2011
    Last edited: Mar 7, 2011

    helpmeman thread starter macrumors newbie

    Joined:
    Feb 20, 2011
    #16
    Gah, I keep getting this:


    I will keep messing with it, but if you know what I'm doing wrong, feel free to let me know ;P

    EDIT: I'm not supposed to be running it in applescript am I.
     
  17. ulbador macrumors 68000

    ulbador

    Joined:
    Feb 11, 2010
    #17
    No... it's a Ruby script.. You need to run it in the Ruby interpreter.... Something tells me you need to go back to the drawing board on this one.
     
  18. MasConejos macrumors regular

    MasConejos

    Joined:
    Jun 25, 2007
    Location:
    Houston, TX
    #18
    You should be able to run it from terminal directly, either double clicking the file or browsing to the file in terminal and typing

    Code:
    ./word_chain2.rb
    or
    Code:
    ruby word_chain2.rb
     
  19. helpmeman thread starter macrumors newbie

    Joined:
    Feb 20, 2011
    #19
    I tried that but hang
     
  20. helpmeman thread starter macrumors newbie

    Joined:
    Feb 20, 2011
    #20
    -bash: ./word_chain2.rb: No such file or directory

    ruby: No such file or directory -- word_chain.rb (LoadError)
     
  21. MasConejos macrumors regular

    MasConejos

    Joined:
    Jun 25, 2007
    Location:
    Houston, TX
    #21
    What OS version are you using, and are you on intel or ppc?
     
  22. helpmeman, Mar 7, 2011
    Last edited: Mar 7, 2011

    helpmeman thread starter macrumors newbie

    Joined:
    Feb 20, 2011
    #22
    Intel

    perhaps it's looking for it in the wrong location? but that doesnt matter...?

    edit: ooooooooooooooooooooooohhhh hang on
    edit 2: nevermind. I noticed it still had your file path in the script so I changed it but same results
     
  23. MasConejos macrumors regular

    MasConejos

    Joined:
    Jun 25, 2007
    Location:
    Houston, TX
    #23
    So you are on OS X 10.5 or 10.6...

    Ruby is supposed to be installed by default on both, I'm not sure why it's not running.

    See if you can find ruby in /usr/bin/ruby or /usr/local/bin/ruby

    What does "echo $PATH" return in terminal?
     
  24. MasConejos macrumors regular

    MasConejos

    Joined:
    Jun 25, 2007
    Location:
    Houston, TX
    #24
    actually, looking at this, you need to navigate terminal to the folder where the word_chain2 file is at, THEN type ./word_chain2.rb

    example:

    Code:
    cd /users/yourusername/Desktop/four_letter_words
    ./word_chain2.rb
    It may take a minute, but if the program is running you should see some output on the screen.
     
  25. helpmeman thread starter macrumors newbie

    Joined:
    Feb 20, 2011
    #25
    /usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin

    edit: oh, I got something:

    /Users/JohnM/Desktop/word_chain2.rb: line 2: FILENAME: command not found
    /Users/JohnM/Desktop/word_chain2.rb: line 6: NO_PROGRESS_CUTOFF: command not found
    /Users/JohnM/Desktop/word_chain2.rb: line 10: PURE_PATTERN_MATCHING: command not found
    /Users/JohnM/Desktop/word_chain2.rb: line 14: PROMPT_FIRST_WORD: command not found
    /Users/JohnM/Desktop/word_chain2.rb: line 17: START_WORD: command not found
    /Users/JohnM/Desktop/word_chain2.rb: line 23: syntax error near unexpected token `('
    /Users/JohnM/Desktop/word_chain2.rb: line 23: `def parse_textfile(filename)'
     

Share This Page