Compiling a Huge list.

Discussion in 'Mac Basics and Help' started by Eazkk123, Mar 14, 2007.

  1. Eazkk123, Mar 14, 2007
    Last edited by a moderator: Jan 14, 2011

    Eazkk123 macrumors member

    Joined:
    Jan 17, 2007
    #1
    I have a list of 40,000 words, each ending in either .x12 or .y15 (for example, talktous.x12). I want to remove all the words ending in .x12, and then arrange all the .y15 words in order of number of characters, from 1 character up to the maximum, which I have not counted.

    Help appreciated.

    Eazkk123
     
  2. GimmeSlack12 macrumors 603


    Joined:
    Apr 29, 2005
    Location:
    San Francisco
    #2
    40,000 words, right? Not files or icons or anything?

    If that is the case, you could probably use Excel. Otherwise I'm gonna see what I can figure out using Automator.
     
  3. Eazkk123 thread starter macrumors member

    Joined:
    Jan 17, 2007
    #3
    Just words! I have Excel, but because the list is so big it has difficulty importing it, and many errors occur (typical of Microsoft :) ). If Automator or TextMate can do it, that would be great.

     
  4. Angrist macrumors 6502

    Joined:
    Mar 11, 2005
    Location:
    MI or NJ
    #4
    If it's just a text file, you could write some C code or a shell script to edit the list.
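
    As a rough sketch of what such a script might look like, here is one in Python rather than C or shell. The file names words.txt and sorted_y15.txt and the function name keep_sorted_by_length are made up for illustration, and it assumes one entry per line.

```python
from pathlib import Path


def keep_sorted_by_length(lines, suffix=".y15"):
    """Keep only the entries ending in `suffix`, shortest first."""
    entries = [line.strip() for line in lines]
    kept = [entry for entry in entries if entry.endswith(suffix)]
    # sorted() is stable, so entries of equal length keep their original order.
    return sorted(kept, key=len)


if __name__ == "__main__":
    source = Path("words.txt")  # hypothetical input file, one entry per line
    if source.exists():
        result = keep_sorted_by_length(source.read_text().splitlines())
        # hypothetical output file
        Path("sorted_y15.txt").write_text("\n".join(result) + "\n")
```

    Since every kept entry ends in the same five characters, sorting by the length of the whole entry gives the same order as sorting by the length of the word alone.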
     
  5. Eazkk123, Mar 15, 2007
    Last edited by a moderator: Jan 14, 2011

    Eazkk123 thread starter macrumors member

    Joined:
    Jan 17, 2007
    #5
    How do you do that?!

    Eazkk123


     
  6. RedTomato macrumors 68040


    Joined:
    Mar 4, 2005
    Location:
    .. London ..
    #6
    Hmm. I would have thought Excel could deal with it.

    You need a database program, the kind of program that is specially designed to deal with very long lists.

    FileMaker Pro could probably do it. I've created test databases with 200,000 records, each with 10 fields, and done global sorts, no problem.

    You don't need the latest version; something a few years old will do fine. It should be able to import your file as CSV, Excel or whatever.

    A shell script (bash, for example) or an AWK script could do it too; check out

    http://en.wikipedia.org/wiki/AWK_programming_language

    It might be a little technical for you.

    Hope this helps.

    What do you need this for anyway?
     
  7. Eazkk123, Mar 15, 2007
    Last edited by a moderator: Jan 14, 2011

    Eazkk123 thread starter macrumors member

    Joined:
    Jan 17, 2007
    #7
    It is a massive .txt file. Excel does import the list, but there are problems: for some words the .x12 extension goes into the second column, while for others it stays in the first column, so the result is scattered rather than one straight list. Basically I just need help removing all the words ending in .x12, and then I want to arrange all the words ending in .y15 from the fewest characters to the most.

    Help Appreciated

    Eazkk123


     
  8. GimmeSlack12 macrumors 603


    Joined:
    Apr 29, 2005
    Location:
    San Francisco
    #8
    Well I've had no luck with Automator (and I was only trying a list of 20).

    BBEdit does have some features for removing lines of text, though. I'm still not sure what syntax would be used to order your list from short to long words. BBEdit is shareware, so you can give it a try.
     
  9. mkrishnan Moderator emeritus


    Joined:
    Jan 9, 2004
    Location:
    Grand Rapids, MI, USA
    #9
    I don't think the problems you are having with Excel have anything to do with the size of the list. 40k entries should be well within its limits. You are having problems because you are not setting the text import parameters correctly. Go back through the import manually and make sure that the number of fields and the delimiters are correct.
     
  10. RedTomato macrumors 68040


    Joined:
    Mar 4, 2005
    Location:
    .. London ..
    #10
    I agree. It's also worth scrolling through your text file and checking by eye that all the data is clean. Many times I've found that what I thought was a clean data file had 2 items on the same line or some other small screw-up.

    You could try just importing 100 into Excel, check if that works (clean data, correct import parameters, sorted properly etc), then 1000, then 2000, then 5000 words.

    Or split your text file into blocks of 5000 words.
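
    If checking 40,000 lines by eye gets tedious, both ideas can be roughly sketched in Python. This assumes a clean line is a single word followed by .x12 or .y15 and nothing else; the names CLEAN_LINE, find_suspect_lines and split_into_blocks are just illustrative.

```python
import re

# Assumption: a clean line is one run of non-space characters ending in .x12 or .y15.
CLEAN_LINE = re.compile(r"\S+\.(?:x12|y15)")


def find_suspect_lines(lines):
    """Return (line_number, text) pairs for lines that are not one clean entry."""
    return [
        (number, line)
        for number, line in enumerate(lines, start=1)
        if not CLEAN_LINE.fullmatch(line.strip())
    ]


def split_into_blocks(lines, block_size=5000):
    """Split the list into consecutive blocks of at most block_size lines."""
    return [lines[i:i + block_size] for i in range(0, len(lines), block_size)]
```

    Lines with two items, a missing ending, or nothing at all are reported with their line numbers, so they can be fixed before importing anything.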

    You still haven't said what all this is for. Considering all the help we're giving, we'd like to know :rolleyes:
     
  11. MacBoobsPro, Mar 15, 2007
    Last edited by a moderator: Jan 14, 2011

    MacBoobsPro macrumors 603


    Joined:
    Jan 10, 2006
    #11
    What about find and replace in TextEdit? Simple enough. :confused:

    TextEdit

    Edit > Find > Find...

    Fill in the bits you want to take out in the 'find' box and leave the 'replace with' empty.

    That's it.
     
  12. CanadaRAM macrumors G5


    Joined:
    Oct 11, 2004
    Location:
    On the Left Coast - Victoria BC Canada
    #12
    Use TextWrangler to find and replace ".x" and ".y" with "\tx" (tab x) and "\ty" to force the extension into a different column, but to ignore any other periods that don't have a following X or Y.
    Then import into Excel
    Then use Excel's sorting functions to correct errors (errors show up grouped nicely together when you sort cleverly)
    Then write some calculations to make values in other columns for word length etc, and a concatenation calculation to put the word back together and put the period back in.
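
    For anyone who would rather script these steps than do them in TextWrangler and Excel, a hedged sketch in Python could look like this. The function names are invented for illustration, and it assumes each entry contains exactly one ".x" or ".y" ending.

```python
import re


def split_extension(entry):
    """Replace '.x' / '.y' with a tab, leaving any other periods alone."""
    return re.sub(r"\.([xy])", r"\t\1", entry)


def build_row(entry):
    """Return (word, extension, word_length), like three spreadsheet columns."""
    # Raises ValueError if the entry has no '.x' / '.y' ending to split on.
    word, extension = split_extension(entry).split("\t", 1)
    return word, extension, len(word)


def rejoin(word, extension):
    """Concatenate the columns again, putting the period back in."""
    return word + "." + extension
```

    An entry without a recognisable ending makes build_row raise ValueError, which is one way to surface the errors that sorting would otherwise group together.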
     
  13. ChrisA, Mar 15, 2007
    Last edited by a moderator: Jan 14, 2011

    ChrisA macrumors G4

    Joined:
    Jan 5, 2006
    Location:
    Redondo Beach, California
    #13
    Open up a Terminal window. Use "grep" to remove all the X12 words (or keep all the Y15 words). Sorting will be harder: you will have to add a second column to the data that holds the character count, sort on that count, and then delete the column. I'd use a short Perl script to add the column, but you could do this in Excel.
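
    The add-a-column, sort, delete-the-column idea can be sketched as follows. This is in Python rather than grep and Perl, purely as an illustration; the function names are hypothetical and the list is assumed to be already read into memory, one entry per item.

```python
def keep_y15(lines):
    """grep-like filter: keep only the lines ending in .y15."""
    return [line for line in lines if line.endswith(".y15")]


def sort_by_count_column(lines):
    """Order lines from fewest to most characters via a temporary count column."""
    # 1. Add a column holding the character count.
    with_count = [(len(line), line) for line in lines]
    # 2. Sort on the count column only (ties keep their original order).
    with_count.sort(key=lambda row: row[0])
    # 3. Delete the count column again.
    return [line for _count, line in with_count]
```

    Usage would be something like sort_by_count_column(keep_y15(all_lines)), where all_lines is whatever list the file was read into.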
     
  14. Eazkk123 thread starter macrumors member

    Joined:
    Jan 17, 2007
    #14
    I want to remove the WHOLE word with the extension .x12, and every word is different, so I can't just use find. As for importing it into Excel, can someone help with the parameters?! It is very confusing.

     
