Need help altering a .txt file....

Discussion in 'Mac Programming' started by IndianMojo, Mar 19, 2008.

  1. macrumors newbie

    #1
    Hello all...

    I have an issue that I hope someone may help me resolve. If this is not the correct forum for me to be posting to, please point me in the direction I need to go.

    I am working for a company that produces telephone directories. We purchase listings from other companies and then reformat them for use in our book. The problem is that one company has sent listings formatted in what appears to be a proprietary coding.

    What I need help with is deleting the useless characters in a string while leaving the listing intact.

    Here is an example of what the file looks like:

    S01016100532473#######1020906080213 918
    S1301610 R RES
    S140161001ILN 0 LAST NAME, FIRST NAME 0
    S140161002ILA 0ADDRESS 0

    I have formatted in bold the information that I need to keep. All of the rest can disappear.

    Is it even possible to write a script that would do what I need? And again, if I am in the wrong forum, please let me know where I need to go.

    BTW, I am totally clueless about writing scripts or code, so please use layman's terms, if possible.

    Thank you in advance......
     
  2. macrumors 6502a

    #2
    How much data are we talking about? A city the size of New York or a town the size of Luckenbach, Texas?

    You've shown an example, but I'm going to assume there are details you haven't told us: for instance, does the useless information vary, and are the bold items always in fixed locations?

    With all the details, I could write this reformatter, as I would also imagine this is worth something to your company to get done.

    Feel free to PM me.

    Todd
     
  3. macrumors newbie

    #3
    Todd,

    Thank you for your response. I will PM you with a specific sample, but am responding here to inform everyone who may read this about the issue.

    There are roughly 2000 listings, and most are formatted exactly as in my example. The information before and after the telephone number is always a fixed number of characters. The same is true of the residential, name, and address lines.
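Since the characters around the phone number sit at fixed positions, plain fixed-width slicing is enough for the S01 line. A minimal Python sketch, assuming (from counting the two samples by hand) that the 7-character phone number starts at offset 15, right after `S01` plus twelve digits; the offsets and the function name are illustrative, not part of any documented spec for this format.

```python
# Hypothetical offsets, counted by hand from the two S01 samples in this thread:
# 15 characters of prefix ("S01" + 12 digits ending in "473"), then a
# 7-character phone number.
PHONE_START = 15
PHONE_END = PHONE_START + 7


def phone_from_s01(line: str) -> str:
    """Return the phone number field from an S01 header line by position."""
    if not line.startswith("S01"):
        raise ValueError(f"not an S01 line: {line!r}")
    return line[PHONE_START:PHONE_END]


print(phone_from_s01("S01016100532473#######1020906080213 918"))  # prints #######
```

Slicing by position only works while the layout really is fixed, so it is worth spot-checking a few real lines before trusting the offsets.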

    The only variance is when a business or individual has multiple numbers listed, which looks something like this:

    S01005920532473#######910926080213 918
    S1300592 R RES
    S140059201ICAP 0LAST NAME, FIRST NAME 0
    S140059202ILN 1(OLN) 0
    S140059203ILA 1,ADDRESS 0
    S140059204ITN 1###-###-#### 0
    S140059205IAL 12ND LINE 0
    S140059206ILA 1(OAD) 0
    S140059207ITN 1###-###-#### 0

    Again, needed info is in bold.

    As you can see, this is different from the previous example. These multi-line listings are rare, however, so I could extract them manually and be left with a uniform file to run a script on.
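Rather than extracting the multi-line listings by hand, a script can keep each listing's lines together: a listing begins at its S01 line and runs until the next one. A minimal Python sketch, assuming every listing starts with a line beginning `S01` (true of both samples here); `group_listings` is a hypothetical helper name.

```python
from typing import Iterable, List


def group_listings(lines: Iterable[str]) -> List[List[str]]:
    """Split lines into listings, each one starting at an 'S01' line.

    Assumption: every listing begins with an S01 header. Blank lines are
    skipped, and anything before the first header is ignored.
    """
    listings: List[List[str]] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("S01"):
            listings.append([line])  # a new listing begins here
        elif listings:
            listings[-1].append(line)  # extra line belonging to the current listing
    return listings


SAMPLE = """\
S01016100532473#######1020906080213 918
S1301610 R RES
S140161001ILN 0 LAST NAME, FIRST NAME 0
S140161002ILA 0ADDRESS 0
S01005920532473#######910926080213 918
S1300592 R RES
S140059201ICAP 0LAST NAME, FIRST NAME 0
S140059202ILN 1(OLN) 0
S140059203ILA 1,ADDRESS 0
S140059204ITN 1###-###-#### 0
S140059205IAL 12ND LINE 0
S140059206ILA 1(OAD) 0
S140059207ITN 1###-###-#### 0
""".splitlines()

print([len(listing) for listing in group_listings(SAMPLE)])  # prints [4, 9]
```

Once listings are grouped, anything that is not exactly four lines long can be set aside for manual handling instead of being cut out of the file first.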

    Thanks again in advance for any help you may provide.
     
  4. macrumors 6502a

    #4
    Great. What's the output format need to look like?

    Todd
     
  5. macrumors regular

    needlnerdz

    #5
    isn't this what interns are for?
     
  6. macrumors newbie

    #6
    Use a perl script

    I would write a Perl script for this if I were you. Perl has excellent regular-expression support. Pick up a book on Perl or just search the web for a Perl tutorial.
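To show what the regular-expression approach looks like, here is a minimal sketch in Python, whose `re` module uses syntax similar to Perl's. It assumes the S14 lines always follow the shape seen in the samples: `S14`, seven digits, a letter code such as `ILN` or `ILA`, a space, one digit, the text to keep, and a trailing ` 0`. What the codes actually mean is a guess from the samples.

```python
import re
from typing import Optional, Tuple

# Assumed layout of an S14 line, inferred from the samples in this thread:
#   "S14" + 7 digits + letter code + space + 1 digit + text + " 0"
S14_LINE = re.compile(r"^S14\d{7}([A-Z]+) \d ?(.*?) 0$")


def parse_s14(line: str) -> Optional[Tuple[str, str]]:
    """Return (field_code, text) for an S14 line, or None if it doesn't match."""
    match = S14_LINE.match(line.rstrip())
    if match is None:
        return None
    return match.group(1), match.group(2)


print(parse_s14("S140161001ILN 0 LAST NAME, FIRST NAME 0"))
# prints ('ILN', 'LAST NAME, FIRST NAME')
print(parse_s14("S140161002ILA 0ADDRESS 0"))
# prints ('ILA', 'ADDRESS')
```

Returning `None` for lines that don't fit the pattern makes it easy to collect the oddballs for a manual look instead of mangling them.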
     
  7. macrumors 68040

    motulist

    #7
    Should be possible with even a simple Automator workflow, as long as the formatting is consistently the way you show in your sample. However, in the phone number line I don't see what separates the phone number from the garbage numbers. Are the garbage characters before or after the phone number always the same? Or is there always the same number of garbage characters before the real phone number?

    To make it easier to figure out a way to help you, why don't you post your previous sample followed by a new second sample so we can see what stays constant and what changes in each entry.
     
  8. macrumors 6502a

    #8
    I'm guessing that by "disappear", you are merely wanting the info shifted left, like this?

    Code:
    #######
    R
    LAST NAME, FIRST NAME 
    ADDRESS 
    
    Todd
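The "shift left" layout above can be produced by picking the wanted piece out of each line of a simple four-line listing, using only string methods. A minimal sketch, assuming the positions seen in the samples: the phone number at offsets 15 to 22 of the S01 line, the `R` code as the second word of the S13 line, and, on S14 lines, the text between the code's level digit and the trailing ` 0`. `shift_left` and `s14_text` are hypothetical names.

```python
from typing import List


def s14_text(line: str) -> str:
    """Text of an S14 line, minus the code, level digit, and trailing ' 0'.

    Assumed layout (from the samples): 'S14' + 7 digits + code, a space,
    one level digit, the text, then ' 0'.
    """
    body = line[line.index(" ") + 2:]  # skip past the space and the level digit
    if body.endswith(" 0"):
        body = body[:-2]
    return body.strip()


def shift_left(listing: List[str]) -> List[str]:
    """Reduce a simple four-line listing to the pieces kept in the example."""
    kept: List[str] = []
    for line in (raw.rstrip() for raw in listing):
        if line.startswith("S01"):
            kept.append(line[15:22])  # assumed: 7-character phone number at offset 15
        elif line.startswith("S13"):
            kept.append(line.split()[1])  # listing type, e.g. "R"
        elif line.startswith("S14"):
            kept.append(s14_text(line))
    return kept


LISTING = [
    "S01016100532473#######1020906080213 918",
    "S1301610 R RES",
    "S140161001ILN 0 LAST NAME, FIRST NAME 0",
    "S140161002ILA 0ADDRESS 0",
]
print("\n".join(shift_left(LISTING)))
```

This prints the phone number, `R`, the name, and the address on four lines, matching the layout above apart from trailing spaces.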
     
  9. macrumors 65816

    Flynnstone

    #9
    sed, awk ...
    Perl
    For 2000 listings, Perl would work, or you might be able to just use a text editor with search and replace and then tidy up the file for the exceptions.
     
  10. macrumors newbie

    #10
    Thank you everybody for your responses. I am going to answer some of the posts.

    I can't use a find and replace, because the lines of code change for each entry.

    To me...a perl is a small white, sometimes black, thing that costs lots of money and gets made into necklaces and earrings....however, I am willing to learn, and will definitely look into buying a book on the subject.

    The telephone number always starts the same number of characters into the line beginning with S01. The last three characters before the telephone number are always 473, and the number of characters after the phone number is also always constant.
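Because the three characters before the phone number are always `473`, a script can use them as a sanity check and flag any S01 line that breaks the pattern, rather than silently slicing out the wrong characters. A minimal sketch, assuming the `473` sits at offsets 12 to 14 as in both samples; `check_s01` is a hypothetical name, and the second input line is made up purely to show the flagging.

```python
from typing import List, Tuple

MARKER = "473"
MARKER_START = 12  # assumed: "473" occupies offsets 12-14 of every S01 line
PHONE_LEN = 7


def check_s01(lines: List[str]) -> Tuple[List[str], List[str]]:
    """Split S01 lines into (phone numbers, lines that need a manual look)."""
    phones: List[str] = []
    suspicious: List[str] = []
    marker_end = MARKER_START + len(MARKER)
    for line in lines:
        if not line.startswith("S01"):
            continue  # only header lines carry the phone number
        if line[MARKER_START:marker_end] == MARKER:
            phones.append(line[marker_end:marker_end + PHONE_LEN])
        else:
            suspicious.append(line)  # marker missing: don't guess, flag it
    return phones, suspicious


phones, suspicious = check_s01([
    "S01016100532473#######1020906080213 918",
    "S01999999999999#######0000000000000 000",  # made-up bad line for illustration
])
print(phones, len(suspicious))  # prints ['#######'] 1
```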

    And finally, yes this is what interns are for...unfortunately we are too small a company to have interns....so the select, delete, select, delete, tab, select, delete, tab, job would fall to me......(I feel like such an intern).

    Which reminds me... it would be helpful if the output were tab-delimited so it fits into Excel easily.
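Tab-delimited output is straightforward with Python's standard `csv` module: one row per listing, one column per kept field, in a plain text file that Excel can import. A minimal sketch, assuming each listing has already been reduced to a list of kept fields (phone, type, name, address); the column names and the file name in the comment are placeholders.

```python
import csv
import io
from typing import List, TextIO

# Placeholder column names; adjust to whatever the directory layout needs.
HEADER = ["Phone", "Type", "Name", "Address"]


def write_tab_delimited(rows: List[List[str]], out: TextIO) -> None:
    """Write a header plus one tab-separated row per listing."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(HEADER)
    writer.writerows(rows)


buffer = io.StringIO()
write_tab_delimited([["#######", "R", "LAST NAME, FIRST NAME", "ADDRESS"]], buffer)
print(buffer.getvalue(), end="")

# To write a real file instead (hypothetical file name):
# with open("listings.txt", "w", newline="") as f:
#     write_tab_delimited(rows, f)
```

Using `csv` rather than joining with `"\t"` by hand means a stray tab or quote inside a name gets quoted properly instead of shifting the columns.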

    Thanks, everyone.
     
  11. macrumors 65816

    Flynnstone

    #11
    If you do this every so often, I recommend Perl (not the necklace).
    I still think you should be able to do it with a text editor like TextWrangler.
    You will need to learn and use "regular expressions".
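The text-editor route comes down to a few regular-expression find/replace passes over the whole file, one per line type. The sketch below performs those passes in Python with `re.sub` in multiline mode so it can be checked on the sample; it assumes the line layouts seen in the samples, and the exact pattern syntax a given editor's grep-style find/replace accepts may differ slightly.

```python
import re

# Assumed line layouts, inferred from the samples in this thread.
REPLACEMENTS = [
    # S01 header: keep the 7 characters after "S01" + 12 digits (the phone number)
    (re.compile(r"^S01\d{12}(.{7}).*$", re.MULTILINE), r"\1"),
    # S13 line: keep the listing type, e.g. "R"
    (re.compile(r"^S13\d+ (\S+).*$", re.MULTILINE), r"\1"),
    # S14 lines: drop "S14", 7 digits, the code and level digit, and the trailing " 0"
    (re.compile(r"^S14\d{7}[A-Z]+ \d ?(.*?) 0$", re.MULTILINE), r"\1"),
]


def strip_codes(text: str) -> str:
    """Apply each find/replace to the whole file's text, one pattern at a time."""
    # Normalize Windows (\r\n) and classic Mac (\r) line endings to \n first
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    for pattern, replacement in REPLACEMENTS:
        text = pattern.sub(replacement, text)
    return text


SAMPLE = (
    "S01016100532473#######1020906080213 918\n"
    "S1301610 R RES\n"
    "S140161001ILN 0 LAST NAME, FIRST NAME 0\n"
    "S140161002ILA 0ADDRESS 0\n"
)
print(strip_codes(SAMPLE), end="")
```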
     
  12. macrumors 68040

    motulist

    #12
    An Automator workflow would work perfectly in that case. Create a blank Automator workflow and do the following:

    add 'Launch Application: TextEdit'
    record the user's actions that select and delete the garbage characters. For instance:
    -hold down shift and hit the right arrow until all the garbage characters are selected
    -hit delete
    -hit the right arrow 7 times to get to the end of the phone number
    -hold down shift and hit the goto end of line key command
    -hit return to delete that garbage and go to the next line

    etc.

    That makes it sound more complex than it is. It'll take some tweaking to get it just right, but it won't be too hard.

    EDIT:

    No matter what solution you attempt, first build it using a sample file with only about 5 entries, and only then apply the finished solution to your entire list.
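Following that advice, a script can carve a small trial file out of the big one, so any approach can be tested safely first. A minimal sketch, assuming a listing starts at each line beginning with `S01`; `first_listings` is a hypothetical name and the file names in the comment are placeholders.

```python
from typing import List


def first_listings(lines: List[str], count: int = 5) -> List[str]:
    """Return the lines making up the first `count` listings.

    Assumption: each listing starts with a line beginning 'S01'.
    """
    kept: List[str] = []
    headers_seen = 0
    for line in lines:
        if line.startswith("S01"):
            headers_seen += 1
            if headers_seen > count:
                break  # the (count + 1)-th listing starts here, so stop
        kept.append(line)
    return kept


# Usage with placeholder file names:
# with open("listings.txt") as src, open("sample.txt", "w") as dst:
#     dst.writelines(first_listings(src.readlines(), 5))
```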
     
  13. macrumors newbie

    #13
  14. macrumors 68040

    motulist

    #14
    Learning a whole scripting language just for this one pretty simple task seems very inefficient when it could be done much more quickly and easily with Automator.
     
