PDA

View Full Version : Search and replace




mysterytramp
Oct 22, 2009, 10:08 PM
This should be an easy project but one that's eaten up way too much time.

I have about 50 database files (text), between 750 and 850K, where I need to replace the pipe character ("|") for a tab so the files can be easily imported into Excel or any other spreadsheet.

I was willing to let Automator do the job, even if it would take loads longer, but Automator looks really deficient in the text manipulation department.

It probably looks like a job for sed or perl, but the quickie tutorials I could find left out how to handle things like the tab character. So I'm doing the weasel thing in asking for help for something I should be able to do myself.

Can you help a guy out?

thanks

mt



HiRez
Oct 22, 2009, 10:52 PM
I would just use the free TextWrangler (http://www.barebones.com/products/textwrangler/download.html), which has a multifile search and replace function. Using TextWrangler 3.0 (the menus are different in previous versions), select Multi-File Search from the Search menu, use \t (backslash-lowercase-t) to represent the Tab character, then select the folders you want to search in (make sure ONLY the files you want to modify are in this folder and make a copy before you try this).

Certainly you could use sed, awk, etc. but TextWrangler is easy and it's a great app to have anyway, I use it for all kinds of things.

chown33
Oct 22, 2009, 11:56 PM
+1 for TextWrangler.

If it's a one-for-one replacement of '|' with '\t', also see the 'tr' command. Sed, awk, perl, etc. are good when you want something more than straight translation. But tr is the command of choice for plain translation.

mysterytramp
Oct 23, 2009, 06:45 AM
I had my doubts about TextWrangler. Though I use it for a lot of text processing, I didn't think it was the best tool for this job. Whew, definitely way wrong ...

First thing I did was try to get its AppleScript dictionary. For some reason, the dictionary wouldn't open, but strangely, it's recordable (do people even make recordable apps any more?). That gave me the basics of a script that zipped through the files, as well as 300 or so more. I expect some Unix voodoo would have been even faster, but the performance was adequate.

mt

ChOas
Oct 23, 2009, 07:13 AM
I might be a bit late, but from the commandline you can just do this (in the dir where your textfiles are):


perl -pi'*.org' -e 's/\|/\t/g' *


This would create a copy of all your files with an .org extension and replace all
the pipes in the files with a tab.

BertyBoy
Oct 23, 2009, 09:58 AM
+1 for the one line Unix solutions. In sed.

for i in *.dat; do sed 's/|/<tab>/g' $i > $i.tsv; done

where <tab> means to use the tab key.

You could do something similar in any number of Unix scripting languages.

OR, you could just import it into Excel, which you've stated you want to do anyway. Or MySQL, Or Oracle, or whatever. All these allow you to define the field separator.

balamw
Oct 23, 2009, 11:24 AM
What's wrong with tr?

tr '|' '\t'

seems to do the trick too. So many ways to skin a cat in unix!

You could wrap that in a for loop like above

for i in *.dat; do tr '|' '\t' < $i > $i.tsv; done

I'm sure there's a way to do it in awk as well since both input and output are delimited, and you could probably use find/xargs instead of for depending on where the files are.

B