Become a MacRumors Supporter for $50/year with no ads, ability to filter front page stories, and private forums.

mysterytramp

macrumors 65816
Original poster
Jul 17, 2008
1,334
4
Maryland
This should be an easy project but one that's eaten up way too much time.

I have about 50 database files (text), between 750 and 850K, where I need to replace the pipe character ("|") for a tab so the files can be easily imported into Excel or any other spreadsheet.

I was willing to let Automator do the job, even if it would take loads longer, but Automator looks really deficient in the text manipulation department.

It probably looks like a job for sed or perl, but the quickie tutorials I could find left out how to handle things like the tab character. So I'm doing the weasel thing in asking for help for something I should be able to do myself.

Can you help a guy out?

thanks

mt
 

HiRez

macrumors 603
Jan 6, 2004
6,250
2,576
Western US
I would just use the free TextWrangler, which has a multifile search and replace function. Using TextWrangler 3.0 (the menus are different in previous versions), select Multi-File Search from the Search menu, use \t (backslash-lowercase-t) to represent the Tab character, then select the folders you want to search in (make sure ONLY the files you want to modify are in this folder and make a copy before you try this).

Certainly you could use sed, awk, etc. but TextWrangler is easy and it's a great app to have anyway, I use it for all kinds of things.
 

chown33

Moderator
Staff member
Aug 9, 2009
10,751
8,423
A sea of green
+1 for TextWrangler.

If it's a one-for-one replacement of '|' with '\t', also see the 'tr' command. Sed, awk, perl, etc. are good when you want something more than straight translation. But tr is the command of choice for plain translation.
 

mysterytramp

macrumors 65816
Original poster
Jul 17, 2008
1,334
4
Maryland
I had my doubts about TextWrangler. Though I use it for a lot of text processing, I didn't think it was the best tool for this job. Whew, definitely way wrong ...

First thing I did was try to get its AppleScript dictionary. For some reason, the dictionary wouldn't open, but strangely, it's recordable (do people even make recordable apps any more?). That gave me the basics of a script that zipped through the files, as well as 300 or so more. I expect some Unix voodoo would have been even faster, but the performance was adequate.

mt
 

ChOas

macrumors regular
Nov 24, 2006
139
0
The Netherlands
I might be a bit late, but from the commandline you can just do this (in the dir where your textfiles are):

Code:
perl -pi'*.org' -e 's/\|/\t/g' *

This would create a copy of all your files with an .org extension and replace all
the pipes in the files with a tab.
 

BertyBoy

macrumors 6502
Feb 1, 2009
326
0
+1 for the one line Unix solutions. In sed.

for i in *.dat; do sed 's/|/<tab>/g' $i > $i.tsv; done

where <tab> means to use the tab key.

You could do something similar in any number of Unix scripting languages.

OR, you could just import it into Excel, which you've stated you want to do anyway. Or MySQL, Or Oracle, or whatever. All these allow you to define the field separator.
 

balamw

Moderator emeritus
Aug 16, 2005
19,366
979
New England
What's wrong with tr?

Code:
tr '|' '\t'

seems to do the trick too. So many ways to skin a cat in unix!

You could wrap that in a for loop like above

Code:
for i in *.dat; do tr '|' '\t' < $i > $i.tsv; done

I'm sure there's a way to do it in awk as well since both input and output are delimited, and you could probably use find/xargs instead of for depending on where the files are.

B
 
Register on MacRumors! This sidebar will go away, and you'll see fewer ads.