Become a MacRumors Supporter for $50/year with no ads, ability to filter front page stories, and private forums.

jasperx

macrumors newbie
Original poster
Oct 17, 2012
7
0
I need to create several hundred files from a list exported from excel to a csv.
The csv looks like this:
Name1,City1,State1
Name2,City2,State2
continues to
Name600,City600,State600

The desired result is 600 files like this:
Name1.txt which contains Name1,City1,State1
Name2.txt which contains Name2,City2,State2

My coding skills are quite rusty but it seems like this should be pretty simple
 
Try this :

Code:
-- csv file with Unix (LF) line feed
set theFile to choose file with prompt "Choose your csv file"
set theFolder to choose folder with prompt "Choose folder for the txt files"
set theParagraphs to paragraphs of (read theFile)
repeat with aParagraph in theParagraphs
	try
		set TID to AppleScript's text item delimiters
		set AppleScript's text item delimiters to {","}
		set theItems to text items of aParagraph
		set theName to item 1 of theItems
		set AppleScript's text item delimiters to TID
		do shell script "echo " & quoted form of aParagraph & " > " & quoted form of POSIX path of (theFolder as string) & quoted form of theName & ".txt"
	on error
		set AppleScript's text item delimiters to TID
	end try
end repeat
 
Code:
#!/bin/bash

for i in $(cat the.csv); do
  name=$(echo "$i"|awk -F',' '{print $1}')
  echo "doing $name"
  echo "$i" >${i}.txt
done
 
Obligatory Perl implementation:

Code:
#!/usr/bin/perl

use strict;
use warnings;

open CSV, "<", "filename.csv" or die $!;

while (<CSV>){
  my @line = split(',',$_);
  open OUTFILE, '>'.$line[0].'.csv' or die $!;
  print OUTFILE $_;
  close OUTFILE;
}

close CSV;
 
OK had a go with the bash version... and something is not quite right
The code is in a file rosetta which has:
Code:
#!/bin/bash
for i in $(cat rosettaTest.csv); do
  name=$(echo "$i"|awk -F',' '{print $1}')
  echo "doing $name"
  echo "$i" >${i}.txt
done
Here is my output

bash-3.2$ bash rosetta
rosetta: line 1: {rtf1ansiansicpg1252cocoartf1138cocoasubrtf510: command not found
rosetta: line 2: syntax error near unexpected token `}'
rosetta: line 2: `{\fonttbl\f0\fmodern\fcharset0 Courier;}'
bash-3.2$
 
You need to save the bash script as a text file. That looks like RTF. Once that is done, then...

Code:
bash> chmod +x rosetta
bash> ./rosetta
 
You need to save the bash script as a text file. That looks like RTF. Once that is done, then...

Code:
bash> chmod +x rosetta
bash> ./rosetta

Yup, use TextEdit and go to the Format menu (in the menu bar) and choose 'Make plain text'

Then save the file. Or even better, use something like TextWrangler
 
Or go full Command Line and use nano, vi, or emacs.

Also if this is a one-off, you could even copy and paste that script into the command line and it should work.
 
Making it a txt file made a huge difference. The script runs but is not doing as intended. Trying again on what the desired results would be:
A file created for every record or line in a csv
Each line or record in the csv has three values:
Name1,City1,State1

The desired result is to create a file for each record named by the first value in the record. If I have 10 records the result would be 10 files named:
Name1.txt, Name2.txt....Name10.txt
The contents of Name1.txt would be the record or Name1,City1,State1

I ran this on a csv with 10 records. The output was 3 files... not 10.
The first file contained records 1 and 2.
The second file contained records 3,4,5,6
The third file contained records 7,8,9,10
Here is the code
Code:
#!/bin/bash

for i in $(cat the.csv); do
  name=$(echo "$i"|awk -F',' '{print $1}')
  echo "doing $name"
  echo "$i" >${i}.txt
done
 
I ran this on a csv with 10 records. The output was 3 files... not 10.
The first file contained records 1 and 2.
The second file contained records 3,4,5,6
The third file contained records 7,8,9,10

Post your CSV data. Otherwise all anyone can do is guess what input produced your stated output.


There's a bug in the code. Change this:
Code:
  echo "$i" >${i}.txt
to this:
Code:
  echo "$i" >"${name}.txt"


My example "the.csv" file (manually created):
Code:
able,akron,state
bravo,baltimore,state
charlie,city,state
delta,city,state
eeevill,eville,state
foxtrot,tango,whiskey
golf,glasgow,whisky
hotel,homer,state
indigo,indio,state
juliet,juarez,state
 
There's a bug in the code.

I was waiting for someone to catch that. It also doesn't handle spaces. Mac_Max's Obligatory Perl implementation creates csv files instead of txt files. (at least it did when I tested it) When fixed it's very fast. I wish I knew Perl.
 

Attachments

  • Picture 4.png
    Picture 4.png
    73.7 KB · Views: 186
Here's an implementation in Python:

Code:
import sys

def createFiles(csv):
	with open(csv, 'r') as f:
		for line in f:
			s = line.split(',')
			with open(s[0] + '.txt', 'w') as w:
				w.write(s[0] + ',' + s[1] + ',' + s[2])
            	
createFiles(sys.argv[1])

You can use it by running: python createFiles.py <example.csv>
 
I was waiting for someone to catch that. It also doesn't handle spaces. Mac_Max's Obligatory Perl implementation creates csv files instead of txt files. (at least it did when I tested it) When fixed it's very fast. I wish I knew Perl.

Here's the all-awk version, which handles spaces in names:
Code:
awk -F',' '{ print $0 >($1 ".txt") }'  the.csv
And here's the version that spits out each name:
Code:
awk -F',' '{ print "doing " $1 >"/dev/stderr"; print $0 >($1 ".txt") }'  the.csv

So this would be the new contents of "rosetta":
Code:
#!/bin/bash
awk -F',' '{ print $0 >($1 ".txt") }'  the.csv
or replace that awk one-liner with the longer awk one-liner.


Test data, in "the.csv":
Code:
able alfa,akron,state
baker bravo,baltimore,state
Charley Gordon,city,state
delta dawn,city,state
eeevill twin,eville,state
foxtrot,tango,whiskey
golf glove,glasgow,whisky
hotel,homer,ak
Inigo Montoya,indio,state
Juliet is the sun,juarez,state
 
I was waiting for someone to catch that. It also doesn't handle spaces. Mac_Max's Obligatory Perl implementation creates csv files instead of txt files. (at least it did when I tested it) When fixed it's very fast. I wish I knew Perl.

Ah, yes in that case:

Code:
#!/usr/bin/perl

use strict;
use warnings;

die "Please provide a filename\n" if @ARGV == 0;
open CSV, "<", $ARGV[0] or die $!;

while (<CSV>){
  my @line = split(',',$_);
  $line[0] =~ s/ /-/g; #comment this out with if you prefer to keep spaces as spaces.
  open OUTFILE, '>'.$line[0].'.txt' or die $!;
  print OUTFILE $_;
  close OUTFILE;
}

close CSV;

My original version will handle spaces just fine but this version will also replace them with dashes (or whatever character you'd like, remember to escape regex control chars). It also now accepts the input file as an argument.

Perl is a great language. It took me about three weeks to become armed and dangerous with it and about 5 months to become rather good at it. Here's the book I learned from: http://onyxneon.com/books/modern_perl/
 
Last edited:
You guys are having way too much fun... running off and leaving me with my eyes crossed but each time a little closer to the desired result. Stick with me as I think we are getting close:

This code as corrected and expanded on by chown33 :
Code:
#!/bin/bash
for i in $(cat rosettaTest.csv); do
  name=$(echo "$i"|awk -F',' '{ print $0 >($1 ".txt") }'  rosettaTest.csv)
  echo "doing $name"
  echo "$i" >${name}.txt
done

Acting on this rosettaTest.csv which contains:
13_01,1.70452E+11,182797
13_02,1.7045E+11,21676
13_03,1.7045E+11,22022
13_04,1.70452E+11,206985
2_01,1.70452E+11,117549
2_02,1.7045E+11,47029
2_03,1.70452E+11,118304
2_04,1.7045E+11,31579
2_05,1.7045E+11,30966
2_06,1.7045E+11,48157
Output was three (not 10) properly named files:
2_03.txt
13_01.txt
13_03.txt

13_01.txt contained
13_01,1.70452E+11,182797 good this is correct
13_02,1.7045E+11,21676 wrong it should have iterated to create file 13_02.txt

13_03.txt contained
13_03,1.7045E+11,22022
13_04,1.70452E+11,206985
2_01,1.70452E+11,117549
2_02,1.7045E+11,47029

2_03.txt contained
2_03,1.70452E+11,118304
2_04,1.7045E+11,31579
2_05,1.7045E+11,30966
2_06,1.7045E+11,48157

The loops are not working right… it should iterate to 13_02 immediately after writing the first record into 13_01… instead it writes two records, skips to the third record writes 4 records skips to the 7th record.

I got nowhere with Perl as it gave this back
use: command not found
I did use
Code:
#!/usr/bin/perl


----------

Is this the problem?
I ran a file command on rosettaTest.csv
rosettaTest.csv: ASCII text, with CR, LF line terminators

My csv came from Excel 2011... I read somewhere that there was a problem with the way Excel 2008 wrote out line terminators.

I am going off in search of a fix.
 
Code:
bash-3.2$ tr '\15' '\n' < rosettaTest.csv
got me
rosettaTest.csv: ASCII text, with CR, LF line terminators
with same incorrect results
 
Easier than all that

Copy the text into textWrangler.

Use find and replace to get rid of all those commas and spaces.

Use the Prefix/Suffix line...
To add touch with a space at begining of each line.
Then add .txt to end of each line.

touch billyMinneapolisMN.txt
touch AliceDallasTX.txt



Select All and Copy.

Open Terminal.

Us cd to get to your directory of choice.

Paste.

Done.
 
Last edited:
TextWrangler fixed the problem.
Now I get 17 perfect files before getting 500+ copies of
Code:
awk: 2_14.txt makes too many open files
 input record number 18, file rosetta.csv
 source line number 1
doing
Seems like the file needs closing so I tried adding a close file command like this:
Code:
#!/bin/bash
for i in $(cat rosetta.csv); do
  name=$(echo "$i"|awk -F',' '{ print $0 >($1 ".txt") }'  rosetta.csv)
  echo "doing $name"
  echo "$i" >${name}.txt
  close file "${name}.txt"
done
that was just a guess and it didn't work
 
Guessing almost never works.

Try replacing your entire shell script (not just the line that uses awk) with this:
Code:
#!/bin/bash

mkdir -p out

awk -F',' \
 '{ f="\"out/"$1".txt\""; system("echo \"" $0 "\" >" f ); }'  \
 [B][COLOR="Red"]the.csv[/COLOR][/B]
I got tired of always deleting the individual output files mixed in with the CSV and shell script, so it now makes a sub-folder named "out" and creates all the files there. If you don't want that, and can't figure out how to change it, ask again and I'll revise it.

The above is a COMPLETE REPLACEMENT for the original shell script. It is not just a few lines to be merged in with the original.


My test CSV file:
Code:
able alfa,akron,state
baker bravo,baltimore,state
Charley Gordon,city,state
delta dawn,city,state
eeevill twin,eville,state
foxtrot,tango,whiskey
golf glove,glasgow,whisky
hotel,homer,ak
Inigo Montoya,indio,state
Juliet is the sun,juarez,state
13_01,1.70452E+11,182797
13_02,1.7045E+11,21676
13_03,1.7045E+11,22022
13_04,1.70452E+11,206985
2_01,1.70452E+11,117549
2_02,1.7045E+11,47029
2_03,1.70452E+11,118304
2_04,1.7045E+11,31579
2_05,1.7045E+11,30966
2_06,1.7045E+11,48157

EDIT

I have hilited above in red the ONE place where you should change it to rosetta.csv, or whatever the name of your CSV file really is. Do not change anything else.

You somehow made a change to the original shell script. You have this:
Code:
name=$(echo "$i"|awk -F',' '{ print $0 >($1 ".txt") }'  rosetta.csv)
where the original is this:
Code:
name=$(echo "$i"|awk -F',' '{print $1}')
These two lines do very different things.

Accuracy is essential in programming. Seemingly innocuous changes can have unexpectedly large effects.
 
Last edited:
I got nowhere with Perl as it gave this back
use: command not found
I did use
Code:
#!/usr/bin/perl



Strange, Perl is included with OS X. Was the file chmod'ed with +x? Either way, I went ahead and attached a properly setup version. To use it type:

./csvparse filename
 

Attachments

  • csvparse.zip
    417 bytes · Views: 111
It's been a very useful, informative exchange. Thankyou!
The script is working, I got introduced to textWrangler (I used BBedit years ago), got a lesson in careful editing and my hard work is done. I will give the Perl a run in a bit. This has inspired me to pull out my Sed Awk book and do a bit more. Compared to the 50 lines of code required to do this job with VB in Excel sed and awk are amazing.
 
Late to the party I know but I felt like the Cocoa guys were missing out! :)

Code:
- (void)applicationDidFinishLaunching:(NSNotification *)aNotification
{
    NSError*    error = nil;
    NSString*   cvsData = [NSString stringWithContentsOfFile:[@"~/cvsData.txt" stringByExpandingTildeInPath] encoding:NSUTF8StringEncoding error:&error];
    
    for ( NSString* line in [cvsData componentsSeparatedByString:@"\n"] )
    {
        NSArray*    lineParts = [line componentsSeparatedByString:@","];

        NSString*   name = [@"~/output/" stringByAppendingString:lineParts[0]];
        name = [name stringByAppendingPathExtension:@"txt"];
        name = [name stringByExpandingTildeInPath];

        [line writeToFile:name atomically:YES encoding:NSUTF8StringEncoding error:&error];
    }
}

The source data goes in a file named "cvsData.txt" in your home folder. The output is placed into a folder named "output" in your home folder (you'll need to make this).

Enjoy.
 
Register on MacRumors! This sidebar will go away, and you'll see fewer ads.