macOS Create files from list in CSV

jasperx · Oct 17, 2012

I need to create several hundred files from a list exported from excel to a csv.
The csv looks like this:
Name1,City1,State1
Name2,City2,State2
continues to
Name600,City600,State600

The desired result is 600 files like this:
Name1.txt which contains Name1,City1,State1
Name2.txt which contains Name2,City2,State2

My coding skills are quite rusty but it seems like this should be pretty simple

robvas · Oct 17, 2012

Do you want to do it bash, Perl, Ruby, ?

kryten2 · Oct 17, 2012

Try this :

Code:

-- csv file with Unix (LF) line feed
set theFile to choose file with prompt "Choose your csv file"
set theFolder to choose folder with prompt "Choose folder for the txt files"
set theParagraphs to paragraphs of (read theFile)
repeat with aParagraph in theParagraphs
	try
		set TID to AppleScript's text item delimiters
		set AppleScript's text item delimiters to {","}
		set theItems to text items of aParagraph
		set theName to item 1 of theItems
		set AppleScript's text item delimiters to TID
		do shell script "echo " & quoted form of aParagraph & " > " & quoted form of POSIX path of (theFolder as string) & quoted form of theName & ".txt"
	on error
		set AppleScript's text item delimiters to TID
	end try
end repeat

sero · Oct 17, 2012

Code:

#!/bin/bash

for i in $(cat the.csv); do
  name=$(echo "$i"|awk -F',' '{print $1}')
  echo "doing $name"
  echo "$i" >${i}.txt
done

Mac_Max · Oct 17, 2012

Obligatory Perl implementation:

Code:

#!/usr/bin/perl

use strict;
use warnings;

open CSV, "<", "filename.csv" or die $!;

while (<CSV>){
  my @line = split(',',$_);
  open OUTFILE, '>'.$line[0].'.csv' or die $!;
  print OUTFILE $_;
  close OUTFILE;
}

close CSV;

jasperx · Oct 17, 2012

OK had a go with the bash version... and something is not quite right
The code is in a file rosetta which has:

Code:

#!/bin/bash
for i in $(cat rosettaTest.csv); do
  name=$(echo "$i"|awk -F',' '{print $1}')
  echo "doing $name"
  echo "$i" >${i}.txt
done

Here is my output

bash-3.2$ bash rosetta
rosetta: line 1: {rtf1ansiansicpg1252cocoartf1138cocoasubrtf510: command not found
rosetta: line 2: syntax error near unexpected token `}'
rosetta: line 2: `{\fonttbl\f0\fmodern\fcharset0 Courier;}'
bash-3.2$

mfram · Oct 17, 2012

You need to save the bash script as a text file. That looks like RTF. Once that is done, then...

Code:

bash> chmod +x rosetta
bash> ./rosetta

robvas · Oct 17, 2012

mfram said:
You need to save the bash script as a text file. That looks like RTF. Once that is done, then...

Code:

bash> chmod +x rosetta bash> ./rosetta

Yup, use TextEdit and go to the Format menu (in the menu bar) and choose 'Make plain text'

Then save the file. Or even better, use something like TextWrangler

Mac_Max · Oct 17, 2012

Or go full Command Line and use nano, vi, or emacs.

Also if this is a one-off, you could even copy and paste that script into the command line and it should work.

jasperx · Oct 17, 2012

Making it a txt file made a huge difference. The script runs but is not doing as intended. Trying again on what the desired results would be:
A file created for every record or line in a csv
Each line or record in the csv has three values:
Name1,City1,State1

The desired result is to create a file for each record named by the first value in the record. If I have 10 records the result would be 10 files named:
Name1.txt, Name2.txt....Name10.txt
The contents of Name1.txt would be the record or Name1,City1,State1

I ran this on a csv with 10 records. The output was 3 files... not 10.
The first file contained records 1 and 2.
The second file contained records 3,4,5,6
The third file contained records 7,8,9,10
Here is the code

Code:

#!/bin/bash

for i in $(cat the.csv); do
  name=$(echo "$i"|awk -F',' '{print $1}')
  echo "doing $name"
  echo "$i" >${i}.txt
done

chown33 · Oct 17, 2012

jasperx said:
I ran this on a csv with 10 records. The output was 3 files... not 10.
The first file contained records 1 and 2.
The second file contained records 3,4,5,6
The third file contained records 7,8,9,10

Post your CSV data. Otherwise all anyone can do is guess what input produced your stated output.

There's a bug in the code. Change this:

Code:

  echo "$i" >${i}.txt

to this:

Code:

  echo "$i" >"${name}.txt"

My example "the.csv" file (manually created):

Code:

able,akron,state
bravo,baltimore,state
charlie,city,state
delta,city,state
eeevill,eville,state
foxtrot,tango,whiskey
golf,glasgow,whisky
hotel,homer,state
indigo,indio,state
juliet,juarez,state

kryten2 · Oct 17, 2012

There's a bug in the code.

I was waiting for someone to catch that. It also doesn't handle spaces. Mac_Max's Obligatory Perl implementation creates csv files instead of txt files. (at least it did when I tested it) When fixed it's very fast. I wish I knew Perl.

samwich · Oct 17, 2012

Here's an implementation in Python:

Code:

import sys

def createFiles(csv):
	with open(csv, 'r') as f:
		for line in f:
			s = line.split(',')
			with open(s[0] + '.txt', 'w') as w:
				w.write(s[0] + ',' + s[1] + ',' + s[2])
            	
createFiles(sys.argv[1])

You can use it by running: python createFiles.py <example.csv>

chown33 · Oct 17, 2012

kryten2 said:
I was waiting for someone to catch that. It also doesn't handle spaces. Mac_Max's Obligatory Perl implementation creates csv files instead of txt files. (at least it did when I tested it) When fixed it's very fast. I wish I knew Perl.

Here's the all-awk version, which handles spaces in names:

Code:

awk -F',' '{ print $0 >($1 ".txt") }'  the.csv

And here's the version that spits out each name:

Code:

awk -F',' '{ print "doing " $1 >"/dev/stderr"; print $0 >($1 ".txt") }'  the.csv

So this would be the new contents of "rosetta":

Code:

#!/bin/bash
awk -F',' '{ print $0 >($1 ".txt") }'  the.csv

or replace that awk one-liner with the longer awk one-liner.

Test data, in "the.csv":

Code:

able alfa,akron,state
baker bravo,baltimore,state
Charley Gordon,city,state
delta dawn,city,state
eeevill twin,eville,state
foxtrot,tango,whiskey
golf glove,glasgow,whisky
hotel,homer,ak
Inigo Montoya,indio,state
Juliet is the sun,juarez,state

Mac_Max · Oct 18, 2012

kryten2 said:
I was waiting for someone to catch that. It also doesn't handle spaces. Mac_Max's Obligatory Perl implementation creates csv files instead of txt files. (at least it did when I tested it) When fixed it's very fast. I wish I knew Perl.

Ah, yes in that case:

Code:

#!/usr/bin/perl

use strict;
use warnings;

die "Please provide a filename\n" if @ARGV == 0;
open CSV, "<", $ARGV[0] or die $!;

while (<CSV>){
  my @line = split(',',$_);
  $line[0] =~ s/ /-/g; #comment this out with if you prefer to keep spaces as spaces.
  open OUTFILE, '>'.$line[0].'.txt' or die $!;
  print OUTFILE $_;
  close OUTFILE;
}

close CSV;

My original version will handle spaces just fine but this version will also replace them with dashes (or whatever character you'd like, remember to escape regex control chars). It also now accepts the input file as an argument.

Perl is a great language. It took me about three weeks to become armed and dangerous with it and about 5 months to become rather good at it. Here's the book I learned from: http://onyxneon.com/books/modern_perl/

jasperx · Oct 18, 2012

You guys are having way too much fun... running off and leaving me with my eyes crossed but each time a little closer to the desired result. Stick with me as I think we are getting close:

This code as corrected and expanded on by chown33 :

Code:

#!/bin/bash
for i in $(cat rosettaTest.csv); do
  name=$(echo "$i"|awk -F',' '{ print $0 >($1 ".txt") }'  rosettaTest.csv)
  echo "doing $name"
  echo "$i" >${name}.txt
done

Acting on this rosettaTest.csv which contains:
13_01,1.70452E+11,182797
13_02,1.7045E+11,21676
13_03,1.7045E+11,22022
13_04,1.70452E+11,206985
2_01,1.70452E+11,117549
2_02,1.7045E+11,47029
2_03,1.70452E+11,118304
2_04,1.7045E+11,31579
2_05,1.7045E+11,30966
2_06,1.7045E+11,48157
Output was three (not 10) properly named files:
2_03.txt
13_01.txt
13_03.txt

13_01.txt contained
13_01,1.70452E+11,182797 good this is correct
13_02,1.7045E+11,21676 wrong it should have iterated to create file 13_02.txt

13_03.txt contained
13_03,1.7045E+11,22022
13_04,1.70452E+11,206985
2_01,1.70452E+11,117549
2_02,1.7045E+11,47029

2_03.txt contained
2_03,1.70452E+11,118304
2_04,1.7045E+11,31579
2_05,1.7045E+11,30966
2_06,1.7045E+11,48157

The loops are not working right it should iterate to 13_02 immediately after writing the first record into 13_01 instead it writes two records, skips to the third record writes 4 records skips to the 7th record.

I got nowhere with Perl as it gave this back
use: command not found
I did use

Code:

#!/usr/bin/perl

----------

Is this the problem?
I ran a file command on rosettaTest.csv
rosettaTest.csv: ASCII text, with CR, LF line terminators

My csv came from Excel 2011... I read somewhere that there was a problem with the way Excel 2008 wrote out line terminators.

I am going off in search of a fix.

jasperx · Oct 18, 2012

Code:

bash-3.2$ tr '\15' '\n' < rosettaTest.csv

got me
rosettaTest.csv: ASCII text, with CR, LF line terminators
with same incorrect results

Freez · Oct 18, 2012

Easier than all that

Copy the text into textWrangler.

Use find and replace to get rid of all those commas and spaces.

Use the Prefix/Suffix line...
To add touch with a space at begining of each line.
Then add .txt to end of each line.

touch billyMinneapolisMN.txt
touch AliceDallasTX.txt

Select All and Copy.

Open Terminal.

Us cd to get to your directory of choice.

Paste.

Done.

jasperx · Oct 18, 2012

TextWrangler fixed the problem.
Now I get 17 perfect files before getting 500+ copies of

Code:

awk: 2_14.txt makes too many open files
 input record number 18, file rosetta.csv
 source line number 1
doing

Seems like the file needs closing so I tried adding a close file command like this:

Code:

#!/bin/bash
for i in $(cat rosetta.csv); do
  name=$(echo "$i"|awk -F',' '{ print $0 >($1 ".txt") }'  rosetta.csv)
  echo "doing $name"
  echo "$i" >${name}.txt
  close file "${name}.txt"
done

that was just a guess and it didn't work

chown33 · Oct 18, 2012

Guessing almost never works.

Try replacing your entire shell script (not just the line that uses awk) with this:

Code:

#!/bin/bash

mkdir -p out

awk -F',' \
 '{ f="\"out/"$1".txt\""; system("echo \"" $0 "\" >" f ); }'  \
 [B][COLOR="Red"]the.csv[/COLOR][/B]

I got tired of always deleting the individual output files mixed in with the CSV and shell script, so it now makes a sub-folder named "out" and creates all the files there. If you don't want that, and can't figure out how to change it, ask again and I'll revise it.

The above is a COMPLETE REPLACEMENT for the original shell script. It is not just a few lines to be merged in with the original.

My test CSV file:

Code:

able alfa,akron,state
baker bravo,baltimore,state
Charley Gordon,city,state
delta dawn,city,state
eeevill twin,eville,state
foxtrot,tango,whiskey
golf glove,glasgow,whisky
hotel,homer,ak
Inigo Montoya,indio,state
Juliet is the sun,juarez,state
13_01,1.70452E+11,182797
13_02,1.7045E+11,21676
13_03,1.7045E+11,22022
13_04,1.70452E+11,206985
2_01,1.70452E+11,117549
2_02,1.7045E+11,47029
2_03,1.70452E+11,118304
2_04,1.7045E+11,31579
2_05,1.7045E+11,30966
2_06,1.7045E+11,48157

EDIT

I have hilited above in red the ONE place where you should change it to rosetta.csv, or whatever the name of your CSV file really is. Do not change anything else.

You somehow made a change to the original shell script. You have this:

Code:

name=$(echo "$i"|awk -F',' '{ print $0 >($1 ".txt") }'  rosetta.csv)

where the original is this:

Code:

name=$(echo "$i"|awk -F',' '{print $1}')

These two lines do very different things.

Accuracy is essential in programming. Seemingly innocuous changes can have unexpectedly large effects.

Mac_Max · Oct 18, 2012

jasperx said:
I got nowhere with Perl as it gave this back
use: command not found
I did use

Code:

#!/usr/bin/perl

Strange, Perl is included with OS X. Was the file chmod'ed with +x? Either way, I went ahead and attached a properly setup version. To use it type:

./csvparse filename

jasperx · Oct 18, 2012

It's been a very useful, informative exchange. Thankyou!
The script is working, I got introduced to textWrangler (I used BBedit years ago), got a lesson in careful editing and my hard work is done. I will give the Perl a run in a bit. This has inspired me to pull out my Sed Awk book and do a bit more. Compared to the 50 lines of code required to do this job with VB in Excel sed and awk are amazing.

badNameErr · Oct 19, 2012

Late to the party I know but I felt like the Cocoa guys were missing out!

Code:

- (void)applicationDidFinishLaunching:(NSNotification *)aNotification
{
    NSError*    error = nil;
    NSString*   cvsData = [NSString stringWithContentsOfFile:[@"~/cvsData.txt" stringByExpandingTildeInPath] encoding:NSUTF8StringEncoding error:&error];
    
    for ( NSString* line in [cvsData componentsSeparatedByString:@"\n"] )
    {
        NSArray*    lineParts = [line componentsSeparatedByString:@","];

        NSString*   name = [@"~/output/" stringByAppendingString:lineParts[0]];
        name = [name stringByAppendingPathExtension:@"txt"];
        name = [name stringByExpandingTildeInPath];

        [line writeToFile:name atomically:YES encoding:NSUTF8StringEncoding error:&error];
    }
}

The source data goes in a file named "cvsData.txt" in your home folder. The output is placed into a folder named "output" in your home folder (you'll need to make this).

Enjoy.

macOS Create files from list in CSV

macrumors newbie

macrumors 68040

macrumors 65816

macrumors member

macrumors 6502

macrumors newbie

Contributor

macrumors 68040

macrumors 6502

macrumors newbie

Moderator

macrumors 65816

Attachments

macrumors regular

Moderator

macrumors 6502

macrumors newbie

macrumors newbie

macrumors newbie

macrumors newbie

Moderator

macrumors 6502

Attachments

macrumors newbie

macrumors regular

Our Staff