macOS Line merging in large text file

wrldwzrd89 · Nov 11, 2011

So, I've got a large text file with over 6,000 lines in it.

I've managed the first part of what I want with some clever regular expressions: every line that does NOT start with a tab character is now prefixed with a semicolon followed by a space.

Now what I'd like to do is this... any line that has been prefixed in the previous step should be merged with the line before it, by deleting the newline character separating them. My Google-fu is failing me on this matter, though.

jiminaus · Nov 11, 2011

This works. It may be overly complicated because I don't know sed well.

Code:

sed 's/^\([^	]\)/; \1/' | \
sed -n '
/^; / !{
	x
	/^$/ n
	s/\n//g
	p
}
/^; / H
'

Note that there's a literal tab character between [^ and ] on the first line.

Of course, you didn't provide any sample file so I can't actually be sure it'll work for you.
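For the merge step on its own, a shorter sed idiom may be easier to paste, since it contains no tab. This is only a sketch on a made-up sample (the file contents below are hypothetical, not taken from the real file). It pulls in the next line with N, joins it when it begins with "; ", and otherwise prints the first line and starts over. As far as I can tell, the second sed script above never prints what is left in the hold space at end of input, so the last merged line may go missing; this variant does not use the hold space at all.

```shell
# Hypothetical sample: lines prefixed with "; " follow the line they belong to.
sample=$(mktemp)
printf '\tSword\n; sharp\n; heavy\n\tShield\n\tHelm\n; dented\n' > "$sample"

# :a     label to loop back to
# $!N    append the next input line (unless this is the last line)
# s///   if the appended line starts with "; ", remove the newline before it
# ta     loop again after a successful join
# P, D   otherwise print and drop the first line, keep the rest for the next cycle
merged=$(sed -e ':a' -e '$!N' -e 's/\n; /; /' -e 'ta' -e 'P' -e 'D' "$sample")
printf '%s\n' "$merged"
rm -f "$sample"
```

Each command is passed with its own -e because some sed implementations want a label to end at a newline.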

wrldwzrd89 · Nov 11, 2011

jiminaus said:
This works. It may be overly complicated because I don't know sed well.

Code:

sed 's/^\([^	]\)/; \1/' | \
sed -n '
/^; / !{
	x
	/^$/ n
	s/\n//g
	p
}
/^; / H
'

Note that there's a literal tab character between [^ and ] on the first line.

Of course, you didn't provide any sample file so I can't actually be sure it'll work for you.

I try pasting the first command in that into Terminal, and it beeps at me :S
This is the output I get:

Code:

sed 's/^\([^]\)/; \1/' | \ < /Users/wrldwzrd89/Desktop/raw_armory.txt > /Users/wrldwzrd89/Desktop/raw_armory2.txt
sed: 1: "s/^\([^]\)/; \1/": unbalanced brackets ([])
-bash:  : command not found

Also attached an example of the type of file I'm dealing with.
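Two things seem to have gone wrong in that paste. The bracket expression in the error message reads [^] with nothing inside, so the literal tab was lost (a tab pasted into bash normally triggers completion, which would explain the beep), and a ] directly after [^ is taken literally, leaving the brackets unbalanced. Also, the "| \" was meant to end the line; with the redirections following it on the same line, the backslash-space becomes a command named " ", hence "command not found". A minimal sketch of the prefix step that avoids pasting a tab, using a made-up three-line sample rather than the real file:

```shell
# Build the tab with printf so nothing depends on pasting a literal tab.
tab=$(printf '\t')

# Hypothetical sample: one tab-indented line, two that are not.
sample=$(mktemp)
printf '\tindented\nplain\nalso plain\n' > "$sample"

# Prefix every line whose first character is not a tab with "; ".
# The input file is given as an argument instead of a redirection after the pipe.
prefixed=$(sed "s/^\([^$tab]\)/; \1/" "$sample")
printf '%s\n' "$prefixed"
rm -f "$sample"
```

Because the pattern needs one non-tab character to match, empty lines are left without a prefix.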

chown33 · Nov 11, 2011

File: semi-merge.awk

Code:

# awk program

# p holds one previous line, assembles merged lines.
BEGIN { p = "" }

# for each line NOT starting with semicolon.
# If p holds anything, print it, then store line in p.
$0 !~ /^;/  { if ( length( p ) > 0 ) print p;  p = $0; }

# for each line starting with semicolon.
# Append it to p.
$0 ~ /^;/  { p = p $0; }

# ensures last line stored in p is printed.
END  { print p }

Command line:

Code:

awk -f semi-merge.awk raw_armory_sample.txt >out.txt
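To see what the script does before running it on the full file, here is a small sketch with the same three rules inlined and a made-up sample (the contents are hypothetical). The sample includes an empty line to show one side effect: because p is only printed when it holds something, empty lines are dropped from the output.

```shell
# Hypothetical sample with an empty line between the two records.
sample=$(mktemp)
printf '\tAxe\n; iron\n\n\tBow\n; yew\n; long\n' > "$sample"

merged=$(awk '
  BEGIN { p = "" }
  # Line does not start with a semicolon: flush the held line, then hold this one.
  $0 !~ /^;/ { if (length(p) > 0) print p; p = $0 }
  # Line starts with a semicolon: append it to the held line.
  $0 ~ /^;/  { p = p $0 }
  # Print whatever is still held at end of input.
  END { print p }
' "$sample")
printf '%s\n' "$merged"
rm -f "$sample"
```

The result here is two lines, one per record, with the empty line gone.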

wrldwzrd89 · Nov 11, 2011

Success! This worked.

dmi · Nov 11, 2011

perl -i.bak -lpe 'BEGIN{$/="\n; ";$\="; "}' raw_armory_sample.txt

dmi · Nov 11, 2011

awk 'BEGIN{RS="\n; ";ORS="; "' raw_armory_sample > out.txt

chown33 · Nov 12, 2011

dmi said:
awk 'BEGIN{RS="\n; ";ORS="; "' raw_armory_sample > out.txt

Did you test that? It's not even syntactically correct: missing }.

dmi · Nov 12, 2011

chown33 said:
Did you test that?

I did test it, but the version I tested looked like this:
awk 'BEGIN{RS="\n; ";ORS="; "}1' raw_armory_sample.txt
I must have deleted some characters when I added the
> out.txt
part, which I admit I did not test, but I thought it ought to work the same way as in an earlier example.

chown33 · Nov 13, 2011

dmi said:
Code:

awk 'BEGIN{RS="\n; ";ORS="; "}1' raw_armory_sample.txt

That script now does something, but the output isn't correct. There are no newlines in the output:

Code:

wc raw*.txt
      66    1054    5917 raw_armory_sample.txt

awk 'BEGIN{RS="\n; ";ORS="; "}1' raw_armory_sample.txt | wc
       0    1093    5985

Your perl script works, so maybe leave it at that.
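A likely explanation: POSIX leaves the behaviour unspecified when RS holds more than one character. gawk and mawk treat such an RS as a regular expression, but some awk versions use only the first character, which here is the newline. Every newline would then be replaced by "; ", which matches the line count of 0 above. A sketch that does not depend on RS at all and should behave the same in any awk, shown on a made-up sample:

```shell
# Hypothetical sample, including an empty line that should survive.
sample=$(mktemp)
printf '\tMace\n; spiked\n\n\tStaff\n; oak\n' > "$sample"

merged=$(awk '
  # A line that is not a "; " continuation ends the previous output line first.
  NR > 1 && !/^; / { printf "\n" }
  # Every line is written without a trailing newline.
  { printf "%s", $0 }
  # Terminate the last output line, if there was any input.
  END { if (NR > 0) printf "\n" }
' "$sample")
printf '%s\n' "$merged"
rm -f "$sample"
```

Unlike the version that holds the previous line in a variable, this one streams its output and keeps empty lines.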
