
wrldwzrd89 (original poster) · Jun 6, 2003 · Solon, OH

So, I've got a large text file with over 6,000 lines in it.

I've managed to do the first part of what I want with some clever regular expressions: prefixing every line that does NOT start with a tab character with a semicolon followed by a space.
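For anyone following along, that first step can be sketched in Python. The original regular expressions aren't shown, so this is an illustration rather than the OP's code. It assumes empty lines should stay unprefixed because they have no first character to test. prefix_line and prefix_lines are hypothetical names.

```python
def prefix_line(line: str) -> str:
    """Prepend "; " to a line unless it is empty or starts with a tab."""
    if line and not line.startswith("\t"):
        return "; " + line
    return line


def prefix_lines(text: str) -> str:
    """Apply prefix_line to every line, keeping the newlines as they were."""
    # split("\n") leaves a trailing "" when the text ends with a newline.
    # That empty piece is not prefixed, so "\n".join restores the endings.
    return "\n".join(prefix_line(line) for line in text.split("\n"))


print(prefix_lines("\tkept as is\nprefixed\n"), end="")
```

Here the first line is printed unchanged and the second comes out as "; prefixed".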

Now what I'd like to do is this: any line that was prefixed in the previous step should be merged with the line before it by deleting the newline character that separates them. My Google-fu is failing me on this one, though.
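Deleting "the newline separating them" maps almost literally onto a regular-expression substitution that removes each newline whose following line begins with "; ". Here is a minimal Python sketch. It assumes the prefix is exactly a semicolon plus one space, and merge_prefixed_lines is a hypothetical name. It reads the whole text at once, which should be fine for a file of a few thousand lines.

```python
import re

# A newline is deleted only when the next line starts with "; ".
# The lookahead (?=; ) checks for the prefix without consuming it,
# so the "; " itself stays in the output.
NEWLINE_BEFORE_PREFIX = re.compile(r"\n(?=; )")


def merge_prefixed_lines(text: str) -> str:
    """Join every "; "-prefixed line onto the line before it."""
    return NEWLINE_BEFORE_PREFIX.sub("", text)


print(merge_prefixed_lines("\tfirst\n; second\n; third\n\tfourth\n"), end="")
```

Several consecutive "; " lines all end up on the same unprefixed line, because each of their newlines is removed independently.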
 
This works. It may be overly complicated because I don't know sed well.

Code:
sed 's/^\([^	]\)/; \1/' | \
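sed -n '
# "; " lines are appended to the group kept in the hold space.
/^; / H
# Any other line starts a new group: swap it into the hold space,
# then print the finished group (there is none before line 1).
/^; / !{
	x
	s/\n//g
	1 !p
}
# At end of input, print the group still in the hold space.
$ {
	x
	s/\n//g
	p
}
'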

Note that there's a literal tab character between [^ and ] on the first line.

Of course, you didn't provide a sample file, so I can't be sure it'll work for you. :p
 
I tried pasting the first command into Terminal, and it beeps at me :S
This is the output I get:
Code:
sed 's/^\([^]\)/; \1/' | \ < /Users/wrldwzrd89/Desktop/raw_armory.txt > /Users/wrldwzrd89/Desktop/raw_armory2.txt
sed: 1: "s/^\([^]\)/; \1/": unbalanced brackets ([])
-bash:  : command not found
I've also attached an example of the type of file I'm dealing with.
 

Attachments

  • raw_armory_sample.txt (5.8 KB)

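Judging only from the two error messages above (the thread itself doesn't confirm this), two separate things seem to have gone wrong. First, "unbalanced brackets ([])" shows that sed received "[^]" with nothing between the brackets, so the literal tab never reached the command. bash likely treated the pasted Tab as a completion request, which would also explain the beep. Second, "-bash:  : command not found" fits the "\" being followed by a space on the same line. bash reads "\ " as a command named " " and gives it the "<" and ">" redirections, which were meant for sed. In bash, $'\t' is one way to get a tab without typing it. The Python sketch below avoids the shell entirely: it writes the tab as the escape \t and passes the expression to sed as an argument list. run_prefix_step is a hypothetical helper, and the sketch assumes a sed executable is on the PATH.

```python
import subprocess

# The first-stage sed expression from the thread. In a normal (non-raw)
# Python string, "\\(" becomes \( for sed and "\t" becomes a real tab,
# so no tab ever has to be typed or pasted into Terminal.
PREFIX_EXPR = "s/^\\([^\t]\\)/; \\1/"


def run_prefix_step(text: str) -> str:
    """Feed text to sed on stdin and return what sed writes to stdout."""
    result = subprocess.run(
        ["sed", PREFIX_EXPR],  # argument list: no shell, so no quoting issues
        input=text,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if sed exits with an error
    )
    return result.stdout


print(run_prefix_step("\tkeep\nprefix me\n"), end="")
```

The tab-indented line should come back unchanged, and "prefix me" should come back as "; prefix me".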
File: semi-merge.awk
Code:
# awk program

# p holds one previous line, assembles merged lines.
BEGIN { p = "" }

# for each line NOT starting with semicolon.
# If p holds anything, print it, then store line in p.
$0 !~ /^;/  { if ( length( p ) > 0 ) print p;  p = $0; }

# for each line starting with semicolon.
# Append it to p.
$0 ~ /^;/  { p = p $0; }

# ensures last line stored in p is printed.
END  { print p }

Command line:
Code:
awk -f semi-merge.awk raw_armory_sample.txt >out.txt
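The awk script keeps one pending record in p. An unprefixed line flushes the pending record and starts a new one, and a line starting with ";" is appended to it. Here is a rough Python translation of the same logic. semi_merge and semi_merge.py are hypothetical names, and reading the file as UTF-8 is an assumption about its encoding. As in the awk version, a blank line in the middle of the file is dropped, because the length(p) > 0 test skips an empty p.

```python
import sys
from typing import Iterable, Iterator


def semi_merge(lines: Iterable[str]) -> Iterator[str]:
    """Yield merged records, following semi-merge.awk's use of p."""
    p = ""  # the record being assembled, like awk's p
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.startswith(";"):
            # Unprefixed line: flush the pending record (only if non-empty,
            # like the awk length(p) > 0 test) and start a new one.
            if p:
                yield p
            p = line
        else:
            # Prefixed line: append it to the pending record.
            p += line
    # Like the awk END block, emit whatever is left, even "".
    yield p


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 semi_merge.py raw_armory_sample.txt > out.txt
    # Assumes the input file is UTF-8 encoded.
    with open(sys.argv[1], encoding="utf-8") as f:
        for merged in semi_merge(f):
            print(merged)
```

Because semi_merge consumes lines one at a time, it never holds more than one merged record in memory, just like the awk script.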
 
Success! This worked. :D
 
perl -i.bak -lpe 'BEGIN{$/="\n; ";$\="; "}' raw_armory_sample.txt
 
Did you test that?
I did test it, but when I tested it, it looked like this:
awk 'BEGIN{RS="\n; ";ORS="; "}1' raw_armory_sample.txt
I must have deleted some characters when I added the "> out.txt" redirection. I admit I didn't test that part, but I thought it ought to work the same way as in an earlier example.
 
That script now does something, but the output isn't correct. There are no newlines in the output:
Code:
wc raw*.txt
      66    1054    5917 raw_armory_sample.txt

awk 'BEGIN{RS="\n; ";ORS="; "}1' raw_armory_sample.txt | wc
       0    1093    5985
Your perl script works, so maybe leave it at that.
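The thread doesn't settle why the awk one-liner lost every newline while the Perl one did not, but here is a plausible explanation. POSIX defines only the first character of RS as the record separator and leaves a longer RS unspecified. gawk treats a multi-character RS as a regular expression, but an awk that honors only the first character sees RS as "\n". In that case ORS="; " replaces every newline, which would match wc reporting 0 lines. Perl's $/, by contrast, can be any literal string, so "\n; " works there as intended. The Python sketch below imitates both readings with str.split. merge_with_separator is a hypothetical helper, and unlike the one-liners it adds no separator after the last record.

```python
def merge_with_separator(text: str, rs: str, ors: str) -> str:
    """Split text into records on rs and rejoin them with ors.

    Roughly what awk's RS/ORS or Perl's $/ and $\\ do, except that
    no separator is appended after the last record.
    """
    return ors.join(text.split(rs))


sample = "\tfirst\n; second\n\tthird\n"

# Full three-character separator: only the newline before "; " disappears.
full = merge_with_separator(sample, "\n; ", "; ")

# First-character-only reading of RS: every newline becomes "; ".
first_char = merge_with_separator(sample, "\n", "; ")

print(full.count("\n"), first_char.count("\n"))  # prints: 2 0
```

With the full separator, the newline before "\tthird" survives. With only its first character, no newline is left at all, and prefixed lines pick up a doubled "; ; ".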
 