Sed replace whitespace




Ti_Poussin
Aug 28, 2008, 09:36 PM
I'm having a problem with sed on OS X 10.5. I would like to remove leading whitespace from each line of a file.

I tried these two sed commands:
sed -e 's/^[:space:]+//' file1.txt
sed -e 's/^[ \t]+//' file1.txt

With and without the -e and -E. Nothing seems to do it. What am I missing here?!? Is the OS X sed standard?



lee1210
Aug 28, 2008, 10:00 PM
+ for "one or more of a pattern" wasn't working in my copy. I'm not a sed guy, so I'm not sure if that's standard or not. Just double it up:
sed -e 's/^[ \t][ \t]*//' file.txt
or, really:
sed -e 's/^[ \t]*//' file.txt

Since... what's the worst that happens? It replaces the beginning of the line with nothing?

-Lee

HiRez
Aug 28, 2008, 10:06 PM
Try:

sed 's/^[ \t]*//'

EDIT: Rats...beat to it :P

Ti_Poussin
Aug 28, 2008, 10:09 PM
No, it looks like the old sed version that comes with OS X doesn't know what a tab is and doesn't match it. The only solution so far is to install a real sed with Fink, which solves the problem for me, but I need to distribute this to the other users on my fleet of machines. Not cool at all.

Really, when it comes to Apple's implementation of the unix tools: they really s**k big time. I'm really getting tired of always having a weird install path to support, and tools that behave oddly, are old crap or have a special implementation.

Would the best solution be a good Linux in VMware, or what?!

Right now I'm trying to escape the tab char with something like \'$'\t'', sadly without much success so far.

Ti_Poussin
Aug 28, 2008, 10:10 PM
Replacing the + with * doesn't work either. Anyway, replacing 0 occurrences of space with nothing won't do much.

lee1210
Aug 28, 2008, 10:19 PM
You're using a different OS X than I'm using, then. OS X is a UNIX, not linux. I suppose that could be viewed as a problem but it generally just takes a little adjusting to BSD style switches to commands rather than the GNU/linux versions.

I certainly didn't make up the regex without verifying its functionality. If you need GNU versions of tools you can build them or use Fink, but in this particular case I can't imagine why this wouldn't be working on your system.

edit: I was wrong about the tab. I could clean this up with yet another edit, but that would be dishonest. There was a simple solution to this and from the docs this does seem to be standard behavior.

-Lee

Edit: Also, * means 0 or more occurrences, not just 0 occurrences. I also verified that the regex works when the tab is typed literally (press ctrl+v, then tab).

Edit: ... Oops. The file I was using had only been indented with spaces. \t does not work. Apologies for the mistake. Using a literal tab character does work without issue. From this page:
http://www.cims.nyu.edu/cgi-systems/info2html?(sed)Regular%2520Expressions


`\CHAR'
Matches CHAR, where CHAR is one of `$', `*', `.', `[', `\', or `^'.
Note that the only C-like backslash sequences that you can
portably assume to be interpreted are `\n' and `\\'; in particular
`\t' is not portable, and matches a `t' under most implementations
of `sed', rather than a tab character.


So literal tab it is, no \t
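
For anyone who would rather not type a literal tab into a script, here is a minimal sketch of two ways around it that should not depend on sed understanding \t. It assumes a POSIX shell; the function names strip_leading and strip_leading_class are made up for illustration. The first lets printf produce the tab and splices it into the bracket expression, the second uses the POSIX character class, which needs the doubled brackets [[:space:]] rather than the [:space:] from the first post.

```shell
# printf understands \t even where sed does not, so let it build the tab.
tab=$(printf '\t')

# Hypothetical helper: delete leading spaces and tabs from each line of stdin.
# Double quotes are needed so the shell expands $tab inside the sed script.
strip_leading() {
    sed "s/^[ $tab]*//"
}

# Same idea using the POSIX class; note the outer [] around [:space:].
strip_leading_class() {
    sed 's/^[[:space:]]*//'
}

# Example: one line indented with a tab plus spaces, one with spaces only.
printf '\t  indented with a tab\n    indented with spaces\n' | strip_leading
```

Either function can be used as a filter, e.g. strip_leading < file1.txt.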

RidgeRacerType4
Aug 29, 2008, 05:02 AM
Correct, \t is not part of every flavor of regular expressions. (I believe '\t' and '+' are used in perl's extended version of regular expressions.)

When in doubt I always refer to this place: http://www.grymoire.com/Unix/Regular.html

They have a convenient chart at the bottom of the page.

They also have sed and awk tutorials: http://www.grymoire.com/Unix/

got me through systems programming :D

Ti_Poussin
Aug 30, 2008, 12:14 AM
Yeah, the literal tab should work (note that \x09 doesn't work either), though I haven't tested it thoroughly. I found another solution in the meantime:

expand -t4 oldFile > newfile
sed ...

What bugs me the most is that it works in recent versions of sed; Apple just threw us an old 2005 version of sed. The GNU version does indeed support it. The recent BSD version too. Many tools that come with OS X are really outdated; I had to update many of them for compatibility with other systems at work. I wish they would release updates for those tools from time to time.
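
A rough sketch of how the expand approach could be chained into a single filter, for reference. This is only an assumption about what the elided sed step might look like, not the exact command used above, and the name strip_via_expand is made up. Once expand has turned every tab into spaces, sed only has to match plain spaces, so no tab escape is needed at all.

```shell
# Hypothetical helper: convert tabs to spaces (tab stops every 4 columns),
# then delete the leading spaces that remain.
strip_via_expand() {
    expand -t 4 | sed 's/^ *//'
}

# Example: a tab-indented line and a line indented with spaces plus a tab.
printf '\tfoo\n  \tbar\n' | strip_via_expand
```

One caveat: expand also converts tabs in the middle of a line, so this is only suitable when interior tabs do not need to be preserved.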

smirk
Sep 13, 2008, 07:12 PM
I don't know if this will help, and I'm not even 100% sure that I have this right, because I'm just learning regular expressions today, but escaping the + sign in a regular expression on OS X 10.5.4 seems to make it mean "1 or more occurrences". Without the escape, it seems to take it literally as a plus sign.

In other words, to get a list of all lines with a number in it, I had to specify

grep '[0-9]\+' testfile

instead of

grep '[0-9]+' testfile

which is backwards from how I understood it was supposed to work.
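
What is described here is the difference between basic and extended regular expressions. Plain grep (and plain sed) use basic regexes, where an unescaped + is an ordinary character; treating \+ as "one or more" is a GNU extension rather than something POSIX guarantees. A small sketch of two spellings that should behave the same across implementations (the sample function is just made-up test input):

```shell
# Hypothetical sample input: two lines contain digits, one does not.
sample() {
    printf 'abc\nline 42\n7 up\n'
}

# Basic regex (default grep): write "one or more" as X followed by X*.
sample | grep '[0-9][0-9]*'

# Extended regex (grep -E): here an unescaped + means "one or more".
sample | grep -E '[0-9]+'
```

Both commands print the lines "line 42" and "7 up".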

Sayer
Sep 13, 2008, 10:52 PM
Saying that OS X is non-standard for UNIX while Linux is magically compatible is nonsense.

Look at the configure/make files for any distro and you'll see just as much "hand holding" for non-standard paths on OSes other than Mac OS X (like, say, Linux).

And installing via Fink is not a "standard path", uhm, OK. On this computer I have "special" paths for Fink, MacPorts and Mac OS X's UNIX layer (Darwin). It's a big freaking mess trying to be "open" no matter what OS you use.