
sun surfer

macrumors regular
Original poster
Jun 6, 2010
105
0
Hi, this may sound extremely easy but I can't figure out what to do (and I'm a total non-techy). I have some long text codes saved in multiple textedit files and need to edit multiple items at once. Doing it manually would take a long, long time.

Someone told me that I can just use a Perl regular expression, pop it in and it will automatically delete all the text parts I need deleted. The parts I need deleted all start and end with the same text but have different middles. They not only said it was possible but said they'd done it and it worked great (they were working on the same type of thing I am), and gave me the code to use.

So it all sounds great and I have the code but I have no idea where to put it or what to do with it! Once I know what to do with it, it should just pop out new textedit files with the info I need deleted. I can't get in touch with the person who gave me the perl code now, and I can't figure out an answer on google, so I'm asking here.

Where do I go to put a perl regular expression so that it will alter a textedit file? Thanks for any help.
 

jhiesey

macrumors newbie
Mar 6, 2013
17
16
You will need to run perl from the terminal (Terminal.app), but exactly how you do it depends on what your friend gave you.

Could you post the code you were given?
 

sun surfer

macrumors regular
Original poster
Jun 6, 2010
105
0
Hi, thanks; I had no idea I'd need to use the terminal.

Here's the basic code with "x" substituting for the beginning and end of the data it's looking for to erase (the beginning and end of the targeted text are always the same; the middle is always different hence needing this code to do the job).

Code:
<x(.+\n)*.+<\/x>

They said this code worked perfectly for them in the same situation (although to a non-tech person like me it looks like hieroglyphics). But they said just substitute in for "x" and plug it in. I've rarely used the terminal but I know what it is, but how will this code know to target a particular file or particular group of files?
 

jhiesey

macrumors newbie
Mar 6, 2013
17
16
Well, you don't necessarily need to use the terminal. In fact, since your regular expression tries to match across multiple lines (there can be returns/enters in the middle part), I wouldn't really recommend the terminal, since multi-line matching is a bit harder to do there.

Many text editors have built-in support for regular expressions, but TextEdit doesn't. If you don't want to use the terminal, you can download an editor like Sublime Text 2, which nominally costs $70, but has a trial you can use for free forever if you ignore the prompts that nag you once in a while (it's not really that annoying actually). Other text editors like TextMate and BBEdit would work as well, but aren't free either. If you aren't familiar with the terminal, this is a much easier approach.

This regular expression you gave looks a bit more specific than one that just finds a string with constant beginning and end and a varying middle. It looks like it was intended to match opening and closing tags in HTML or XML specifically. For example, if x is span, it will match
Code:
<span>hi</span>
or
Code:
<span style="color:blue">some text</span>
It will even match where the middle part goes across multiple lines, as long as none of the lines are blank.
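The matching behaviour described above can be checked with a short script. This is only a sketch: it assumes Python's `re` module behaves like the Perl-style engine in the editors for this pattern, it substitutes `span` for x, and the sample strings are made up for illustration.

```python
import re

# The expression from the thread with "span" substituted for x.
# "." does not match a newline by default, so "(.+\n)*" can only
# consume whole non-empty lines. Python does not need "/" escaped.
PATTERN = re.compile(r"<span(.+\n)*.+</span>")


def matches(text: str) -> bool:
    """Return True if the pattern matches anywhere in text."""
    return PATTERN.search(text) is not None


single_line = '<span style="color:blue">some text</span>'
multi_line = "<span>first line\nsecond line</span>"
blank_line = "<span>first line\n\nsecond line</span>"

print(matches(single_line))  # True
print(matches(multi_line))   # True
print(matches(blank_line))   # False: the blank line breaks the match
```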

If you just want to match something more general, without the angle brackets (<>) and such, this isn't the right regular expression.

If you use Sublime Text 2, you can just open the folder containing the files you want to edit (it lets you open whole folders), open global find and replace (command-shift-f), turn on regular expression mode (there's a button on the far left for that, labeled .*), put your expression into the "Find" box, and leave the "Replace" box empty. If you click "Replace", all of the files will be changed, and you can inspect each file before saving, or you can just go to File->Save All to save everything. Other text editors will be similar.
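For anyone who would rather script the same "find in every file, replace with nothing" step than use an editor, here is a rough sketch in Python. The function name `strip_matches`, the `.txt` suffix, and the `.bak` backup naming are all assumptions made for illustration, and it only applies to plain-text files (not rich-text documents).

```python
import re
from pathlib import Path


def strip_matches(folder: Path, pattern: str, suffix: str = ".txt") -> int:
    """Delete every match of pattern from each file in folder ending in suffix.

    Returns the number of files that were changed.
    """
    regex = re.compile(pattern)
    changed = 0
    for path in sorted(folder.glob(f"*{suffix}")):
        original = path.read_text(encoding="utf-8")
        cleaned = regex.sub("", original)
        if cleaned != original:
            # Keep a backup copy next to the file before overwriting it.
            backup = path.with_name(path.name + ".bak")
            backup.write_text(original, encoding="utf-8")
            path.write_text(cleaned, encoding="utf-8")
            changed += 1
    return changed
```

It would be called with the folder and whichever expression you settle on, for example `strip_matches(Path("my_folder"), r"<x>.*?</x>")`, where both arguments are placeholders. It is worth trying on a copy of the files first, since the replacement is written straight back to disk.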

If you still want to use the terminal instead of another text editor, I can give you instructions for that too, but it might be a little tricky if you are totally unfamiliar with it.
 

sun surfer

macrumors regular
Original poster
Jun 6, 2010
105
0
Thanks! I downloaded Sublime Text 2 as you suggested, and am trying to make it work. This expression is intended to match opening and closing tags, and that's what encloses each group of text I need to remove - the text inside is different but the tags are the same, so that's why this person made this expression to make it easier.

I've done exactly as you said (and thanks for the clear, precise directions that made it very easy) but something goes wrong - it deletes everything from the start of the first tag to the end of the last tag, so basically it deletes almost everything. What I need to happen is for it to delete each text inside the tags, but leave the text in between unaffected. Is it maybe something I'm still not doing right, or is the expression flawed? The person said the expression worked perfectly for them.


ETA - I think I have solved the problem. I thought to try a different text editor and tried "Ultraedit". I wouldn't have been able to figure out what to do there if not for your directions for Sublime Text 2, but with fumbling around it wasn't so different to figure it out in the "find and replace" window, and it worked perfectly! So, somehow this regular expression doesn't work properly in sublime text 2 but will work properly in ultraedit. Either way, it's solved now and thanks so much for your help. I wouldn't have been able to do it without your help. :)
 

jhiesey

macrumors newbie
Mar 6, 2013
17
16
That behavior is indeed what the expression you posted will do. It doesn't make any distinction between what's inside the opening tag and what's between the tags. For example, if you have
Code:
This is <span style="color:blue">in blue</span>!
then after you do the replacement you will end up with
Code:
This is !

Instead, it sounds like you want the result to be
Code:
This is in blue!
To do that, you would need to fill in the "Find:" box with
Code:
<span[^>]*>((.+\n)*.+)<\/span>
and the "Replace:" box with
Code:
\1
(that's a backslash and the number one). This last part specifies that you want to replace it with what is inside, which is the part matched inside the outer set of parentheses.
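The difference between the two replacements can be reproduced outside an editor. The sketch below assumes Python's `re.sub` as a stand-in for the editor's find and replace; the variable names are illustrative only.

```python
import re

text = 'This is <span style="color:blue">in blue</span>!'

# Original expression with an empty replacement:
# the whole element, tags and contents, is deleted.
removed = re.sub(r"<span(.+\n)*.+</span>", "", text)

# Revised expression: the outer parentheses (group 1) capture the
# contents, and the replacement \1 puts them back without the tags.
unwrapped = re.sub(r"<span[^>]*>((.+\n)*.+)</span>", r"\1", text)

print(removed)    # This is !
print(unwrapped)  # This is in blue!
```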

Let me know if that does what you want.

----------

Maybe I didn't quite understand what you are trying to do, since I really don't see why it does what you want in Ultraedit.

It is true that regular expressions aren't particularly well standardized, however, so differing behavior isn't much of a surprise.
 

sun surfer

macrumors regular
Original poster
Jun 6, 2010
105
0
No, it's a bit different. Here's an example:

Code:
<x>A
B
C</x>
D
E
F
<x>G
H
I</x>
J
K
L
<x>M
N
O</x>

In this case, I would want D, E, F, J, K, L to not be deleted, but with Sublime Text 2 it does delete those as well as the text in the tags, because it deletes everything from the first start tag to the last end tag, even text not in tags, as long as the text is after the first tag and before the last tag.

However, the perl regular expression does work properly in Ultraedit, so it's all good now and I just used that once I tried it and realised it works there. I'm not sure why the perl regular expression doesn't work the same way in Sublime Text 2?
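The over-deletion can be reproduced with the example above. This is a sketch assuming Python's `re` module in place of the editor's engine: the expression finds a single match running from the first opening tag to the last closing tag, so D through L are swallowed along with the tagged text.

```python
import re

# The example from this post, joined into one string.
sample = "\n".join([
    "<x>A", "B", "C</x>",
    "D", "E", "F",
    "<x>G", "H", "I</x>",
    "J", "K", "L",
    "<x>M", "N", "O</x>",
])

greedy = re.compile(r"<x(.+\n)*.+</x>")
found = [m.group(0) for m in greedy.finditer(sample)]

print(len(found))          # 1
print(found[0] == sample)  # True: one match covers the whole text
```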
 

NeverhadaPC

macrumors 6502
Oct 3, 2008
410
2
Surprised no one recommended TextWrangler. It's free and supports regular expressions.

I've used it many times to do what you seek. Just go to Find/Replace and check the "Grep" box.
 

Persifleur

macrumors member
Jun 1, 2005
66
0
London, UK
By default, the '+' pattern is "greedy", meaning it'll match as many characters as possible while still matching the overall pattern. Sublime is doing what I would consider standard, and UltraEdit's implementation I would consider non-standard. (Insofar as regular expressions can be standard.)

From UltraEdit's non-greedy tutorial:

By default, Perl regular expressions are "greedy", meaning they will match as much data as possible before a new line. Even if the conditions of the regular expression have been met, but a line break has not yet occurred, the regular expression will continue searching for data that satisfies the search criteria.
(emphasis mine)

UltraEdit checks after every line break whether it's found a match, and if so, it stops. Sublime is doing what I would consider "standard": continuing to check whether there is a match until it gets to the end of the file.

It just so happens you want the non-greedy behaviour. Thus the solution is to make the pattern "non-greedy" by putting a ? after both quantifiers, the * on the group and the final +:
Code:
<x(.+\n)*?.+?<\/x>
You should then get the same behaviour in both applications.
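As a rough check, here is a sketch using Python's `re` module (an assumption; editors may differ in details) that compares making only the final `+` lazy with making the `*` on the group lazy as well. In this engine the group's `*` is what spans from the first tag to the last, so the lazy `?` is needed there too.

```python
import re

sample = "\n".join([
    "<x>A", "B", "C</x>",
    "D", "E", "F",
    "<x>G", "H", "I</x>",
    "J", "K", "L",
    "<x>M", "N", "O</x>",
])

# Lazy "?" on the final "+" only: the group's "*" is still greedy.
final_lazy = re.compile(r"<x(.+\n)*.+?</x>")
# Lazy "?" on both the group's "*" and the final "+".
both_lazy = re.compile(r"<x(.+\n)*?.+?</x>")


def kept_lines(regex: re.Pattern) -> list:
    """Delete every match and return the non-empty lines that remain."""
    return [line for line in regex.sub("", sample).splitlines() if line]


print(kept_lines(final_lazy))  # []
print(kept_lines(both_lazy))   # ['D', 'E', 'F', 'J', 'K', 'L']
```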
 