
malfromcessnock

macrumors member
Original poster
Mar 20, 2012
46
0
Hello board members

I have struck a problem not necessarily only relevant to Pages.

I downloaded a story from the net as a *.txt file. I want to put it on my eReader as either *.txt or *.ePub; it doesn't really matter which.

But the story site formats in a really stupid manner. At the end of every line there is a line break (the reverse P with a double upstroke). And at the end of every paragraph there are two of these line breaks. Pages seems to call these "invisibles" whereas in Word I would have called them "enters" or maybe even "carriage returns".

Now when you add it to your eReader, it reflows the text to the width of the eReader screen, so these forced line breaks cause havoc: you find half lines of sometimes only one or two words, with the rest of the text on the next line.

I wonder if there is a way to reformat essentially removing all these unnecessary "invisibles"?

I successfully did it using find and replace in a minute or two, but then you still need to go through the 900 page story to add spaces to separate words that have melded together due to the removal of the "invisibles". There are occurrences of these three or four times a paragraph. I just did 60 pages in 2 hours - grrrr

Anyone know of a tool that can do this?

Cheers, Mal
 
Common issue.
  1. Download the free TextWrangler.
  2. Drag & drop your .txt file to TextWrangler.
  3. Select the Search/Find... menu item.
  4. In the Find box, type "\r\r" without the quotes. In the Replace box, type "\t\t" without the quotes.
  5. With the Wrap Around checkbox checked, press the [Replace All] button.
  6. In the Find box, type "\r" without the quotes. Within the Replace box, type a [space]. That is, press the space bar once.
  7. Again with the Wrap Around checkbox checked, press the [Replace All] button.
  8. In the Find box, type "\t\t" without the quotes. In the Replace box, type "\r\r" without the quotes.
  9. For the last time with the Wrap Around checkbox checked, press the [Replace All] button.
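
For anyone who would rather script it, here is a rough Python sketch of the same three Replace All passes. The function name is made up for illustration, and it assumes the file really does use one line break per line and two per paragraph, with no tabs already in the text (the tabs are only a temporary placeholder). It also normalises old Mac and Windows line endings to "\n" first, which the steps above don't need to do.

```python
def unwrap_paragraphs(text: str) -> str:
    """Join hard-wrapped lines while keeping blank-line paragraph breaks."""
    # Normalise Windows (\r\n) and classic Mac (\r) line endings to \n first.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Pass 1: park every paragraph break (two line breaks) as two tabs.
    text = text.replace("\n\n", "\t\t")
    # Pass 2: turn each remaining single line break into a space,
    # so the words on either side do not meld together.
    text = text.replace("\n", " ")
    # Pass 3: restore the paragraph breaks.
    return text.replace("\t\t", "\n\n")


if __name__ == "__main__":
    sample = "It was a dark\nand stormy night.\n\nThe rain fell\nin torrents."
    print(unwrap_paragraphs(sample))
```

To run it on a real file you could read the text with pathlib.Path("story.txt").read_text() and write the result back out with write_text(); "story.txt" is just a stand-in name.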
 
Try MisterMe's approach first. If that doesn't work, you can try this:

You can do the following in any text editor that allows you to enter a character for "carriage return".

One such text editor is "Tex-Edit Plus", available here:
http://www.tex-edit.com/

PRINT OUT THESE INSTRUCTIONS AND CHECK OFF AS YOU GO ALONG

So, try this:
1. Download Tex-Edit Plus. Very handy to have around.

2. Open the file in question in TE+. (Note: since you have already "hand-modified" part of the file, it may be necessary to "split" the file into modified and as-yet-unmodified sections. Just create 2 separate files using "save as". You can re-join them later.)

3. Open the as-yet-UNmodified file segment using TE+.

4. Press Command-F to bring up the "find/change" dialog.

5. You will note that TE+ offers you the option to insert invisible characters into the F/C dialog. The character we are talking about is "CR" (carriage return)

6. Click ONE TIME in the upper portion of F/C to place the cursor there, then click the "CR" button TWO TIMES. You should see "^c^c" in the box.

7. Now go to the lower portion, and type "&&&" into it (3 "ampersands").

8. Next, click "Replace All".

9. All occurrences of two CR's should be replaced by "&&&". This SHOULD NOT change occurrences of only one CR.

10. Now, summon up Find Change again. This time, put the cursor in the top portion, and click the "CR" button ONE TIME.

11. Click to select the lower portion, and "wipe it clean". There should be nothing in it at all.

12. Now click "Replace all". This should delete all occurrences of "single CR's" throughout the document. Note: it's going to look funny because all you'll see is a stream of text with "&&&" here and there.

13. Now, for the last F/C. Summon the F/C dialog once more. Click in the top portion to select it, then type "&&&" again (3 ampersands).

14. Move to the lower portion with the cursor and clear the box. Next, you want to hit the "CR" button TWO TIMES, so that you see "^c^c" in the box.

15. Click "Replace all". All occurrences of "&&&" will be replaced by two CR's, redefining your paragraphs and adding a line between them. All "linefeed CR's" will be gone, permitting the text to flow freely within each paragraph.

16. The last step is to "re-join" the portion of the file you just re-worked with TE+ with the portion you previously "modified by hand".

There may be other utilities that can do this, but this works for me when needed. Hope it works out for you!
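
If you are comfortable with a little scripting, the placeholder trick can also be skipped altogether. Below is a minimal Python sketch, not something TE+ does itself, that uses one regular expression to find only the "lonely" line breaks (those with no other line break directly before or after) and replaces them. The function name and the joiner parameter are invented for the example. Pass joiner="" to delete the breaks outright as in step 12, or leave the default single space if the lines in your file do not already end in a space.

```python
import re


def unwrap_single_pass(text: str, joiner: str = " ") -> str:
    """Replace single line breaks with `joiner`; leave doubled ones alone."""
    # Normalise Windows (\r\n) and classic Mac (\r) line endings to \n first.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # (?<!\n) = not preceded by a line break, (?!\n) = not followed by one,
    # so only isolated line breaks inside a paragraph match.
    return re.sub(r"(?<!\n)\n(?!\n)", joiner, text)


if __name__ == "__main__":
    sample = "First line\nsecond line.\n\nNew paragraph\ncontinues here."
    print(unwrap_single_pass(sample))
```

Because runs of two or more line breaks never match, paragraph gaps come through untouched and no "&&&" marker is needed.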
 
Thank you MisterMe and Fishrrman

MisterMe and Fishrrman - thank you so much.

I haven't had time to even read your help responses (1:20am) but I'll do it first thing.

Given the extensive nature of both of your responses, I think I now have a lot of reading and help at my fingertips.

Thanks again - did you find my message easy enough to understand? Sometimes I find it difficult to put my thoughts into words.

All the best - will post a better response tomorrow.

Mal
 
Success! An eBook fit for a critical reader

Thank you board members.

Success after three or four minutes on a 900 page text file which is now well prepared for life on an eReader.

Thank you for your expertise - much appreciated.

Cheers, Mal
 