Discussion in 'Mac Basics and Help' started by Andrew936, Mar 6, 2011.

  Andrew936

    Mar 6, 2011
    Hello MacRumors,
    I have a large number (several hundred to several thousand) of downloaded HTML pages with images and links that I need to convert to plain text files (no images, links, or formatting at all). I've been using both Automator and a batch file-rename application to change the file extensions to .txt, but the end product still isn't quite what I want.

    What I end up with is a .txt file that still has formatting, links, etc. in it. When I look at the file info, it says it's a "plain text document", but the "Make Plain Text" action in TextEdit (Format > Make Plain Text) is still available. When I use that action, I get exactly what I want: a bare-bones, text-only document. I just need to do that a few thousand more times. How can I do that?

    Is this an issue of text encoding? I should add that my end product has to be UTF-8, according to the documentation for another application that these text files are ultimately going to be put into.

    So in short, I need a way to perform the TextEdit action "Make Plain Text" en masse. I think there's a way to do it with the terminal, but I am fairly clueless as to how bash/unix commands work, so please hold my hand with any instructions involving that sort of thing. I'm running Tiger on a 2007 MacBook Pro (I'm not sure if you needed to know that, but I figured it couldn't hurt).

    Thank you so much for any help!
  Quotenfrau


    Mar 6, 2011
    You can install "lynx" through MacPorts (see for more info)

    Here is a generic example:

    $ lynx -dump "" > plaintext.txt
    This also works with local HTML files and can be automated with a one-line shell script.

    For example, "cd" into the directory with the HTML files and run:

            # Loop over every .html file in the current directory
            for i in *.html; do
                lynx -dump "$PWD/$i" > "plaintext_$i.txt"
            done
    This was the UNIX/Open Source way.
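
    For anyone unsure what the pieces of the loop stand for, here is a minimal sketch of the same idea wrapped in a shell function. The names convert_all and src_dir are made up for illustration and are not part of lynx or MacPorts. It assumes every page ends in .html and writes page.txt next to page.html (a different naming choice from plaintext_page.html.txt above). The converter defaults to "lynx -dump", but any command that takes a file name and prints text can be passed in instead.

```shell
# convert_all SRC_DIR [CONVERTER ...]
# Hypothetical helper: runs a converter on every .html file in SRC_DIR
# and saves the output as a .txt file with the same base name.
convert_all() {
    src_dir=$1      # the folder that holds the downloaded HTML pages
    shift           # everything left over is the converter command

    # No converter given: fall back to "lynx -dump" (assumes lynx is installed)
    if [ "$#" -eq 0 ]; then
        set -- lynx -dump
    fi

    for f in "$src_dir"/*.html; do
        # If nothing matched, the pattern is left as-is; skip it
        [ -e "$f" ] || continue
        # ${f%.html} is the file name with the trailing ".html" removed
        "$@" "$f" > "${f%.html}.txt"
    done
}
```

    After pasting the function into Terminal, running convert_all ~/Desktop/pages would convert everything in that (example) folder. Since the goal is UTF-8 text with no links, something like convert_all ~/Desktop/pages lynx -dump -nolist -display_charset=utf-8 may be closer to what is wanted, assuming the installed lynx build accepts those options.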
  Andrew936

    Mar 6, 2011
    Thank you for your response.

    Unfortunately, I'm having some trouble with Lynx and MacPorts. I'm pretty bad at figuring out open-source software, and I keep getting "bus errors" in the terminal when I try to run/install lynx. I also have to be honest and say that I don't understand what lynx does in any case. Can't you just use the terminal to give commands?

    So, is it possible to give a more detailed/dumbed-down walkthrough? Or does someone know of another application that can automate the "Make Plain Text" operation?

    Thank you again for any help!
  Andrew936

    Mar 6, 2011
    I figured I'd give this one last bump. Does anyone have any ideas?

    Any help would be truly appreciated, whether it's a new suggestion, help getting lynx to work, or an explanation of what the placeholders in the code above stand for. In any case, this is the message I get when I start up lynx:

    Last login: Thu Mar 10 00:14:36 on ttyp1
    /Applications/Lynx/lynx; exit
    Welcome to Darwin!
    ip-90-142:~ [username]$ /Applications/Lynx/lynx; exit
    Bus error
    [Process completed]

