kingthong

macrumors member
Original poster
Sep 20, 2010
62
0
Somewhere but not here.
Hi,

I have a query. There's a site, let's say abcd.com.

abcd.com has 500 articles that I'd like to read. The links to all the articles are on one page. 500 is too many for me to read on the computer, so I'd like to print out all 500 pages. Is there any way for me to get *only* the text from the site and put it into one Word doc or PDF?

I tried using the ScrapBook add-on for Firefox. It rips the site perfectly and saves it all as HTML files. I can then convert them to .doc files, BUT I get the entire web page in the doc file, i.e. with the banners, frames, menus etc.

So basically, how do I get only the text into the doc file so that I can print it?

Any help is appreciated. Thanks!
 

Ttownbeast

macrumors 65816
May 10, 2009
1,135
1

Highlight only the text you want on each page: click at the start of the article, scroll to the bottom, hold the Shift key and click again after the last period, then copy and paste it into whatever text editor you use. But be mindful of copyright issues concerning how much you're allowed to copy; news and media sites sometimes have very specific rules about reproducing copyrighted work, and permissions may be specific about how much you can and cannot duplicate.
 

kingthong

macrumors member
Original poster
Sep 20, 2010
62
0
Somewhere but not here.
Hey Ttownbeast,
Thanks for the reply. The thing is, there are too many articles for me to use the Ctrl-C/Ctrl-V method, which is why I wanted to know if there was a quicker way of doing it.
And thanks for the warning. This site just has a bunch of articles I'd like to print for personal use. Does that count as illegal?
 

Melrose

Suspended
Dec 12, 2007
7,806
399
No, it's not illegal unless the site's terms and conditions of use explicitly forbid printing out a copy. Even if that byzantine restriction were present, they'd have no way to enforce it.

This. It really depends on what you do with it. Any coding, images, image effects, etc. are the copyright of the designer or the company that hired them. Downloading them is harmless unless you start ripping them off. :)

Although, for subscription content, it would be illegal unless otherwise stated.
 

Ttownbeast

macrumors 65816
May 10, 2009
1,135
1

Probably not illegal for personal use, depending on the site's rules, but be careful: even if you can't get caught, the copyright still applies.

For example, when I was a student teacher I had to learn about the copyright issues around photocopying texts from the public library, specifically fair use for the purpose of creating lesson plans.

In most instances it's reasonable to copy only the sections of a book you need for the classroom, but never appropriate to copy the entire book. That's the line between fair use for instructional purposes and what would basically be reprinting the whole book: not easily enforceable, but certainly against copyright and commonly frowned upon within the field.
 

kingthong

macrumors member
Original poster
Sep 20, 2010
62
0
Somewhere but not here.
Thanks, that makes sense. Copying an entire textbook would possibly amount to piracy.

But we've digressed a little. Does anyone have any suggestions regarding the site ripping? :)
 

Lyle

macrumors 68000
Jun 11, 2003
1,874
1
Madison, Alabama
If it were me I'd probably write a script (a shell script, or more likely something in Ruby) that used wget to download the web site's pages, and then "somehow" pipe those pages through the Readability bookmarklet's JavaScript code.

But yeah, that's going to be an interesting problem to solve.
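
For what it's worth, here's a minimal sketch of that kind of script. Caveats: it swaps the Readability step for textutil (the converter that ships with OS X), so it won't strip banners or menus; it assumes wget is installed (it isn't part of OS X; MacPorts or Homebrew provide it, or curl could stand in); and it assumes the index page links directly to the article pages:

#!/bin/sh
# Rough sketch only; the URL is a placeholder for the real index page.

# Fetch the index page plus everything it links to, one level deep,
# keeping only the HTML files.
wget --recursive --level=1 --accept html,htm \
     --directory-prefix=articles "http://abcd.com/index.html"

# Stitch every downloaded page into a single Word-readable document with
# textutil. Adjust the path to wherever wget actually put the files.
# Note this keeps *everything* on each page, banners and menus included;
# cutting that down to just the article text is the part the Readability
# code would handle.
textutil -cat doc -output all_articles.doc articles/abcd.com/*.html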
 

kingthong

macrumors member
Original poster
Sep 20, 2010
62
0
Somewhere but not here.
Hmmm, that is possible. I don't know any shell scripting, though. Maybe this would be a useful program for someone to make, although defining what part of the web page is relevant might be a difficult job.
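
If the site happens to wrap each article body in a predictable container, though, even a blunt sed range run before the conversion step can get close. A hypothetical sketch building on the wget output above (the <div class="article"> marker is invented; the real one would have to be read out of the page source):

# Keep only the lines between the (hypothetical) article container's opening
# tag and the first closing </div> after it. Nested divs will break this;
# it is only meant to show the shape of the approach.
for f in articles/abcd.com/*.html; do
    sed -n '/<div class="article">/,/<\/div>/p' "$f" > "${f%.html}.body.html"
done

# Combine just the extracted article bodies into one printable document.
textutil -cat doc -output article_text.doc articles/abcd.com/*.body.html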
 