help needed with site ripping

Discussion in 'Community Discussion' started by kingthong, Oct 13, 2010.

  1. kingthong macrumors member

    Joined:
    Sep 20, 2010
    Location:
    Somewhere but not here.
    #1
Hi,

    I have a query. There's a site, let's say abcd.com.

    abcd.com has 500 articles that I'd like to read. The links to all the articles are on one page. Now, 500 is too big a number for me to read on the computer, and I'd like to print out all 500 pages. Is there any way for me to get *only* the text from the site and put it into one Word doc or PDF?

    I tried using the ScrapBook add-on for Firefox. It rips the site perfectly and saves it all into HTML files. I can then convert them to .doc files, BUT I get the entire webpage in the doc file, i.e. with the banners, frames, menus, etc.

    So basically, how do I get only the text into the doc file so that I can print it?

    Any help is appreciated. Thanks!
     
  2. Ttownbeast macrumors 65816

    Joined:
    May 10, 2009
    #2
Highlight only the text you want in each web page: go to the beginning of the article, click just before the first letter of the story, scroll to the bottom, then hold the Shift key and click again after the last period in the article. Then hit copy and paste it into whatever text editor you use. But be mindful of copyright issues concerning how much you are allowed to copy; news and media sites sometimes have very specific rules about reproduction of copyrighted work, and permissions may be specific about what amount you can and cannot duplicate.
     
  3. kingthong thread starter macrumors member

    Joined:
    Sep 20, 2010
    Location:
    Somewhere but not here.
    #3
Hey Ttownbeast.
    Thanks for the reply. The thing is, there are too many articles for me to use the Ctrl-C/Ctrl-V method, hence I wanted to know if there was a quicker way of doing it.
    And thanks for the warning. This site just has a bunch of articles I'd like to print for personal use. Does that count as illegal?
     
  4. miles01110 macrumors Core

    miles01110

    Joined:
    Jul 24, 2006
    Location:
    The Ivory Tower (I'm not coming down)
    #4
No, it's not illegal unless the site's terms and conditions of use explicitly forbid printing out a copy. Even if that byzantine restriction were present, they'd have no way to enforce it.
     
  5. Melrose Suspended

    Melrose

    Joined:
    Dec 12, 2007
    #5
This. It really depends on what you do with it. Any coding, images, image effects, etc., are copyrighted by the designer or the company that hires them. Downloading them is harmless unless you start ripping them off. :)

    Although, for subscription items, it would be illegal unless otherwise stated.
     
  6. Ttownbeast macrumors 65816

    Joined:
    May 10, 2009
    #6
Probably not illegal for personal use, depending on the site's rules, but be careful: even if you cannot get caught, there is still the copyright.

For example, when I was a student teacher I had to learn about specific copyright issues when photocopying texts from the public library, concerning fair use for the purposes of creating lesson plans.

    It is reasonable in most instances, when copying text for the classroom, to copy only the sections of the book you need, but never appropriate to copy the entire book. That's the line between fair use for instructional purposes and what would basically be reprinting the entire book: not easily enforceable, but certainly against copyright and commonly frowned upon within the field.
     
  7. kingthong thread starter macrumors member

    Joined:
    Sep 20, 2010
    Location:
    Somewhere but not here.
    #7
Thanks, that makes sense. Copying an entire textbook would probably amount to piracy.

    But we've digressed a little. Does anyone have any suggestions regarding the site ripping? :)
     
  8. Lyle macrumors 68000

    Lyle

    Joined:
    Jun 11, 2003
    Location:
    Madison, Alabama
    #8
If it were me, I'd probably write a script (a shell script, or more likely something in Ruby) that used wget to download the website's pages, and then "somehow" piped those pages through the Readability bookmarklet's JavaScript code.

    But yeah, that's going to be an interesting problem to solve.
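    Very roughly, the download-and-strip part could be sketched in Python instead of Ruby if that's easier. Fair warning: ARTICLE_INDEX below is a made-up URL for abcd.com's link page, and this crude tag-stripper is nowhere near as smart as Readability; it just drops script/style blocks and keeps the visible text.

    ```python
    from html.parser import HTMLParser
    from urllib.request import urlopen


    class TextExtractor(HTMLParser):
        """Collects visible text, skipping <script> and <style> contents."""
        SKIP = {"script", "style"}

        def __init__(self):
            super().__init__()
            self.chunks = []
            self._skip_depth = 0

        def handle_starttag(self, tag, attrs):
            if tag in self.SKIP:
                self._skip_depth += 1

        def handle_endtag(self, tag):
            if tag in self.SKIP and self._skip_depth:
                self._skip_depth -= 1

        def handle_data(self, data):
            # Keep only non-blank text outside script/style blocks.
            if not self._skip_depth and data.strip():
                self.chunks.append(data.strip())


    def page_text(html):
        """Return the visible text of an HTML page, one chunk per line."""
        parser = TextExtractor()
        parser.feed(html)
        return "\n".join(parser.chunks)


    # Usage sketch (hypothetical URL, needs network): fetch each article
    # linked from the index page and append its text to one big file.
    # ARTICLE_INDEX = "http://abcd.com/articles.html"
    # html = urlopen(ARTICLE_INDEX).read().decode("utf-8", "replace")
    # print(page_text(html))
    ```

    You'd still need a step to pull the 500 article URLs out of the index page and loop over them, and deciding which text on each page is the article (vs. menus and banners) is exactly the hard part Readability tries to solve.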
     
  9. kingthong thread starter macrumors member

    Joined:
    Sep 20, 2010
    Location:
    Somewhere but not here.
    #9
Hmmm, that is possible. I don't know any shell scripting language, though. Maybe this would be a useful program for someone to make, although defining what part of the webpage is relevant might be a difficult job.
     
