Web page (text only?) searchable archive

Discussion in 'Web Design and Development' started by chesterville, Nov 1, 2008.

  1. chesterville macrumors newbie

    Joined:
    Sep 26, 2006
    #1
    Does anyone know if there is a program (or a way to develop a program other than a cumbersome Access database) to easily archive and search web pages - specifically the text? An RSS feed has been suggested, but will this allow me to archive and search the text for keywords?

    My question probably isn't all that clear, so please allow me to elaborate: Basically, I'm a finance geek. I read 20+ finance blogs on a regular basis and several newspapers (online editions). I frequently wish I could type in a keyword for information that I want (say, "foreclosures" or "US GDP") and then be able to see the text of blog posts or newspaper articles on the subject that I found interesting. This is particularly tricky when I'm looking for specific information from a blog post or newspaper, but I don't remember which one. Also, my dream (twisted, I know) is to eventually build up an archive spanning years of information so that I can refer back to it in the future (something about not knowing history dooming you to repeat it). So if the database actually stored the text on my computer or on some "cloud computer", where it does not depend on the blog continuing to exist, that would be nice too (I'm not looking to break copyright laws, though - it would be strictly for personal use). If you made it to this part of the thread, wow! Thanks for reading all of this.

    Thanks in advance for any assistance.

    Cheers,

    Andrew
     
  2. chilipie macrumors 6502a

    chilipie

    Joined:
    May 8, 2006
    Location:
    Englandshire
    #2
    Most RSS feed readers will archive the data on your computer, so you won't be relying on the sites to still be there. I think using feeds would be the most sensible way to do it, unless you want to manually copy and paste the relevant text from every new item.
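
    For anyone who would rather roll their own than rely on a reader, here is a rough sketch of the feed approach in Python, using only the standard library. It is not how any particular feed reader works. It assumes the feeds are plain RSS 2.0 with <item> elements, and the names in it (archive.db, parse_rss, archive_feed, search, fetch) are made up for illustration. Note that many feeds only carry a summary in <description>, so this stores whatever text the feed provides.

```python
import sqlite3
import urllib.request
import xml.etree.ElementTree as ET


def parse_rss(xml_data):
    """Return (link, title, text) tuples for each <item> in an RSS 2.0 document."""
    root = ET.fromstring(xml_data)
    items = []
    for item in root.iter("item"):
        link = item.findtext("link", default="")
        title = item.findtext("title", default="")
        text = item.findtext("description", default="")
        items.append((link, title, text))
    return items


def open_archive(path="archive.db"):
    """Open (or create) the local archive; 'archive.db' is a hypothetical file name."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS posts ("
        "link TEXT PRIMARY KEY, title TEXT, body TEXT)"
    )
    return db


def archive_feed(db, xml_data):
    """Store every item from the feed; items already archived are left untouched."""
    db.executemany(
        "INSERT OR IGNORE INTO posts VALUES (?, ?, ?)", parse_rss(xml_data)
    )
    db.commit()


def search(db, keyword):
    """Return (title, link) pairs whose title or body contains the keyword."""
    pattern = f"%{keyword}%"
    rows = db.execute(
        "SELECT title, link FROM posts "
        "WHERE title LIKE ? OR body LIKE ? ORDER BY title",
        (pattern, pattern),
    )
    return rows.fetchall()


def fetch(url):
    """Download a feed and return its raw bytes."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()
```

    Running archive_feed(db, fetch(url)) for each feed on a schedule would build up the archive over time, and search(db, "foreclosures") would then list the matching posts even if the original blog has since disappeared. SQLite's LIKE is case-insensitive for plain ASCII, which is usually good enough for keyword lookups like these.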
     
  3. Nugget macrumors 65816

    Nugget

    Joined:
    Nov 24, 2002
    Location:
    Houston Texas USA
    #3
    You might want to take a look at BrowseBack, which is sort of designed to do what you describe (although it stores pages as PDF so that you retain formatting and image information as well).

    It's really slick, but can be a bit resource intensive.
     
  4. angelwatt Moderator emeritus

    angelwatt

    Joined:
    Aug 16, 2005
    Location:
    USA
    #4
  5. chesterville thread starter macrumors newbie

    Joined:
    Sep 26, 2006
    #5
    Thanks everyone for the responses. I'll be looking into all of your suggestions.
     
