Just Found the Coolest Thing...

Discussion in 'Mac Apps and Mac App Store' started by tjwett, Aug 6, 2003.

  1. tjwett macrumors 68000

    tjwett

    Joined:
    May 6, 2002
    Location:
    Brooklyn, NYC
    #1
    I posted in General asking about a way to download an entire website and all its contents, and I just came across this and thought it deserved its own thread. Behold...SiteSucker.

    http://www.sitesucker.us

    p.s. maybe no one else will think it's as cool as i do. i cannot guarantee that you will agree with me on its level of kewlness. enjoy.
     
  2. mrjamin macrumors 65816

    mrjamin

    Joined:
    Feb 6, 2003
    Location:
    Strongbadia
    #2
    Re: Just Found the Coolest Thing...

    ah hah. One of those things that i expect will be useful once, maybe twice, but rarely gets used. Still, if it gets used even once then it has a use, so nice find.
     
  3. MoparShaha macrumors 68000

    MoparShaha

    Joined:
    May 15, 2003
    Location:
    San Francisco
    #3
    This thing is awesome! Just today I wanted to download a website, but didn't want to do it manually. This thing saved me hours! Thanks for the great info, tjwett!
     
  4. Mudbug Administrator emeritus

    Mudbug

    Joined:
    Jun 28, 2002
    Location:
    North Central Colorado
    #4
    seems like this would be helpful on those rare occasions when you want to make a mirror of a site, and want to do it quickly
     
  5. tjwett thread starter macrumors 68000

    tjwett

    Joined:
    May 6, 2002
    Location:
    Brooklyn, NYC
    #5
    ...or snoop around huge corporate sites for buried treasure. ;)
     
  6. MrMacMan macrumors 604

    MrMacMan

    Joined:
    Jul 4, 2001
    Location:
    1 Block away from NYC.
    #6
    Does anyone know if this can be used on directories?

    Like a folder with tons of images: could I get just that, and not the whole site (which would take a year)?

    Sounds cool.
     
  7. sparkleytone macrumors 68020

    sparkleytone

    Joined:
    Oct 28, 2001
    Location:
    Greensboro, NC
  8. benixau macrumors 65816

    benixau

    Joined:
    Oct 9, 2002
    Location:
    Sydney, Australia
    #8
    wget -r is sooo much easier if you're a unix-type person who knows what they're doing and hasn't already f*cked their system by developing in Panther, going back to Jag for SimCity, and then going to Panther again. i get crashes so often (progs only - the OS is still good) that it makes me wonder what the hell i did.
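
    for anyone who hasn't used it, a minimal example (the URL is just a placeholder):

    # recursively download a whole site into the current directory
    wget -r http://www.example.com/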
     
  9. tjwett thread starter macrumors 68000

    tjwett

    Joined:
    May 6, 2002
    Location:
    Brooklyn, NYC
    #9
    you can point it at specific directories, and in the preferences you can choose to only download certain file types and ignore others. so yeah, you could just download every image from a site and nothing else.
     
  10. tjwett thread starter macrumors 68000

    tjwett

    Joined:
    May 6, 2002
    Location:
    Brooklyn, NYC
    #10
    forgive me since i know just about zero unix right now but i'm entering this command into the terminal and i'm getting a "command not found" return. what am i missing? thanks.
     
  11. mrjamin macrumors 65816

    mrjamin

    Joined:
    Feb 6, 2003
    Location:
    Strongbadia
    #11
    developer tools. have a look in your applications folder - there should be a subfolder called developer tools. there's a .pkg in there that'll sort it out.
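
    once it's installed you can check whether the shell sees it with:

    # prints the path to wget if it's on your PATH, prints nothing otherwise
    which wget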
     
  12. idea_hamster macrumors 65816

    idea_hamster

    Joined:
    Jul 11, 2003
    Location:
    NYC, or thereabouts
    #12
    <OT>

    tjwett --

    Nice 'tar, and I just realized that it's even more enjoyable when you've posted twice in a row: it adds an odd synchronicity to what's already a pretty dada animation. With a few of these do-not-enter chickens in a row, it brings a bit of Warhol to it.

    </OT>
     
  13. frescies macrumors regular

    Joined:
    Dec 9, 2002
    Location:
    Los Angeles, CA
    #13
    um....

    wget -r is much easier???

    I mean... I love using the terminal and all, but I just used this "SiteSucker" and it's pretty easy. I'm so glad tjwett mentioned this, because I have been putting off archiving the website here at work for some updates that require erasing stuff. Just archived our website and it rocked!!!!!
     
  14. DeadlyBreakfast macrumors regular

    Joined:
    Aug 26, 2002
    Location:
    In a dark corner somewhere. Help me..
    #14
    Wow...That kicked butt.....Thanks for the tip!!
     
  15. sparkleytone macrumors 68020

    sparkleytone

    Joined:
    Oct 28, 2001
    Location:
    Greensboro, NC
    #15
    http://wget.sunsite.dk/

    wget is much easier for anyone who feels somewhat comfortable in the terminal. it is also chock full of options if you want to delve in. for example, you can download only files of a certain type. many many many more things are available. it is also unbeatable when it comes to lost/dropped connections and unstable environments.
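
    a couple of examples of what i mean (the URLs and file names here are just placeholders):

    # grab only jpeg/gif images from a site, recursively
    wget -r -A jpg,jpeg,gif http://www.example.com/

    # keep retrying forever over a flaky connection, resuming partial downloads
    wget -c -t 0 http://www.example.com/bigfile.mov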
     
  16. daveL macrumors 68020

    daveL

    Joined:
    Jun 18, 2003
    Location:
    Montana
    #16
    Well, I have the Dev Tools installed on Jag, and I'll be damned if I can find wget. Maybe I missed a pkg somewhere along the line. I seem to remember the dev tools pkg under /Applications when I first got my machine last November. Since then I've done a clean 10.2.6 install, plus the latest Dev Tools from ADC, and I still can't find wget.
     
  17. edesignuk Moderator emeritus

    edesignuk

    Joined:
    Mar 25, 2002
    Location:
    London, England
    #17
    That's a great little app, I just used it to 'suck' my own site. It worked perfectly and very quickly.
     
  18. daveL macrumors 68020

    daveL

    Joined:
    Jun 18, 2003
    Location:
    Montana
    #18
    I actually use 'wget' under linux to capture mp3 radio streams. Since it's command line, I can run it from 'cron' and capture a particular radio show at the proper time and day of the week. It works great.
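
    Roughly like this in my crontab (the times, paths, and stream URL below are made up, and note that killall stops every running wget, not just this one):

    # start capturing the stream at 8pm every Friday (hypothetical URL and path)
    0 20 * * 5 wget -q -O /home/dave/shows/friday-show.mp3 http://radio.example.com/stream.mp3
    # stop the capture an hour later
    0 21 * * 5 killall wget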
     
  19. tjwett thread starter macrumors 68000

    tjwett

    Joined:
    May 6, 2002
    Location:
    Brooklyn, NYC
    #19
    same here. i have all the Dev Tools installed, i even use them from time to time for AppleScript Studio. can't find anything on wget. oh well. anyway, i gotta wget the hell outta here.
    sorry, couldn't resist.
     
  20. crenz macrumors 6502a

    crenz

    Joined:
    Jul 3, 2003
    Location:
    Shanghai, China
    #20
    Might be worth mentioning that

    wget -r -np http://server/path/file

    will refrain from getting stuff in the parent directories as well. Otherwise you might end up getting much more than you wanted.
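
    For example (hypothetical URL), if you only want one directory:

    # stays inside /photos/ instead of climbing up and mirroring the whole server
    wget -r -np http://www.example.com/photos/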
     
  21. edesignuk Moderator emeritus

    edesignuk

    Joined:
    Mar 25, 2002
    Location:
    London, England
    #21
    The reason neither of you can find it is that, as sparkleytone has already indicated, you need to get it from his link (http://wget.sunsite.dk/) first :rolleyes:
     
  22. tjwett thread starter macrumors 68000

    tjwett

    Joined:
    May 6, 2002
    Location:
    Brooklyn, NYC
    #22
    ahhh, now i see. thanks a lot.
     
  23. kaizer macrumors member

    Joined:
    Jan 23, 2003
    Location:
    Malaysia
    #23
    IScooper will only suck picture files, if that's what you're looking for. Then there's PageSucker to vacuum the whole site. It's shareware though... on the unregistered version you can only go three pages deep; unlimited once you've paid up.

    P.S.: PageSucker doesn't really do well on those PHP, ASP, and CF sites. If it's plain ol' HTML, then it's a different story.
     
  24. idea_hamster macrumors 65816

    idea_hamster

    Joined:
    Jul 11, 2003
    Location:
    NYC, or thereabouts
    #24
    Hey, wow! I just sucked my site down for practice -- that's a gem!
     
  25. SubGothius macrumors member

    Joined:
    Apr 29, 2003
    Location:
    Tucson, AZ
    #25
    Re: Re: Just Found the Coolest Thing...

    Atchelly, back at the ISP+webhost I used to work for, we used something similar for Wintendo (!$#%& Windoid shop...) called WebZip quite regularly.

    This came in quite handy when a prospective hosting client didn't know quite how much space their site occupied (we charged by 5MB blocks of space back in the day) and/or didn't wanna give us their FTP login so we could find out that way and give them a hosting quote. It also helped after they signed up to host with us, if they didn't remember their FTP login or didn't have control of their site (lotsa times their old host or developer would be either unresponsive or holding the site hostage): we could just post our WebZip capture of their site without needing access to their old server at all.

    Dubbed an "offline browser" -- the idea being, you'd have a local copy to browse offline at your leisure (if your dialup access were metered or spotty) -- WebZip (and similar proggies) would start at a given URL, save the page's HTML and follow its image ref's to save included images (also linked JS and CSS files), then follow all the page's hyperlinks to find and save further pages and their images, all within the same site (base URL) until no more links remained to be followed.

    The only major drawbacks, at least for our uses, were that capturing dynamically-generated sites could turn into a real mess, often completely unusable, and it was impossible to capture any "orphaned" content that was not directly linked from or ref'd within public pages (e.g., stray files, "secret" sub-sites at non-linked URL paths). If you can't get to a page, image, or other file by starting at the site's homepage and clicking links around the site manually, then WebZip (and its ilk) simply have no way of knowing any such unlinked files even exist to be captured.
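
    FWIW, wget can pull off roughly the same "offline browser" trick (the URL below is just a placeholder):

    # mirror the site, pull in the images/CSS/JS each page needs (-p),
    # and rewrite links so the local copy browses properly offline (-k)
    wget -m -p -k http://www.example.com/

    And it hits the same wall with orphaned content, of course: if nothing links to a file, no crawler can know it exists.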
     
