Just Found the Coolest Thing...

tjwett

macrumors 68000
I posted in General asking about a way to download an entire website and all its contents, and I just came across this and thought it deserved its own thread. Behold...SiteSucker.

http://www.sitesucker.us

p.s. maybe no one else will think it's as cool as i do. i cannot guarantee that you will agree with me on its level of kewlness. enjoy.
 
Re: Just Found the Coolest Thing...

Originally posted by tjwett
I posted in General asking about a way to download an entire website and all its contents, and I just came across this and thought it deserved its own thread. Behold...SiteSucker.

http://www.sitesucker.us

p.s. maybe no one else will think it's as cool as i do. i cannot guarantee that you will agree with me on its level of kewlness. enjoy.

ah hah. One of those things that i expect will be useful once, maybe twice, but rarely gets used. Still, if it gets used once then it still has a use, so nice find.
 
This thing is awesome! Just today I wanted to download a website, but didn't want to manually do it. This thing saved me hours! Thanks for the great info tjwett!
 
seems like this would be helpful on those rare odd occasions when you want to make a mirror of a site, and want to do it quickly
 
Originally posted by Mudbug
seems like this would be helpful on those rare odd occasions when you want to make a mirror of a site, and want to do it quickly

...or snoop around huge corporate sites for buried treasure.;)
 
Originally posted by tjwett
...or snoop around huge corporate sites for buried treasure.;)

Does anyone know if this can be used on directories?

Like a folder with tons of images - could I get just that, not the whole site... (which would take a year)?

sounds cool.
 
wget -r is sooo much easier if you're a unix-type person who knows what they're doing and hasn't already f*cked their system by developing in panther, going back to jag for SimCity and then going to panther again - i get crashes so often (progs only - OS is still good) that it makes me wonder what the hell i did.
 
Originally posted by MrMacman
Does anyone know if this can be used on directories?

Like a folder with tons of images - could I get just that, not the whole site... (which would take a year)?

sounds cool.

you can point it at specific directories, and in the preferences you can choose to only download certain file types and ignore others. so yeah, you could just download every image from a site and nothing else.
 
Originally posted by sparkleytone
wget -r

much easier.

forgive me since i know just about zero unix right now but i'm entering this command into the terminal and i'm getting a "command not found" return. what am i missing? thanks.
 
Originally posted by tjwett
forgive me since i know just about zero unix right now but i'm entering this command into the terminal and i'm getting a "command not found" return. what am i missing? thanks.

developer tools. have a look in your applications folder - there should be a subfolder called developer tools. there's a .pkg in there that'll sort it out.
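
a quick way to double-check whether wget actually ended up on your path is:

which wget

if that doesn't print a path, it isn't installed anywhere the shell can see it.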
 
<OT>

tjwett --

Nice 'tar, and I just realized that it's even more enjoyable when you've posted twice in a row: it adds an odd synchronicity to what's already a pretty dada animation. With a few of these do-not-enter chickens in a row, it brings a bit of Warhol to it.

</OT>
 
um....

wget -r is much easier???

I mean... I love using the terminal and all, but I just used this "SiteSucker" and it's pretty easy. I'm so glad tjwett mentioned this because I have been putting off archiving the website here at work for some updates that require erasing stuff. Just archived our website and it rocked!!!!!
 
http://wget.sunsite.dk/

wget is much easier for anyone who feels somewhat comfortable in the terminal. it is also chock full of options if you want to delve in. for example, you can download only files of a certain type. many many many more things are available. it is also unbeatable when it comes to lost/dropped connections and unstable environments.
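
for instance, something like this (the URL is just a placeholder, and jpg/gif are whatever suffixes you actually want) pulls down only the images from a site:

wget -r -np -A jpg,gif http://www.example.com/

-A is the "accept" list; the HTML pages still get fetched so wget can follow the links, but they're tossed afterwards if they don't match.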
 
Well I have the Dev Tools installed on Jag, and I'll be damned if I can find wget. Maybe I missed a pkg somewhere along the line. I seem to remember the Dev Tools pkg being under /Applications when I first got my machine last November. Since then I've done a clean 10.2.6 install, plus the latest Dev Tools from ADC, and I still can't find wget.
 
That's a great little app - I just used it to 'suck' my own site. It worked perfectly and very quickly.
 
I actually use 'wget' under Linux to capture mp3 radio streams. Since it's command line, I can run it from 'cron' and capture a particular radio show at the proper time and day of the week. It works great.
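
Just as a rough sketch (the URL, path, and times here are all made up - adjust to taste), the crontab entries for an 8pm Friday show would look something like this, with a second job to stop the capture when the show ends:

0 20 * * 5 wget -q -O /home/me/shows/friday-show.mp3 http://radio.example.com/stream.mp3
0 22 * * 5 killall wget

wget just keeps writing the stream to that file until something stops it, hence the killall at 10pm.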
 
Originally posted by daveL
Well I have the Dev Tools installed on Jag, and I'll be damned if I can find wget. Maybe I missed a pkg somewhere along the line. I seem to remember the Dev Tools pkg being under /Applications when I first got my machine last November. Since then I've done a clean 10.2.6 install, plus the latest Dev Tools from ADC, and I still can't find wget.

same here. i have all the Dev Tools installed, i even use them from time to time for AppleScript Studio. can't find anything on wget. oh well. anyway, i gotta wget the hell outta here.
sorry, couldn't resist.
 
Originally posted by benixau
wget -r is sooo much easier if you're a unix-type person

Might be worth mentioning that

wget -r -np http://server/path/file

will refrain from getting stuff in the parent directories as well. Otherwise you might end up getting much more than you wanted.
 
Originally posted by daveL
Well I have the Dev Tools installed on Jag, and I'll be damned if I can find wget. Maybe I missed a pkg somewhere along the line. I seem to remember the Dev Tools pkg being under /Applications when I first got my machine last November. Since then I've done a clean 10.2.6 install, plus the latest Dev Tools from ADC, and I still can't find wget.
Originally posted by tjwett
same here. i have all the Dev Tools installed, i even use them from time to time for AppleScript Studio. can't find anything on wget. oh well. anyway, i gotta wget the hell outta here.
sorry, couldn't resist.
The reason neither of you can find it is that, as sparkleytone already indicated, you need to get it from here first :rolleyes:
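
Once you've got the source tarball, the usual GNU routine should take care of the install (the version number is just whatever you downloaded):

tar -xzf wget-1.x.x.tar.gz
cd wget-1.x.x
./configure
make
sudo make install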
 
Originally posted by edesignuk
The reason neither of you can find it is that, as sparkleytone already indicated, you need to get it from here first :rolleyes:

ahhh, now i see. thanks a lot.
 
IScooper will only suck picture files, if that's what you're looking for. Then there's PageSucker to vacuum the whole site. It's shareware, though... with the unregistered version you can only go three pages deep. Unlimited once you pay up.

P.S.: PageSucker doesn't really do well on those PHP, ASP, and CF sites. If it's plain ol' HTML, then it's a different story.
 
Re: Re: Just Found the Coolest Thing...

Originally posted by mrjamin
ah hah. One of those things that i expect will be useful once, maybe twice, but rarely gets used. Still, if it gets used once then it still has a use, so nice find.
Atchelly, back at the ISP+webhost I used to work for, we used something similar for Wintendo (!$#%& Windoid shop...) called WebZip quite regularly.

This came in quite handy when a prospective hosting client didn't know quite how much space their site occupied (we charged by 5MB blocks of space back in the day) and/or didn't wanna give us their FTP login so we could find out that way and give them a hosting quote. It also helped after they signed up to host with us: if they didn't remember their FTP login or didn't have control of their site (lotsa times their old host or developer would be either unresponsive or holding the site hostage), we could just post our WebZip capture of their site without needing access to their old server at all.

Dubbed an "offline browser" -- the idea being, you'd have a local copy to browse offline at your leisure (if your dialup access were metered or spotty) -- WebZip (and similar proggies) would start at a given URL, save the page's HTML and follow its image ref's to save included images (also linked JS and CSS files), then follow all the page's hyperlinks to find and save further pages and their images, all within the same site (base URL) until no more links remained to be followed.

The only major drawbacks, at least for our uses, were that capturing dynamically-generated sites could turn into a real mess, often completely unusable, and it was impossible to capture any "orphaned" content which was not directly linked from or referenced within public pages (e.g., stray files, "secret" sub-sites at non-linked URL paths). If you can't get to a page, image, or other file by starting at the site's homepage and clicking links around the site manually, then WebZip (and its ilk) simply have no way of knowing any such unlinked files even exist to be captured.
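
For the Mac/Unix folks following the wget tangent: the rough equivalent of that crawl-and-save behaviour (with example.com standing in for the real site) would be something like

wget -r -np -p -k http://www.example.com/

where -r follows the links, -np keeps it from wandering above the starting URL, -p grabs each page's images/CSS/JS, and -k rewrites the links so the local copy browses properly offline. The same caveats apply about dynamic sites and orphaned files.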
 