Re: Re: Just Found the Coolest Thing...
Originally posted by mrjamin
ah hah. One of those things that i expect will be useful once, maybe twice, but rarely gets used. Still, if it gets used once then it still has a use so nice find.
Atchelly, back at the ISP+webhost I used to work for, we quite regularly used something similar for Wintendo (!$#%& Windoid shop...) called WebZip.
This came in quite handy when a prospective hosting client didn't know how much space their site occupied (we charged by 5MB blocks of space back in the day) and/or didn't wanna give us their FTP login so we could check for ourselves and give them a hosting quote. It also helped after they signed up to host with us, if they didn't remember their FTP login or didn't have control of their site (lotsa times their old host or developer would be either unresponsive or holding the site hostage): we could just post our WebZip capture of their site without needing access to their old server at all.
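For the quoting part, the arithmetic is simple enough to sketch in a few lines of Python. This is only an illustration, not anything we actually ran: `capture_size` and `blocks_needed` are made-up names, and I'm assuming a "5MB block" means 5 * 1024 * 1024 bytes and that a partly-filled block counts as a whole one.

```python
import os

# Assumption: one "5MB block" is 5 * 1024 * 1024 bytes.
BLOCK_BYTES = 5 * 1024 * 1024


def capture_size(root):
    """Total size in bytes of every file under the capture folder."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            total += os.path.getsize(os.path.join(dirpath, name))
    return total


def blocks_needed(total_bytes, block_bytes=BLOCK_BYTES):
    """Round up to whole blocks (assumption: a partial block counts as one)."""
    return -(-total_bytes // block_bytes)
```

Point `capture_size` at the folder the capture was saved into, then feed the result to `blocks_needed` to get the number of blocks to quote.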
WebZip was dubbed an "offline browser" -- the idea being, you'd have a local copy to browse offline at your leisure (if your dialup access were metered or spotty). WebZip (and similar proggies) would start at a given URL, save the page's HTML, and follow its image refs to save the included images (also linked JS and CSS files). It would then follow all the page's hyperlinks to find and save further pages and their images, all within the same site (base URL), until no more links remained to be followed.
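If you're curious what that loop looks like, here's a minimal Python sketch of the same idea. It is not how WebZip itself was written (I never saw its source), just the general technique using the standard library. The names `RefCollector`, `fetch`, and `crawl` are mine, "same site" is simplified to "same hostname", every hyperlink target is treated as an HTML page, and it keeps everything in memory rather than writing files to disk. There is no error handling, so a dead link will stop it.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class RefCollector(HTMLParser):
    """Collects hyperlinks (pages to follow) and asset refs (files to save)."""

    def __init__(self):
        super().__init__()
        self.links = []   # <a href>: pages to crawl next
        self.assets = []  # <img src>, <script src>, <link href>: saved, not parsed

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag in ("img", "script") and attrs.get("src"):
            self.assets.append(attrs["src"])
        elif tag == "link" and attrs.get("href"):
            self.assets.append(attrs["href"])


def fetch(url):
    """Default fetcher: return the raw bytes at url."""
    with urlopen(url) as resp:
        return resp.read()


def crawl(start_url, fetch=fetch):
    """Return {url: bytes} for every page and asset reachable from start_url."""
    site = urlparse(start_url).netloc  # simplification: same site = same hostname
    saved = {}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        if url in saved:
            continue
        saved[url] = fetch(url)
        parser = RefCollector()
        parser.feed(saved[url].decode("utf-8", errors="replace"))
        # Save the images, scripts, and stylesheets the page refers to.
        for ref in parser.assets:
            asset = urldefrag(urljoin(url, ref)).url
            if urlparse(asset).netloc == site and asset not in saved:
                saved[asset] = fetch(asset)
        # Queue up same-site hyperlinks; off-site links are ignored.
        for ref in parser.links:
            page = urldefrag(urljoin(url, ref)).url
            if urlparse(page).netloc == site and page not in saved:
                queue.append(page)
    return saved
```

Calling `crawl("http://example.test/")` (a placeholder URL) would return a dict of everything it could reach from that page. The loop ends on its own once the queue runs dry, which is the "until no more links remained" part.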
The only major drawbacks, at least for our uses, were that capturing dynamically-generated sites could turn into a real mess, often completely unusable, and it was impossible to capture any "orphaned" content which was not directly linked from or ref'd within public pages (e.g., stray files, "secret" sub-sites at non-linked URL paths). If you can't get to a page, image, or other file by starting at the site's homepage and clicking links around the site manually, then WebZip (and its ilk) simply has no way of knowing any such unlinked files even exist to be captured.
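To make the orphan problem concrete, here's a toy Python example. The site layout is entirely made up, and `reachable` is just a hypothetical helper that follows links from a starting file. Whatever is on the server but never reached from the homepage is what a capture would miss.

```python
def reachable(links, start):
    """Return every path reachable from start by following links."""
    seen = set()
    stack = [start]
    while stack:
        path = stack.pop()
        if path in seen:
            continue
        seen.add(path)
        stack.extend(links.get(path, []))
    return seen


# Hypothetical site: which file links to (or refs) which.
links = {
    "/index.html": ["/about.html", "/logo.gif"],
    "/about.html": ["/index.html"],
    "/secret/index.html": ["/secret/notes.txt"],
}

# Everything that actually sits on the (made-up) server.
on_server = {
    "/index.html",
    "/about.html",
    "/logo.gif",
    "/secret/index.html",
    "/secret/notes.txt",
    "/old-banner.gif",
}

captured = reachable(links, "/index.html")
orphaned = on_server - captured
print(sorted(orphaned))
# prints ['/old-banner.gif', '/secret/index.html', '/secret/notes.txt']
```

Note that the "secret" sub-site links to its own files just fine, but since nothing public links into it, the whole thing stays invisible.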