This just came through my Google Reader feed. It's called MAMA (Metadata Analysis and Mining Application) and is from Opera.
http://dev.opera.com/articles/view/mama/
Be sure to at least read through the Key Findings, which highlight some of their results. Opera has some great development pages and good efforts like this one. I'm sure the hand coders here will appreciate the reading. Enjoy.

From the article: MAMA is a structural Web-page search engine. It trawls Web pages and returns results detailing page structures, including what HTML, CSS, and script is used on them, as well as whether the HTML validates. In this document, and the ones that link from it, you'll find data that has been pulled from MAMA so far. There is a lot of information here, but every effort has been made to keep it readable and interesting for the various types of people who might be interested in such data.
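For anyone curious what "detailing page structures" might look like in practice, here is a rough sketch in Python using only the standard library. This is not how MAMA itself works (the article is the place to look for that), and the names `StructureCounter` and `summarize` are made up for illustration. It just counts which elements a page uses and whether it pulls in external scripts or stylesheets. It does not attempt validation.

```python
from collections import Counter
from html.parser import HTMLParser


class StructureCounter(HTMLParser):
    """Hypothetical helper: tallies the elements seen in one HTML document."""

    def __init__(self):
        super().__init__()
        self.tags = Counter()        # element name -> number of start tags seen
        self.external_scripts = 0    # <script> elements with a non-empty src
        self.linked_stylesheets = 0  # <link> elements whose rel includes "stylesheet"

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names before calling this.
        self.tags[tag] += 1
        attr_map = dict(attrs)
        if tag == "script" and attr_map.get("src"):
            self.external_scripts += 1
        if tag == "link":
            rel_tokens = (attr_map.get("rel") or "").lower().split()
            if "stylesheet" in rel_tokens:
                self.linked_stylesheets += 1


def summarize(html):
    """Return a small structural summary of one page (illustrative only)."""
    parser = StructureCounter()
    parser.feed(html)
    parser.close()
    return {
        "tag_counts": dict(parser.tags),
        "external_scripts": parser.external_scripts,
        "linked_stylesheets": parser.linked_stylesheets,
    }
```

Feed it the source of any page as a string, e.g. `summarize(open("page.html").read())`, and you get a dictionary back. Running something like this over a large set of pages and adding up the results is the general idea behind this kind of survey, though the real thing obviously goes a lot deeper.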