Data Collection

NewbieNerd
Mar 26, 2006, 10:38 PM
I'm thinking about setting up some machine learning algorithms, particularly for NBA and college basketball prediction. I built a couple of them last year for a class, using the men's NCAA tournament, and they worked quite well, so I'm interested in keeping it up.

Problem is, data collection was a pain because I did it all by hand, copying and pasting. Setting up a script to do it seems annoyingly complex, since the nba.com and espn.com boxscores have tons of crap on every page. Does anyone have any better ideas? Know any sites that already do this data collection? Thanks.

OutThere
Mar 26, 2006, 11:32 PM
You could use regular expressions to parse the websites and it wouldn't be too difficult.
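As a rough illustration of the regex approach, here is a minimal sketch that pulls player rows out of a box-score table. The HTML snippet, the `class="player"`/`"min"`/`"pts"` cell names, and the `parse_box_score` helper are all hypothetical; the real nba.com and espn.com markup is different and changes over time, so the pattern would need to be adapted to whatever the actual page source looks like.

```python
import re

# Hypothetical box-score markup (assumption): real site markup differs and changes often.
SAMPLE_HTML = """
<table class="boxscore">
<tr><td class="player">J. Smith</td><td class="min">34</td><td class="pts">22</td></tr>
<tr><td class="player">A. Jones</td><td class="min">28</td><td class="pts">15</td></tr>
</table>
"""

# One table row: capture player name, minutes and points from the assumed cell classes.
ROW_RE = re.compile(
    r'<td class="player">([^<]+)</td>\s*'
    r'<td class="min">(\d+)</td>\s*'
    r'<td class="pts">(\d+)</td>'
)


def parse_box_score(html):
    """Return a list of (player, minutes, points) tuples found in the HTML."""
    return [(name.strip(), int(mins), int(pts))
            for name, mins, pts in ROW_RE.findall(html)]


if __name__ == "__main__":
    for row in parse_box_score(SAMPLE_HTML):
        print(row)
```

Regexes like this are quick to write but brittle: any change to the page layout silently breaks the match, so it is worth checking the row count against the expected roster size after each scrape.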

zimv20
Mar 27, 2006, 12:22 AM
do those sites offer XML feeds?
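For context, an XML feed delivers data as structured XML elements rather than as a page meant for display, so it can be read with a standard parser instead of regexes. RSS 2.0 is itself an XML format. The sketch below parses a made-up RSS snippet with Python's built-in `xml.etree.ElementTree`; the feed contents, `example.com` links, and the `item_titles` helper are placeholders, not anything the sites above actually publish.

```python
import xml.etree.ElementTree as ET

# A made-up RSS 2.0 document (assumption): titles and links are placeholders.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example Basketball Recaps</title>
    <item>
      <title>Team A 101, Team B 97</title>
      <link>http://example.com/recap/1</link>
    </item>
    <item>
      <title>Team C 88, Team D 90</title>
      <link>http://example.com/recap/2</link>
    </item>
  </channel>
</rss>"""


def item_titles(rss_text):
    """Return (title, link) for every <item> element in an RSS document."""
    root = ET.fromstring(rss_text)
    return [(item.findtext("title"), item.findtext("link"))
            for item in root.iter("item")]


if __name__ == "__main__":
    for title, link in item_titles(SAMPLE_RSS):
        print(title, "->", link)
```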

NewbieNerd
Mar 27, 2006, 10:35 AM

I'm not quite sure what XML feeds are. The sites (espn.com and nba.com, for instance) do offer RSS feeds, but these are just recaps. Some boxscores don't even appear in the HTML file itself when you look at the source, and copying the text from the webpage into a text file doesn't seem to work either. I think my best bet might be to use the play-by-plays instead, which offer even more data than the boxscores and are easier to get and parse.
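To show how play-by-play text can be rolled up into box-score-style totals, here is a minimal sketch that sums points per player from made shots. The pipe-separated line format (`clock | team | player | description`), the "makes 2-pt shot"/"makes 3-pt shot"/"makes free throw" wording, and the `points_by_player` helper are assumptions for illustration; a real play-by-play page would first need to be scraped into lines like these.

```python
import re
from collections import defaultdict

# Hypothetical play-by-play lines (assumption): "clock | team | player | description".
SAMPLE_PBP = """\
11:42 | HOME | Smith | makes 2-pt shot
11:20 | AWAY | Jones | misses 3-pt shot
10:58 | AWAY | Jones | makes 3-pt shot
10:31 | HOME | Smith | makes free throw
10:31 | HOME | Smith | makes free throw
"""

# Points for each kind of made shot in the assumed description wording.
POINTS = {"2-pt shot": 2, "3-pt shot": 3, "free throw": 1}
MAKE_RE = re.compile(r"makes (2-pt shot|3-pt shot|free throw)")


def points_by_player(pbp_text):
    """Sum points per (team, player) from made shots in the play-by-play."""
    totals = defaultdict(int)
    for line in pbp_text.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        _clock, team, player, desc = parts
        match = MAKE_RE.search(desc)
        if match:  # misses and other events are ignored
            totals[(team, player)] += POINTS[match.group(1)]
    return dict(totals)


if __name__ == "__main__":
    print(points_by_player(SAMPLE_PBP))
```

The same loop can be extended to count rebounds, turnovers, or per-quarter splits, which is the extra detail the play-by-plays offer over the boxscores.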

jeremy.king
Mar 27, 2006, 11:42 AM
While it would be nice if statistical information (in a usable format) were free... it's not.
STATS, Inc. is regarded as THE SOURCE for this type of information.

http://biz.stats.com