Data Collection

Discussion in 'Mac Programming' started by NewbieNerd, Mar 26, 2006.

  1. macrumors 6502a

    NewbieNerd

    Joined:
    Sep 22, 2005
    Location:
    Chicago, IL
    #1
    I'm thinking about setting up some machine learning algorithms, particularly for NBA and college basketball prediction. I built a couple last year for a class, using the men's NCAA tournament, and they worked quite well, so I'm interested in keeping it up.

    Problem is, data collection was a pain because I did it all by hand, copying and pasting. Setting up a script to do it seems annoyingly complex, as nba.com and espn.com boxscores have tons of crap in every webpage. Does anyone have any better ideas, or know of any sites that already do this data collection? Thanks.
     
  2. macrumors 603

    OutThere

    Joined:
    Dec 19, 2002
    Location:
    NYC
    #2
    You could use regular expressions to parse the websites and it wouldn't be too difficult.
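
    As a rough illustration of the regular-expression approach, here is a minimal Python sketch. The HTML fragment, the column names, and the helper `parse_boxscore` are all made up for the example; real nba.com or espn.com markup will look different, so the patterns would need adjusting to the actual page source.

```python
import re

# Hypothetical boxscore fragment. Real pages will use different markup.
SAMPLE_HTML = """
<table class="boxscore">
<tr><td>Player</td><td>MIN</td><td>PTS</td><td>REB</td></tr>
<tr><td><a href="/player/1">J. Smith</a></td><td>34</td><td>21</td><td>7</td></tr>
<tr><td><a href="/player/2">A. Jones</a></td><td>28</td><td>12</td><td>10</td></tr>
</table>
"""

ROW_RE = re.compile(r"<tr[^>]*>(.*?)</tr>", re.S)   # one table row
CELL_RE = re.compile(r"<td[^>]*>(.*?)</td>", re.S)  # one cell inside a row
TAG_RE = re.compile(r"<[^>]+>")                     # any leftover tag, e.g. <a>


def parse_boxscore(html):
    """Return one dict per player row, keyed by the header row's labels."""
    rows = []
    for row in ROW_RE.findall(html):
        # Strip nested tags (links, spans) and surrounding whitespace.
        cells = [TAG_RE.sub("", cell).strip() for cell in CELL_RE.findall(row)]
        if cells:
            rows.append(cells)
    if not rows:
        return []
    header, *body = rows
    return [dict(zip(header, cells)) for cells in body]


if __name__ == "__main__":
    for player in parse_boxscore(SAMPLE_HTML):
        print(player)
```

    The tag-stripping step is what deals with the extra junk around each number. This only works while the table layout stays regular; if the site changes its markup, the patterns break.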
     
  3. macrumors 601

    zimv20

    Joined:
    Jul 18, 2002
    Location:
    chicago
    #3
    Do those sites offer XML feeds?
     
  4. thread starter macrumors 6502a

    NewbieNerd

    Joined:
    Sep 22, 2005
    Location:
    Chicago, IL
    #4
    I'm not quite sure what XML feeds are. The sites (espn.com and nba.com, for instance) do offer RSS feeds, but these are just recaps. For some boxscores, the data doesn't even appear in the HTML source when you view it, and copying the text from the webpage into a text file doesn't seem to work either. I think my best bet might be to use the play-by-plays, which offer even more data than the boxscores and are easier to get and parse.
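
    To give an idea of what parsing play-by-play text could look like, here is a small Python sketch. The line format ("MM:SS TEAM description"), the "makes N-pt" wording, and the function names `parse_play` and `team_totals` are assumptions for illustration only; the actual play-by-play pages would need their own patterns.

```python
import re
from collections import Counter

# Assumed line format: "MM:SS TEAM description". Not taken from any real site.
PLAY_RE = re.compile(
    r"^(?P<min>\d{1,2}):(?P<sec>\d{2})\s+(?P<team>[A-Z]{2,4})\s+(?P<desc>.+)$"
)
# Assumed wording for made shots; free throws are written as "1-pt" here.
POINTS_RE = re.compile(r"makes (?P<pts>[123])-pt")


def parse_play(line):
    """Turn one play-by-play line into a dict, or None if it doesn't match."""
    m = PLAY_RE.match(line.strip())
    if m is None:
        return None
    scored = POINTS_RE.search(m["desc"])
    return {
        "seconds_left": int(m["min"]) * 60 + int(m["sec"]),
        "team": m["team"],
        "desc": m["desc"],
        "points": int(scored["pts"]) if scored else 0,
    }


def team_totals(lines):
    """Sum points per team over all lines that parse."""
    totals = Counter()
    for line in lines:
        play = parse_play(line)
        if play is not None:
            totals[play["team"]] += play["points"]
    return totals


if __name__ == "__main__":
    sample = [
        "11:42 CHI Player A makes 2-pt jump shot",
        "11:20 NYK Player B misses 3-pt jump shot",
        "10:58 NYK Player C makes 3-pt jump shot",
        "10:31 CHI Player A makes 1-pt free throw",
    ]
    print(team_totals(sample))
```

    Since every play becomes one record, boxscore-style totals can be rebuilt from the play-by-play, along with things a boxscore doesn't show, such as when in the game each event happened.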
     
  5. macrumors 603

    jeremy.king

    Joined:
    Jul 23, 2002
    Location:
    Fuquay Varina, NC
    #5
    While it would be nice if statistical information (in a usable format) were free... it's not.
    STATS, Inc. is regarded as THE SOURCE for this type of information.

    http://biz.stats.com
     
