Simple Data Scraping from Table?
kwjohns
Feb 19, 2009, 02:51 PM
I need a method for scraping some statistical data out of tables on one website and placing it on another website that I want to optimize for the iPhone. The information is updated frequently, so it can't be a one-time thing; it has to be pulled every time the iPhone-optimized page is opened. Does anyone have any ideas for how to go about that? If possible, I'd prefer it not be PHP, as I have zero experience with it.
I have played around a little with Google Spreadsheet, which has a simple ImportHTML() function, so I imagine there should be a fairly easy way to do the same thing on a website.
angelwatt
Feb 19, 2009, 03:20 PM
Are you the owner of both sites? If you have access to both, you can use the same data source for both sites.
If you're not willing to use a server-side language, you'll be limited. The only other option is JavaScript, which only works when the user has it enabled and will also hurt your search rankings, since web crawlers don't run JavaScript either.
To help any further we need to see the table in question.
kwjohns
Feb 19, 2009, 04:33 PM
Nope, not the owner of the site I'm wanting to pull data from so I don't have access to the databases. Here is an example of a table I am wanting to pull from:
http://tinyurl.com/cbvenh
I'm not worried about web crawlers or anything. This is just going to be pages I care to be accessible in an iPhone app I'm developing.
angelwatt
Feb 19, 2009, 05:29 PM
There are a ton of embedded tables on there, which makes it hard to scrape. For anyone wanting to look into a solution, here's a piece to get you to the right table (I think).
document.getElementById('MainColContainer').getElementsByTagName('table')[1];
From there you could just copy the nodes and put them in a local table. The real trick is grabbing the HTML from the remote page so you can actually parse it. I haven't done that, so I don't know how offhand and don't have time to look it up right now. If anyone wants to work on this, feel free to contribute. I'll look into it more at a later time.
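For anyone picking this up, here is a minimal sketch of the "copy the nodes" step, assuming you already have the table element in hand (for example from the getElementById line above). The function name tableToGrid is made up for illustration, and it does not solve the harder part of getting the remote page's HTML in the first place.

```javascript
// Hypothetical helper: collect the text of every cell in a table
// into a 2-D array (one inner array per row).
// `table` is assumed to be an HTMLTableElement, or any object that
// exposes `rows`, where each row exposes `cells` with `textContent`.
function tableToGrid(table) {
  return Array.from(table.rows, (row) =>
    Array.from(row.cells, (cell) => cell.textContent.trim())
  );
}

// In a browser, on the source page itself, usage might look like:
// const table = document.getElementById('MainColContainer')
//   .getElementsByTagName('table')[1];
// const grid = tableToGrid(table);
```

Working with a plain array rather than the original nodes makes it easier to rebuild the table with your own markup later.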
kwjohns
Feb 19, 2009, 05:38 PM
The ImportHTML() function/formula I mentioned earlier went something like
=ImportHTML("http://url", "table", 8)
The 8 means the 8th table on the page that it pulls the data from. It then filled the spreadsheet with all of the data as it's displayed in the table.
Example:
http://spreadsheets.google.com/pub?key=pUTsDUbsSo3QrS7wcQXEYbw
And then to go even further, single cells could be populated by adding on Index():
=Index(ImportHTML("http://www.okstate.com/SportSelect.dbml?DB_OEM_ID=200&KEY=&SPID=145&SPSID=1466", "table", 8),3,4)
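Outside of a spreadsheet, the same single-cell lookup could be sketched in JavaScript roughly like this, assuming the table has already been turned into a 2-D array of strings. The name indexCell and the placeholder values are made up for illustration; the row and column are counted from 1, as in the formula.

```javascript
// Hypothetical helper mirroring Index(range, row, column):
// row and column are 1-based. Returns undefined when out of range.
function indexCell(grid, row, column) {
  const r = grid[row - 1];
  return r === undefined ? undefined : r[column - 1];
}

// Placeholder data, not real statistics.
const sampleGrid = [
  ['r1c1', 'r1c2', 'r1c3', 'r1c4'],
  ['r2c1', 'r2c2', 'r2c3', 'r2c4'],
  ['r3c1', 'r3c2', 'r3c3', 'r3c4'],
];

console.log(indexCell(sampleGrid, 3, 4)); // prints "r3c4"
```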
geekindisguise
Feb 19, 2009, 09:28 PM
http://tinyurl.com/cbvenh
Wow, I didn't think I would ever meet someone on here who also likes OK State. I am guessing you do, since that is the site you're pulling from. Could be wrong. But, well, IDK... :rolleyes:
angelwatt
Feb 19, 2009, 09:45 PM
The ImportHTML() function/formula I mentioned earlier went something like
=ImportHTML("http://url", "table", 8)
OK, so you have a way to do the scraping. I'm not really sure what else you need.
kwjohns
Feb 19, 2009, 10:39 PM
Well, that's a method that puts it into a Google spreadsheet on Google's site. I don't have a way to play around with the data there. I need to be able to format the information how I'd like, put it on a prettier page, etc.
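To make "format the information how I'd like" concrete, here is a rough sketch that turns scraped rows into your own table markup, assuming the data is already available as a 2-D array of strings with the header row first. The names escapeHtml and gridToHtml and the class name are hypothetical, and fetching the data is not covered.

```javascript
// Escape text so scraped values cannot inject markup into the page.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Hypothetical helper: build an HTML table string from a 2-D array.
// The first row is treated as the header; `className` lets your own
// stylesheet (e.g. an iPhone-sized layout) target the table.
function gridToHtml(grid, className) {
  const [header = [], ...body] = grid;
  const head =
    '<tr>' + header.map((h) => `<th>${escapeHtml(h)}</th>`).join('') + '</tr>';
  const rows = body.map(
    (r) => '<tr>' + r.map((c) => `<td>${escapeHtml(c)}</td>`).join('') + '</tr>'
  );
  return (
    `<table class="${escapeHtml(className)}">` +
    `<thead>${head}</thead><tbody>${rows.join('')}</tbody></table>`
  );
}
```

Once the markup is generated on your side, the styling is entirely up to your own CSS rather than the source site's layout.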
kwjohns
Feb 19, 2009, 10:39 PM
Wow, I didn't think I would ever meet someone on here who also likes OK State. I am guessing you do, since that is the site you're pulling from. Could be wrong. But, well, IDK... :rolleyes:
Ya, I'm an alum.