Applescript or similar to extract website data into excel, multiple times

Discussion in 'Mac Programming' started by aapl-danial, Jun 17, 2014.

  1. aapl-danial macrumors newbie

    Feb 8, 2013
    Hello all,

    I am looking to write a program that does the following:

    Takes a 10-digit number from a column in Excel
    Visits one webpage and pastes that 10-digit number into a search field
    A new page loads from the search (only one)
    Then I need a number extracted from that page (in the Page Source it's always on line 692)
    Then I need that new number pasted into another column of Excel

    How can I go about doing this?

    It needs to be done for roughly 1,000 10-digit numbers, with 1,000 entries pasted back into Excel.

    Thank you!
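
    The steps above amount to a loop over the Parcel ID column. Below is a minimal Python sketch of that loop, not a finished tool. It assumes the column has been exported from Excel as one ID per line, and that a `lookup` function (hypothetical, supplied by the caller) performs the web search and returns the extracted number as text. The names `write_lookups` and `out_path` are illustrative and do not come from the thread.

```python
import csv
from typing import Callable, Iterable


def write_lookups(parcel_ids: Iterable[str],
                  lookup: Callable[[str], str],
                  out_path: str) -> int:
    """Look up each parcel ID and write (id, result) rows to a CSV file.

    `lookup` is a hypothetical stand-in for the search-and-extract step.
    Returns the number of data rows written.
    """
    rows = 0
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["Parcel ID", "Total Taxes Billed"])
        for raw in parcel_ids:
            parcel_id = raw.strip()
            if not parcel_id:
                continue  # skip blank lines in the exported column
            writer.writerow([parcel_id, lookup(parcel_id)])
            rows += 1
    return rows
```

    The resulting CSV opens directly in Excel, so the second column can be copied next to the original IDs.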
  2. numero macrumors regular

    Jul 23, 2002
    Can you provide a link to the web page and a sample number to enter?
  3. aapl-danial thread starter macrumors newbie

    Feb 8, 2013
    Sure thing,

    It's a tax website, and the numbers are "Parcel IDs" that are entered in the first search box.

    Here is a sample Parcel ID: 15 160 01 105

    Enter that and the site goes to a Display. The number under "Tax Information Summary", under "Total Taxes Billed", is $2,049.98. This is the number I need copied and pasted into Excel.

    I poked around in the Page Source under developer, and the $2,049.98 number is listed on line 696.
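
    A fixed line number is fragile (the two counts quoted in this thread, 692 and 696, already differ), so searching for the label is likely safer. Here is a rough Python sketch, assuming the amount is the first dollar figure that follows the text "Total Taxes Billed" in the page text or source. The function names are made up for illustration, and the real page layout has not been checked.

```python
import re
from typing import Optional

# Assumption: the amount looks like $2,049.98 and is the first dollar
# figure appearing after the label, possibly with markup in between.
_AMOUNT_AFTER_LABEL = re.compile(
    r"Total Taxes Billed.*?(\$\d[\d,]*\.\d{2})", re.DOTALL)


def total_taxes_billed(page_text: str) -> Optional[str]:
    """Return the dollar amount following the label, or None if absent."""
    match = _AMOUNT_AFTER_LABEL.search(page_text)
    return match.group(1) if match else None


def amount_to_float(amount: str) -> float:
    """Convert a string such as '$2,049.98' to a number for Excel."""
    return float(amount.replace("$", "").replace(",", ""))
```

    Returning None when the label is missing lets the caller record a failed lookup instead of silently pasting the wrong line.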

  4. Freez macrumors newbie

    Feb 9, 2011
    Python, Firebug, Beautiful Soup

    It may take a bit to set up, but Google will help.

    I will not write the Python script for you, but if you get all that installed and need more help, post your code.

    I just completed a 33,000-name scrape, and Python, Beautiful Soup and Firebug were part of my methods.
  5. 960design macrumors 68030

    Apr 17, 2012
    Destin, FL

    Good luck! Ask us if you get stuck. If you get really stuck you can hire someone; I mean that's how a lot of us make our livings.

    You cannot expect the plumber to come over and fix your stuff for free, can you? Just like you cannot expect software engineers to write the code for you. But we will be more than happy to help you get past sticking points.

    PS: I'd use PHP to scrape it.
  6. numero macrumors regular

    Jul 23, 2002
    No error checking. If something craps out with your browser or their server and the timing gets off then everything will stop. I figure for 1,000 items, for free, :) you can figure out how many good retrievals you have and chop the list down and start again.

    1) Change the path and file name where I have "/Users/numero/Desktop/propNumbers.txt" to the path of your file with the property numbers in it. I'm expecting one string of numbers per line.

    2) I write out the property number and the tax amount into the "TaxFile.txt" file on your desktop. If you restart the new data will be appended to this file.

    Let me know if anything gives you trouble.


    set this_file to (((path to desktop folder) as string) & "TaxFile.txt")
    set tabChar to tab -- Have to do this because Safari has 'tab' in its dictionary.
    set numbersFile to paragraphs of (read POSIX file "/Users/numero/Desktop/propNumbers.txt")
    tell application "Safari"
    	make new document
    	set safariWindow to front window
    	repeat with nextLine in numbersFile
    		if length of nextLine is greater than 0 then
    			set the clipboard to nextLine
    			set URL of tab 1 of safariWindow to ""
    			delay 2 -- let the page load
    			tell application "System Events"
    				keystroke tab
    				keystroke "v" using command down
    				delay 1
    				key code 36
    				key code 36 -- 2 returns because on my tests my browser was remembering the numbers I used over and it needed a second 'return' to get past the autofill.
    				delay 2 -- let the page load
    			end tell
    			set pageText to text of current tab of safariWindow
    			set found to false
    			set i to 70 -- Should be line 78 or 79. Start a ways up just to be sure to catch the line we want.
    			set totalTaxesBilled to 0
    			repeat while not found
    				if paragraph i of pageText contains "Total Taxes Billed" then
    					set totalTaxesBilled to (word 5 of paragraph i of pageText)
    					my write_to_file(nextLine & tabChar & totalTaxesBilled & return, this_file, true)
    					set found to true
    				end if
    				set i to i + 1
    			end repeat
    		end if
    	end repeat
    end tell
    on write_to_file(this_data, target_file, append_data)
    	try
    		set the target_file to the target_file as string
    		set the open_target_file to open for access file target_file with write permission
    		-- Truncate the file first unless we were asked to append.
    		if append_data is false then set eof of the open_target_file to 0
    		write this_data to the open_target_file starting at eof
    		close access the open_target_file
    		return true
    	on error
    		-- Make sure the file is not left open if the write failed.
    		try
    			close access file target_file
    		end try
    		return false
    	end try
    end write_to_file
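
    Since the script stops on the first failure and appends to "TaxFile.txt", restarting means trimming the input list to the numbers that are still missing. Below is a small Python sketch of that step, assuming the results file holds one "number, tab, amount" record per line (the AppleScript separates records with carriage returns, which Python's default text mode treats as line breaks) and that both files contain plain ASCII. The function name `remaining_ids` is hypothetical.

```python
from typing import List


def remaining_ids(numbers_path: str, results_path: str) -> List[str]:
    """Return property numbers from numbers_path not yet in results_path."""
    done = set()
    # Default text mode uses universal newlines, so records separated by a
    # bare carriage return are still read one per line.
    with open(results_path) as results:
        for line in results:
            property_number = line.split("\t", 1)[0].strip()
            if property_number:
                done.add(property_number)

    todo = []
    with open(numbers_path) as numbers:
        for line in numbers:
            property_number = line.strip()
            if property_number and property_number not in done:
                todo.append(property_number)
    return todo
```

    Writing the returned list back out, one number per line, gives a shortened propNumbers.txt for the next run.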
