I want to dump Nike+ (XML) data from my Nano into a spreadsheet, best way to do it?

Discussion in 'Mac Programming' started by Jessica Lares, Dec 28, 2013.

  1. Jessica Lares macrumors G3

    Jessica Lares

    Joined:
    Oct 31, 2009
    Location:
    Near Dallas, Texas, USA
    #1
    So I have about 90 more entries of my walks to go through. I did a few manually, but I bet I could do this automatically with AppleScript or something.

    These are the bits of the files I want:

    <durationString>30'11"</durationString>
    <distanceString>0.96 mi</distanceString>
    <pace>31'30" / mi</pace>

    I just want a little script that pulls out that middle data in a way so I could copy and paste that into a spreadsheet manually. So the end result would be:

    30'11" 0.96 mi 31'30" / mi

    That would paste into three cells, which I could then clean up and use to calculate overall averages. I'd put the date in separately.

    A snippet or link to a tutorial would be appreciated. Something that dumps it all into a plain text file would be fine.
     
  2. MacUser2525, Dec 28, 2013
    Last edited: Dec 28, 2013

    MacUser2525 macrumors 68000

    MacUser2525

    Joined:
    Mar 17, 2007
    Location:
    Canada
    #2
    A simple bash script will do. I took your data above, put it in an .xml file, and then ran the following commands on it.

    Code:
    MacUser2525:~$ nano /Volumes/Sea_To_Do/working/nike.xml
    
    MacUser2525:~$ grep duration /Volumes/Sea_To_Do/working/nike.xml
    <durationString>30'11"</durationString>
    
    MacUser2525:~$ grep duration /Volumes/Sea_To_Do/working/nike.xml | cut -d ">" -f 2
    30'11"</durationString
    
    MacUser2525:~$ grep duration /Volumes/Sea_To_Do/working/nike.xml | cut -d ">" -f 2 | tr -d \<\/durationString
    30'11"
    
    MacUser2525:~$ grep distance /Volumes/Sea_To_Do/working/nike.xml | cut -d ">" -f 2 | tr -d \<\/distanceString
    0.96 m
    
    MacUser2525:~$ grep pace /Volumes/Sea_To_Do/working/nike.xml | cut -d ">" -f 2 | tr -d \<\/pace
    31'30"  mi
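A note on the output above: `tr -d` treats its argument as a set of single characters, not as a string. That is why `0.96 mi` comes out as `0.96 m` (the `i` appears in `distanceString`) and why the slash disappears from the pace. Below is a minimal sketch of an alternative that leaves the values untouched, assuming each value sits on one line between its opening and closing tag, as in the sample. The `sample` variable and the `extract` function are made-up names for illustration.

```shell
#!/bin/bash
# Sample data copied from the first post; "sample" and "extract" are illustrative names.
sample=$(cat <<'EOF'
<durationString>30'11"</durationString>
<distanceString>0.96 mi</distanceString>
<pace>31'30" / mi</pace>
EOF
)

# Print the text between <tag> and </tag>, leaving the value itself untouched.
extract() {
    printf '%s\n' "$sample" | sed -n "s|.*<$1>\(.*\)</$1>.*|\1|p"
}

duration=$(extract durationString)
distance=$(extract distanceString)
pace=$(extract pace)

printf '%s\n' "$duration" "$distance" "$pace"
```

With this approach the units stay in the output, so they would have to be removed in the spreadsheet or with a further `sed` step if they are not wanted.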
    
    Now in a bash script.

    Code:
    MacUser2525:~$ nano /Volumes/Sea_To_Do/working/nike.sh
    
    
    #!/bin/bash
    grep duration /Volumes/Sea_To_Do/working/nike.xml | cut -d ">" -f 2 | tr -d \<\/durationString
    grep distance /Volumes/Sea_To_Do/working/nike.xml | cut -d ">" -f 2 | tr -d \<\/distanceString
    grep pace /Volumes/Sea_To_Do/working/nike.xml | cut -d ">" -f 2 | tr -d \<\/pace
    
    MacUser2525:~$ chmod +x /Volumes/Sea_To_Do/working/nike.sh
    
    Run the bash script.

    Code:
    MacUser2525:~$ /Volumes/Sea_To_Do/working/nike.sh
    30'11"
    0.96 m
    31'30"  mi
    
    It outputs what you need, and a simple redirect on the end of each grep pipeline will put that into a file for you. Each value ends up on a separate line, though, so that may as well change, and the script is limited to the one hard-coded nike.xml file, which needs changing as well. Since your wording suggests you have an individual file for each walk, we may as well use a for loop to process each file and pull out the required information.

    Code:
    #!/bin/bash
    
    for i in $(ls /Volumes/path/to/nike/xml/*.xml); do
    	duration=`grep duration "$i" | cut -d ">" -f 2 | tr -d \<\/durationString`
    	distance=`grep distance "$i" | cut -d ">" -f 2 | tr -d \<\/distanceStringm`
    	pace=`grep pace "$i" | cut -d ">" -f 2 | tr -d \<\/pacemi`
    	if [ -f nike.tsv ] ; then
    		printf "$duration\t$distance\t$pace\n" >> nike.tsv
    		else
    			printf "Duration\tDistance\tPace\n" >> nike.tsv
    			printf "$duration\t$distance\t$pace\n" >> nike.tsv
    	fi
    done
    
    This gives you a nike.tsv (tab-separated values) file that you can import. It is created in whatever directory the script is run from in Terminal. The script will fail if there are spaces in the file names, so those need to be removed before running it. The path also needs to be changed to the directory containing the .xml files in your setup; that is the "ls /Volumes/path/to/nike/xml/*.xml" part. Here is the output from running it on a few files I made containing the same data you posted.

    Code:
    MacUser2525:~$ /Volumes/Sea_To_Do/working/nike.sh
    MacUser2525:~$ cat nike.tsv 
    Duration	Distance	Pace
    30'11"	0.96 	31'30"  
    30'11"	0.96 	31'30"  
    30'11"	0.96 	31'30"  
    30'11"	0.96 	31'30"  
    
    As you can see, I eliminated the "mi" in both spots that had it, since the unit is likely to stay constant and you know what it is supposed to be anyway.
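If renaming the files is a hassle, one possible variation is to let the shell expand the `*.xml` glob directly instead of going through `ls`, and to quote the loop variable, so that names containing spaces stay in one piece. This is only a sketch under assumptions: the `field` and `make_tsv` function names are made up, each value is assumed to sit on one line between its tags, and `sed` is used instead of `tr -d`, so the units are kept.

```shell
#!/bin/bash
# Print the contents of the first <tag>...</tag> line found in a file.
# Usage: field TAG FILE
field() {
    sed -n "s|.*<$1>\(.*\)</$1>.*|\1|p" "$2" | head -n 1
}

# Print a header plus one tab-separated row per .xml file in a directory.
# Usage: make_tsv DIRECTORY
make_tsv() {
    local f
    printf 'Duration\tDistance\tPace\n'
    for f in "$1"/*.xml; do
        [ -e "$f" ] || continue   # the glob matched nothing
        printf '%s\t%s\t%s\n' \
            "$(field durationString "$f")" \
            "$(field distanceString "$f")" \
            "$(field pace "$f")"
    done
}
```

It could then be called as `make_tsv /Volumes/path/to/nike/xml > nike.tsv`, with the path changed to your own directory. Because the header is printed once before the loop and the whole output is redirected once, running it again replaces the file instead of adding duplicate rows.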

    Edit: NameChanger is an easy-to-use program for removing the spaces if necessary.

    http://www.mrrsoftware.com/MRRSoftware/NameChanger.html
     
  3. subsonix macrumors 68040

    Joined:
    Feb 2, 2008
    #3
    Here is an alternative. In the terminal, navigate to the folder where your xml files are located, then run:

    Code:
    for i in *.xml ; do grep -oP "[^(</*(durationString|distanceString|pace)>)][0-9\'\"mi./\s]+" "$i" | tr '\n' ';' ; echo ; done > nike+.csv
    
    Then

    Code:
    open nike+.csv -a Numbers
    
    It will create a csv file from your xml files and open it in Numbers (which I believe still supports csv). You can also import the file manually. This is all quite brittle, so it may or may not work. :D
     
  4. Jessica Lares thread starter macrumors G3

    Jessica Lares

    Joined:
    Oct 31, 2009
    Location:
    Near Dallas, Texas, USA
    #4
    Thank you very much! The filenames look like "2013-01-07 17;45;54.xml", so I guess I'm going to have to rename them to take out the space? And are the ; a problem too? Does just putting the names in quotes not work? I'm pretty sure I can use Automator to take care of that anyway; I think I did that once already.

    Could I have all of them write to the same file, adding a line every time? I would add the date field to the first part of </*(durationString|distanceString|pace)>.
     
  5. subsonix macrumors 68040

    Joined:
    Feb 2, 2008
    #5
    The filename should not be a problem, the names are carried in the "$i" variable in the loop. I tried and used the naming style you showed here with the date and I had no problem with it.

    For the date field you can use the "$i" variable, and do that before the existing grep part. It's getting a bit ugly so perhaps it's better to add it to a script file if you need the date part as well.

    Code:
    for i in *.xml ; do echo "$i" | grep -oP "\d+-\d+-\d+" | tr '\n' ';' ; grep -oP "[^(</*(durationString|distanceString|pace)>)][0-9\'\"mi./\s]+" "$i" | tr '\n' ';' ; echo ; done > nike+.csv
    
    BTW, everything should get added to the same file, afaik.
     
  6. MacUser2525 macrumors 68000

    MacUser2525

    Joined:
    Mar 17, 2007
    Location:
    Canada
    #6
    The space matters; the ; does not. In fact, we will use the ; to get your date: rename the space to ; so that you end up with file names like "2013-01-07;17;45;54.xml". The new version of the script is below.

    Code:
    #!/bin/bash
    
    for i in $(ls /Volumes/path/to/nike/xml/*.xml); do
    	filename=$i
    	date=`echo $filename | tr -d \/Volumes\/path\to\nike\/xml\/ | cut -d ";" -f 1`
    	duration=`grep duration "$i" | cut -d ">" -f 2 | tr -d \<\/durationString`
    	distance=`grep distance "$i" | cut -d ">" -f 2 | tr -d \<\/distanceStringm`
    	pace=`grep pace "$i" | cut -d ">" -f 2 | tr -d \<\/pacemi`
    	if [ -f nike.tsv ] ; then
    		printf "$date\t$duration\t$distance\t$pace\n" >> nike.tsv
    		else
    			printf "Date\tDuration\tDistance\tPace\n" >> nike.tsv
    			printf "$date\t$duration\t$distance\t$pace\n" >> nike.tsv
    	fi
    done
    
    You now need to replace the path two times in this script to get the required data.
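One caveat about the `tr -d` line in the script above: it deletes every character that appears anywhere in the path text, so it would also eat digits or dashes from the date if the real path contained any. A hedged alternative is to let `basename` drop the directory and then cut the name at the first space or semicolon with a parameter expansion. The sketch assumes the date is always the first part of the file name; the `date_from_name` function name is made up for this example.

```shell
#!/bin/bash
# Derive the date from a file name such as "2013-01-07 17;45;54.xml".
date_from_name() {
    local name
    name=$(basename "$1" .xml)   # e.g. "2013-01-07 17;45;54"
    name=${name%%[ ;]*}          # keep everything before the first space or semicolon
    printf '%s\n' "$name"
}

# Works with the original names and with the renamed ones.
date_from_name "/Volumes/path/to/nike/xml/2013-01-07 17;45;54.xml"
date_from_name "/Volumes/path/to/nike/xml/2013-01-07;17;45;54.xml"
```

Inside the loop this would replace the `date=` line with something like `date=$(date_from_name "$i")`, and the path then only needs to be changed in one place.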

    ----------

    Not with a single redirect. With > you overwrite; you need two, >>, to append to a file. A single one simply overwrites the existing file every time the loop runs.
     
  7. subsonix macrumors 68040

    Joined:
    Feb 2, 2008
    #7
    You're wrong.

    I got the impression that there was a problem to this effect, I think we would need to know more to do anything about it.
     
  8. MacUser2525 macrumors 68000

    MacUser2525

    Joined:
    Mar 17, 2007
    Location:
    Canada
    #8
  9. subsonix macrumors 68040

    Joined:
    Feb 2, 2008
    #9
    I'm well aware of redirection, but the loop writes to the same file descriptor in one go, i.e. it's one operation. If you do not want to test this by actually creating two files, this confirms it as well.

    Code:
    for i in {1..10} ; do
        echo $i
    done > test
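To make the point of disagreement concrete, here is a small sketch comparing the three possible placements of the redirect. The file names are made up, and everything is written to a scratch directory created with `mktemp -d`.

```shell
#!/bin/bash
tmp=$(mktemp -d)   # scratch directory; the file names below are made up

# 1. ">" after "done": the file is opened once for the whole loop.
for i in 1 2 3; do echo "$i"; done > "$tmp/after_done"

# 2. ">" inside the loop: the file is truncated on every iteration.
for i in 1 2 3; do echo "$i" > "$tmp/inside_overwrite"; done

# 3. ">>" inside the loop: every iteration appends to the file.
for i in 1 2 3; do echo "$i" >> "$tmp/inside_append"; done

# Count the lines that ended up in each file.
after_done=$(wc -l < "$tmp/after_done" | tr -d ' ')
inside_overwrite=$(wc -l < "$tmp/inside_overwrite" | tr -d ' ')
inside_append=$(wc -l < "$tmp/inside_append" | tr -d ' ')
echo "after done: $after_done, > inside: $inside_overwrite, >> inside: $inside_append"
```

Cases 1 and 3 both keep all three lines; only case 2 loses data. The remaining difference is that a file written as in case 3 keeps growing if the script is run again, while case 1 starts fresh on each run.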
    
     
  10. MacUser2525 macrumors 68000

    MacUser2525

    Joined:
    Mar 17, 2007
    Location:
    Canada
    #10
    If your awareness is anything like your reading comprehension, I doubt it. I said nothing about redirection outside the for loop, and I for one am going to use the safest, always-works-in-all-situations option every time.
     
  11. subsonix macrumors 68040

    Joined:
    Feb 2, 2008
    #11
    What are you talking about? You specifically attributed the error to the redirection from the loop, claiming that each iteration would overwrite the last one, which is wrong.

    Redirection taken out of this context makes no sense here, it doesn't explain this error, adding a second '>' doesn't do anything in this case, the result is the same.
     
  12. subsonix macrumors 68040

    Joined:
    Feb 2, 2008
    #12
    Having looked at one of those nike+ xml files, I got the following working. It finds the tag names, then strips the xml tags and adds delimiters for the csv file.

    I also noticed that there was a tag called <time> which contained a date and time so it seems like it can be added without having to use the file name, if the format is consistent with what I found.

    Code:
    for i in *.xml ; do grep -P "(time|durationString|distanceString|pace).+" "$i" | sed 's/<[^>]*>//g' | tr '\n' ';' ; echo ; done > nike+.csv
    
    Code:
    open nike+.csv -a Numbers
    
     
  13. Jessica Lares thread starter macrumors G3

    Jessica Lares

    Joined:
    Oct 31, 2009
    Location:
    Near Dallas, Texas, USA
    #13
    It throws this out 80+ times, but it does make the csv file, only it's just an empty table:

     
  14. subsonix macrumors 68040

    Joined:
    Feb 2, 2008
    #14
    Make sure you are copy/pasting that whole line in its entirety. That error message is from grep. I had no problem with it here, and I tried it again just to make sure. The only thing I can test it on is what you have given here, plus another part of one of those xml files that I found online. Having said that, the error would not depend on the input, but on grep being used wrongly.
     
  15. Jessica Lares thread starter macrumors G3

    Jessica Lares

    Joined:
    Oct 31, 2009
    Location:
    Near Dallas, Texas, USA
    #15
    I've attached a screenshot. I'm sure I copied it correctly? :confused:
     

    Attached Files:

  16. subsonix macrumors 68040

    Joined:
    Feb 2, 2008
    #16
    Ok, let's break it down to the bare minimum. If I just use the grep part, I get this:

    Code:
    Mac% for i in lastWorkout.xml ; do grep -P "(time|durationString|distanceString|pace)" "$i" ; done
    	<time>2006-09-05T12:53:57+01:00</time>
    	<durationString>58:58</durationString>
    	<distanceString>5.00 mi</distanceString>
    	<pace>11:47 min/mi</pace>
    
    You can remove the '.+' part at the end of that regex btw, it's a leftover from the first line here.



    Edit: If I remove the loop as well I get this:

    Code:
    Mac% grep -P "(time|durationString|distanceString|pace)" lastWorkout.xml
    	<time>2006-09-05T12:53:57+01:00</time>
    	<durationString>58:58</durationString>
    	<distanceString>5.00 mi</distanceString>
    	<pace>11:47 min/mi</pace>
    
     
  17. Jessica Lares thread starter macrumors G3

    Jessica Lares

    Joined:
    Oct 31, 2009
    Location:
    Near Dallas, Texas, USA
    #17
    The second loop doesn't work, and neither does that first version.

    When I take off the done part (just to see what happens), it just gives me:

    Code:
    >
    Is there any way to check grep and see if it's even correctly set up? I did get it to echo hello.
     
  18. subsonix macrumors 68040

    Joined:
    Feb 2, 2008
    #18
    If you remove the done part, then it gives you a prompt to continue with the loop body. 'Done' is what finishes the loop, this enables you to write a loop over several lines. It's unrelated to the grep issue, what happens if you try that last line that starts with 'grep'?

    Code:
    grep -P "(time|durationString|distanceString|pace)" lastWorkout.xml
    (Change the name of the xml file to what ever name you are using).
     
  19. Jessica Lares thread starter macrumors G3

    Jessica Lares

    Joined:
    Oct 31, 2009
    Location:
    Near Dallas, Texas, USA
    #19
    Makes sense. That last line just gives me the usage parameters again.
     
  20. subsonix macrumors 68040

    Joined:
    Feb 2, 2008
    #20
    That's odd. What happens if you do "grep -V"? You could also try replacing the "-P" with "-E". The P option is for Perl regular expressions, but the odd thing is that it's mentioned in the usage message you get there.

    Also, it's a long shot, but if you copy the line above again and then do:

    Code:
    pbpaste | od -bc
    
    I get this:

    Code:
    0000000   147 162 145 160 040 055 120 040 042 050 164 151 155 145 174 144
               g   r   e   p       -   P       "   (   t   i   m   e   |   d
    0000020   165 162 141 164 151 157 156 123 164 162 151 156 147 174 144 151
               u   r   a   t   i   o   n   S   t   r   i   n   g   |   d   i
    0000040   163 164 141 156 143 145 123 164 162 151 156 147 174 160 141 143
               s   t   a   n   c   e   S   t   r   i   n   g   |   p   a   c
    0000060   145 051 042 040 154 141 163 164 127 157 162 153 157 165 164 056
               e   )   "       l   a   s   t   W   o   r   k   o   u   t   .
    0000100   170 155 154                                                    
               x   m   l
    
    That is an octal dump, just to make sure there are no weird characters in there (characters up to 177 octal are valid ASCII).
     
  21. Jessica Lares thread starter macrumors G3

    Jessica Lares

    Joined:
    Oct 31, 2009
    Location:
    Near Dallas, Texas, USA
    #21
    grep (BSD grep) 2.5.1-FreeBSD

    And yeah, the -E instead of the -P finally gives me the data.
     
  22. subsonix macrumors 68040

    Joined:
    Feb 2, 2008
    #22
    Interesting, I get "(GNU grep) 2.5.1". So even though the same "-P" flag is listed for both, there seems to be an incompatibility.
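Since `-P` evidently cannot be relied on everywhere, a script can probe for it and fall back to `-E`. This is a minimal sketch: the `re_flag` variable and the scratch file are illustrative, and the pattern is deliberately one that both regex flavours understand.

```shell
#!/bin/bash
# Probe: does this grep accept -P? If not, fall back to extended regexes.
if printf 'a1\n' | grep -qP 'a\d' 2>/dev/null; then
    re_flag=-P
else
    re_flag=-E
fi

# Scratch file holding the sample lines from the lastWorkout.xml output.
xml=$(mktemp)
cat > "$xml" <<'EOF'
<time>2006-09-05T12:53:57+01:00</time>
<durationString>58:58</durationString>
<distanceString>5.00 mi</distanceString>
<pace>11:47 min/mi</pace>
EOF

# The alternation below means the same thing under -P and -E.
matches=$(grep "$re_flag" "(time|durationString|distanceString|pace)" "$xml")
printf '%s\n' "$matches"
rm -f "$xml"
```

For a pattern this simple the probe is optional, since `-E` alone is enough; it only matters for expressions that use Perl-only syntax such as `\d`.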

    But with that out of the way, let's go back and try the previous version again, with the -E modification this time.

    Code:
    for i in *.xml ; do grep -E "(time|durationString|distanceString|pace)" "$i" | sed 's/<[^>]*>//g' | tr '\n' ';' ; echo ; done > nike+.csv
    
    Btw, you can add a second "tr" in there to remove leading tabs if you have them in the xml.
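As a sketch of that idea, a `tr -d '\t'` stage can be slotted in after `sed`. The version below makes two further changes that are assumptions for this example and not part of the one-liner above: the pattern is anchored to whole tags (`<time>` instead of any line containing "time"), and the `tr '\n' ';'` plus `echo` pair is swapped for `paste`, which joins the lines without leaving a trailing semicolon. The `to_csv_line` function name is made up.

```shell
#!/bin/bash
# Turn one workout file into a single semicolon-separated line.
# Usage: to_csv_line FILE
to_csv_line() {
    grep -E "<(time|durationString|distanceString|pace)>" "$1" |
        sed 's/<[^>]*>//g' |   # strip the XML tags
        tr -d '\t' |           # strip leading tabs
        paste -s -d ';' -      # join the remaining lines with semicolons
}
```

It would be used in the same kind of loop as before: `for i in *.xml ; do to_csv_line "$i" ; done > nike+.csv`.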
     
  23. Jessica Lares, Dec 30, 2013
    Last edited: Dec 30, 2013

    Jessica Lares thread starter macrumors G3

    Jessica Lares

    Joined:
    Oct 31, 2009
    Location:
    Near Dallas, Texas, USA
    #23
    This is what the csv looks like.

    And this is from the first cell:

    The octal dump looked okay BTW. Nothing went over.
     

    Attached Files:

  24. subsonix, Dec 30, 2013
    Last edited: Dec 30, 2013

    subsonix macrumors 68040

    Joined:
    Feb 2, 2008
    #24
    I think you'll need to attach one of those xml files here then, if you can. This is what I get in Numbers, with three xml files (all the same) which I found online here: http://blog.mattmecham.com/2006/09/05/ipod-training-data-under-the-hood/. It seems to only be a part of the file btw.


    Edit: Actually, you can also try removing the redirection to the .csv file (> nike+.csv) to see what the output looks like in the terminal. It should be something like this:

    Code:
    	2006-09-05T12:53:57+01:00;	58:58;	5.00 mi;	11:47 min/mi;
    	2006-09-05T12:53:57+01:00;	58:58;	5.00 mi;	11:47 min/mi;
    	2006-09-05T12:53:57+01:00;	58:58;	5.00 mi;	11:47 min/mi;
    
     

    Attached Files:

  25. Jessica Lares thread starter macrumors G3

    Jessica Lares

    Joined:
    Oct 31, 2009
    Location:
    Near Dallas, Texas, USA
    #25
