Jessica Lares

macrumors G3
Original poster
Oct 31, 2009
9,612
1,056
Near Dallas, Texas, USA
So I have about 90 more entries of my walks to go through. I did a few manually, but I bet I could do this automatically with AppleScript or something.

These are the bits of the files I want:

<durationString>30'11"</durationString>
<distanceString>0.96 mi</distanceString>
<pace>31'30" / mi</pace>

I just want a little script that pulls out that middle data in a way so I could copy and paste that into a spreadsheet manually. So the end result would be:

30'11" 0.96 mi 31'30" / mi

Which would automatically format itself to filling three cells that I could clean up so it can be used to calculate averages in total. I'd put the date in separately.

A snippet or link to a tutorial would be appreciated. Something that dumps it all into a plain text file would be fine.
 

MacUser2525

Suspended
Mar 17, 2007
2,097
377
Canada
A simple bash script will do you. I took your data above and put it in an .xml file then used the following commands on it.

Code:
MacUser2525:~$ nano /Volumes/Sea_To_Do/working/nike.xml

MacUser2525:~$ grep duration /Volumes/Sea_To_Do/working/nike.xml
<durationString>30'11"</durationString>

MacUser2525:~$ grep duration /Volumes/Sea_To_Do/working/nike.xml | cut -d ">" -f 2
30'11"</durationString

MacUser2525:~$ grep duration /Volumes/Sea_To_Do/working/nike.xml | cut -d ">" -f 2 | tr -d \<\/durationString
30'11"

MacUser2525:~$ grep distance /Volumes/Sea_To_Do/working/nike.xml | cut -d ">" -f 2 | tr -d \<\/distanceString
0.96 m

MacUser2525:~$ grep pace /Volumes/Sea_To_Do/working/nike.xml | cut -d ">" -f 2 | tr -d \<\/pace
31'30"  mi
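
A note on the `tr -d` step: `tr -d` deletes every character that appears in its argument rather than the literal string, which is why the `i` in `mi` and the `/` in the pace vanish in the output above. A minimal sketch of one alternative, assuming each value sits on its own line between an opening and a closing tag, is to strip the tags with `sed` instead (`strip_tags` is a hypothetical helper name, not something from the posts):

```shell
# strip_tags: remove anything that looks like <...> from each input line.
# Hypothetical helper for illustration; reads stdin, writes stdout.
strip_tags() {
    sed 's/<[^>]*>//g'
}

# Example: the units survive because only the tags are removed.
printf '%s\n' '<distanceString>0.96 mi</distanceString>' | strip_tags
```

The example line prints `0.96 mi`, with the unit intact.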

Now in a bash script.

Code:
MacUser2525:~$ nano /Volumes/Sea_To_Do/working/nike.sh


#!/bin/bash
grep duration /Volumes/Sea_To_Do/working/nike.xml | cut -d ">" -f 2 | tr -d \<\/durationString
grep distance /Volumes/Sea_To_Do/working/nike.xml | cut -d ">" -f 2 | tr -d \<\/distanceString
grep pace /Volumes/Sea_To_Do/working/nike.xml | cut -d ">" -f 2 | tr -d \<\/pace

MacUser2525:~$ chmod +x /Volumes/Sea_To_Do/working/nike.sh

Run the bash script.

Code:
MacUser2525:~$ /Volumes/Sea_To_Do/working/nike.sh
30'11"
0.96 m
31'30"  mi

It outputs what you need; a simple redirect on the end of each grep pipeline will put that into a file for you. Each value ends up on a separate line, so we may as well change that, and the script is limited to the one hard-coded nike.xml file, which needs changing as well. Since I gather from the wording of your question that you have an individual file for each walk, we may as well use a for loop to process each file and get the required information too.

Code:
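#!/bin/bash

# Note: $(ls ...) is split on whitespace, so file names must not contain spaces.
for i in $(ls /Volumes/path/to/nike/xml/*.xml); do
	# tr -d deletes every character in the given set (not the literal string),
	# which here also strips the "mi" units and the "/" from the values.
	duration=$(grep duration "$i" | cut -d ">" -f 2 | tr -d '</durationString')
	distance=$(grep distance "$i" | cut -d ">" -f 2 | tr -d '</distanceStringm')
	pace=$(grep pace "$i" | cut -d ">" -f 2 | tr -d '</pacemi')
	# Write the header row only when nike.tsv does not exist yet.
	if [ ! -f nike.tsv ]; then
		printf 'Duration\tDistance\tPace\n' >> nike.tsv
	fi
	printf '%s\t%s\t%s\n' "$duration" "$distance" "$pace" >> nike.tsv
done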

This gives you nike.tsv, a tab-separated values file you can import, in the directory the script is run from in Terminal. The script will fail if there are spaces in the file names, so those need to be removed before running it. The path also needs to be changed to the directory containing the .xml files in your setup (the "ls /Volumes/path/to/nike/xml/*.xml" part). Here is the output when run on a few files I made here containing the same data you posted.

Code:
MacUser2525:~$ /Volumes/Sea_To_Do/working/nike.sh
MacUser2525:~$ cat nike.tsv 
Duration	Distance	Pace
30'11"	0.96 	31'30"  
30'11"	0.96 	31'30"  
30'11"	0.96 	31'30"  
30'11"	0.96 	31'30"

As you can see, I eliminated the mi in both spots that had it, since the unit is likely to stay constant and you know what it is supposed to be anyway.

Edit: NameChanger is an easy-to-use program for removing the spaces if necessary.

http://www.mrrsoftware.com/MRRSoftware/NameChanger.html
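
If renaming is inconvenient, a sketch along these lines should tolerate spaces and semicolons in the file names, because the glob is expanded by the shell and each name stays quoted. It assumes one value per tag per file; `field` and `walks_to_tsv` are made-up names, and unlike the script above it keeps the units.

```shell
# field TAG FILE: print the text between <TAG> and </TAG> (first match only).
field() {
    sed -n "s:.*<$1>\(.*\)</$1>.*:\1:p" "$2" | head -n 1
}

# walks_to_tsv DIR: print a header row, then one tab-separated row per .xml file.
walks_to_tsv() {
    printf 'Duration\tDistance\tPace\n'
    for f in "$1"/*.xml; do
        [ -e "$f" ] || continue   # the glob matched nothing
        printf '%s\t%s\t%s\n' \
            "$(field durationString "$f")" \
            "$(field distanceString "$f")" \
            "$(field pace "$f")"
    done
}
```

It could be run as `walks_to_tsv /Volumes/path/to/nike/xml > nike.tsv`, with the path adjusted to your setup.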
 

subsonix

macrumors 68040
Feb 2, 2008
3,551
79
Here is an alternative, if you navigate to the folder where your xml files are located in the terminal.

Code:
for i in *.xml ; do grep -oP "[^(</*(durationString|distanceString|pace)>)][0-9\'\"mi./\s]+" "$i" | tr '\n' ';' ; echo ; done > nike+.csv

Then

Code:
open nike+.csv -a Numbers

It will create a csv file from your xml files and open it in Numbers, (which I do believe still supports csv). You can also import this manually. This is all quite brittle and may work, or not. :D
 

Jessica Lares

macrumors G3
Original poster
Oct 31, 2009
9,612
1,056
Near Dallas, Texas, USA
Thank you very much! The filenames look like "2013-01-07 17;45;54.xml", so I guess I'm either going to have to rename them to take out the space, yes (and are the ; a problem too?)? Just putting them in " and " doesn't work? Pretty sure I can just use Automator to take care of that anyway. Pretty sure I did that once already.

Could I have all of them write to the same file, and add a line every time? I would add the date field to the first part of </*(durationString|distanceString|pace)>.
 

subsonix

macrumors 68040
Feb 2, 2008
3,551
79
Thank you very much! The filenames look like "2013-01-07 17;45;54.xml", so I guess I'm either going to have to rename them to take out the space, yes (and are the ; a problem too?)? Just putting them in " and " doesn't work? Pretty sure I can just use Automator to take care of that anyway. Pretty sure I did that once already.

Could I have all of them write to the same file, and add a line every time? I would add the date field to the first part of </*(durationString|distanceString|pace)>.

The filename should not be a problem, the names are carried in the "$i" variable in the loop. I tried and used the naming style you showed here with the date and I had no problem with it.

For the date field you can use the "$i" variable, and do that before the existing grep part. It's getting a bit ugly so perhaps it's better to add it to a script file if you need the date part as well.

Code:
for i in *.xml ; do echo "$i" | grep -oP "\d+-\d+-\d+" | tr '\n' ';' ; grep -oP "[^(</*(durationString|distanceString|pace)>)][0-9\'\"mi./\s]+" "$i" | tr '\n' ';' ; echo ; done > nike+.csv

BTW, everything should get added to the same file, afaik.
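
The date can also be taken from the file name with plain shell parameter expansion, without a second grep. A small sketch, assuming names shaped like "2013-01-07 17;45;54.xml" (`date_from_name` is a hypothetical helper):

```shell
# date_from_name PATH: print the part of the file name before the first
# space or semicolon, e.g. the date in "2013-01-07 17;45;54.xml".
date_from_name() {
    base=${1##*/}                    # drop any leading directory
    printf '%s\n' "${base%%[ ;]*}"   # cut at the first space or ";"
}

date_from_name "2013-01-07 17;45;54.xml"
```

The call at the end prints `2013-01-07`; inside the loop it would be `date_from_name "$i"`.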
 

MacUser2525

Suspended
Mar 17, 2007
2,097
377
Canada
Thank you very much! The filenames look like "2013-01-07 17;45;54.xml", so I guess I'm either going to have to rename them to take out the space, yes (and are the ; a problem too?)? Just putting them in " and " doesn't work? Pretty sure I can just use Automator to take care of that anyway. Pretty sure I did that once already.

Could I have all of them write to the same file, and add a line every time? I would add the date field to the first part of </*(durationString|distanceString|pace)>.

The space matters; the ; does not. In fact, we will use it to get your date: rename the space to ; so you end up with file names like this "2013-01-07;17;45;54.xml". The new version of the script is below.

Code:
#!/bin/bash

# Note: $(ls ...) is split on whitespace, so file names must not contain spaces.
for i in $(ls /Volumes/path/to/nike/xml/*.xml); do
	# The date is the part of the file name before the first ";".
	# basename drops the directory, so the path is not repeated here
	# (tr -d would delete a set of characters, not the literal path).
	date=$(basename "$i" | cut -d ";" -f 1)
	duration=$(grep duration "$i" | cut -d ">" -f 2 | tr -d '</durationString')
	distance=$(grep distance "$i" | cut -d ">" -f 2 | tr -d '</distanceStringm')
	pace=$(grep pace "$i" | cut -d ">" -f 2 | tr -d '</pacemi')
	# Write the header row only when nike.tsv does not exist yet.
	if [ ! -f nike.tsv ]; then
		printf 'Date\tDuration\tDistance\tPace\n' >> nike.tsv
	fi
	printf '%s\t%s\t%s\t%s\n' "$date" "$duration" "$distance" "$pace" >> nike.tsv
done

You only need to replace the path once in this script, in the for line, to get the required data.

----------

Code:
for i in *.xml ; do echo "$i" | grep -oP "\d+-\d+-\d+" | tr '\n' ';' ; grep -oP "[^(</*(durationString|distanceString|pace)>)][0-9\'\"mi./\s]+" "$i" | tr '\n' ';' ; echo ; done > nike+.csv

BTW, everything should get added to the same file, afaik.

Not with a single redirect (>). You need two (>>) to append to a file; one simply overwrites the existing file every time the loop runs.
 

subsonix

macrumors 68040
Feb 2, 2008
3,551
79
Not with a single redirect (>). You need two (>>) to append to a file; one simply overwrites the existing file every time the loop runs.

You're wrong.

I got the impression that there was a problem to this effect, I think we would need to know more to do anything about it.
 

MacUser2525

Suspended
Mar 17, 2007
2,097
377
Canada
I'm well aware of redirection, but the loop writes to the same file descriptor in one go, ie it's one operation.

If your awareness is anything like your reading comprehension I doubt it. I said nothing about outside the for loop redirection and I for one am going to use the safest always works in all situations option every time.
 

subsonix

macrumors 68040
Feb 2, 2008
3,551
79
If your awareness is anything like your reading comprehension I doubt it. I said nothing about outside the for loop redirection and I for one am going to use the safest always works in all situations option every time.

What are you talking about? You specifically attributed the error to the redirection from the loop, claiming that each iteration would overwrite the last one, which is wrong.

Redirection taken out of this context makes no sense here, it doesn't explain this error, adding a second '>' doesn't do anything in this case, the result is the same.
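
The difference being argued over can be checked with a throwaway loop. In this sketch (the file names are arbitrary and live in a temporary directory), a single `>` attached to `done` opens the file once for the whole loop, while a `>` inside the loop body reopens and truncates the file on every pass:

```shell
demo_dir=$(mktemp -d)

# Redirect on "done": the file is opened once, so all three lines are kept.
for n in 1 2 3; do echo "walk $n"; done > "$demo_dir/outer.txt"

# Redirect inside the body: each pass truncates, so only the last line is left.
for n in 1 2 3; do echo "walk $n" > "$demo_dir/inner.txt"; done

# Append inside the body: all three lines are kept, plus whatever was
# already in the file from an earlier run.
for n in 1 2 3; do echo "walk $n" >> "$demo_dir/append.txt"; done

wc -l "$demo_dir"/*.txt
```

On a first run, outer.txt and append.txt each hold three lines and inner.txt holds one. The practical difference between the first and third forms only shows on a rerun: `done > file` starts the file afresh, while `>>` keeps adding to it.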
 

subsonix

macrumors 68040
Feb 2, 2008
3,551
79
Having looked at one of those nike+ xml files, I got the following working. It finds the tag names, then strips the xml tags and adds delimiters for the csv file.

I also noticed that there was a tag called <time> which contained a date and time so it seems like it can be added without having to use the file name, if the format is consistent with what I found.

Code:
for i in *.xml ; do grep -P "(time|durationString|distanceString|pace).+" "$i" | sed 's/<[^>]*>//g' | tr '\n' ';' ; echo ; done > nike+.csv

Code:
open nike+.csv -a Numbers
 

Jessica Lares

macrumors G3
Original poster
Oct 31, 2009
9,612
1,056
Near Dallas, Texas, USA
Having looked at one of those nike+ xml files, I got the following working. It finds the tag names, then strips the xml tags and adds delimiters for the csv file.

I also noticed that there was a tag called <time> which contained a date and time so it seems like it can be added without having to use the file name, if the format is consistent with what I found.

It throws this out 80+ times, but it does make the csv file, only it's just an empty table:

usage: grep [-abcDEFGHhIiJLlmnOoPqRSsUVvwxZ] [-A num] [-B num] [-C[num]]
[-e pattern] [-f file] [--binary-files=value] [--color=when]
[--context[=num]] [--directories=action] [--label] [--line-buffered]
[--null] [pattern] [file ...]
 

subsonix

macrumors 68040
Feb 2, 2008
3,551
79
It throws this out 80+ times, but it does make the csv file, only it's just an empty table:

Make sure you are copy/pasting that whole line in its entirety. That error message is from grep. I had no problem with it here, and I tried it again just to make sure. The only thing I can test it on is what you have given here, and another part from one of those xml files which I found online. Having said that, the error would not depend on the input, but on grep being used wrongly.
 

Jessica Lares

macrumors G3
Original poster
Oct 31, 2009
9,612
1,056
Near Dallas, Texas, USA
Make sure you are copy/pasting that whole line in its entirety. That error message is from grep. I had no problem with it here, and I tried it again just to make sure. The only thing I can test it on is what you have given here, and another part from one of those xml files which I found online. Having said that, the error would not depend on the input, but on grep being used wrongly.

I've attached a screenshot. I'm sure I copied it correctly? :confused:
 

Attachments

  • terminal.png
    terminal.png
    28.8 KB · Views: 102

subsonix

macrumors 68040
Feb 2, 2008
3,551
79
I've attached a screenshot. I'm sure I copied it correctly? :confused:

Ok, let's break it down to the bare minimum. If I just use the grep part, I get this:

Code:
Mac% for i in lastWorkout.xml ; do grep -P "(time|durationString|distanceString|pace)" "$i" ; done
	<time>2006-09-05T12:53:57+01:00</time>
	<durationString>58:58</durationString>
	<distanceString>5.00 mi</distanceString>
	<pace>11:47 min/mi</pace>

You can remove the '.+' part at the end of that regex btw, it's a leftover from the first line here.



Edit: If I remove the loop as well I get this:

Code:
Mac% grep -P "(time|durationString|distanceString|pace)" lastWorkout.xml
	<time>2006-09-05T12:53:57+01:00</time>
	<durationString>58:58</durationString>
	<distanceString>5.00 mi</distanceString>
	<pace>11:47 min/mi</pace>
 

Jessica Lares

macrumors G3
Original poster
Oct 31, 2009
9,612
1,056
Near Dallas, Texas, USA
The second loop doesn't work, and neither does that first version.

When I take off the done part (just to see what happens), it just gives me:

Code:
>

Is there any way to check grep and see if it's even correctly set up? I did get it to echo hello.
 

subsonix

macrumors 68040
Feb 2, 2008
3,551
79
The second loop doesn't work, and neither does that first version.

When I take off the done part (just to see what happens), it just gives me:

Code:
>

Is there any way to check grep and see if it's even correctly set up? I did get it to echo hello.

If you remove the done part, then it gives you a prompt to continue with the loop body. 'Done' is what finishes the loop, this enables you to write a loop over several lines. It's unrelated to the grep issue, what happens if you try that last line that starts with 'grep'?

Code:
grep -P "(time|durationString|distanceString|pace)" lastWorkout.xml

(Change the name of the xml file to what ever name you are using).
 

Jessica Lares

macrumors G3
Original poster
Oct 31, 2009
9,612
1,056
Near Dallas, Texas, USA
If you remove the done part, then it gives you a prompt to continue with the loop body. Done is what finishes the loop, this enables you to write a loop over several lines. It's unrelated to the grep issue, what happens if you try that last line that starts with 'grep'?

Makes sense. That last line just gives me the usage parameters again.
 

subsonix

macrumors 68040
Feb 2, 2008
3,551
79
Makes sense. That last line just gives me the usage parameters again.

That's odd. What happens if you do "grep -V"? You could also try replacing "-P" with "-E". The -P option is for Perl regular expressions, but the odd thing is that it's mentioned in the usage message you get there.

Also long shot but, if you copy the line above again then do:

Code:
pbpaste | od -bc

I get this:

Code:
0000000   147 162 145 160 040 055 120 040 042 050 164 151 155 145 174 144
           g   r   e   p       -   P       "   (   t   i   m   e   |   d
0000020   165 162 141 164 151 157 156 123 164 162 151 156 147 174 144 151
           u   r   a   t   i   o   n   S   t   r   i   n   g   |   d   i
0000040   163 164 141 156 143 145 123 164 162 151 156 147 174 160 141 143
           s   t   a   n   c   e   S   t   r   i   n   g   |   p   a   c
0000060   145 051 042 040 154 141 163 164 127 157 162 153 157 165 164 056
           e   )   "       l   a   s   t   W   o   r   k   o   u   t   .
0000100   170 155 154                                                    
           x   m   l

Which is an octal dump just to make sure there are no weird characters in there (characters up to 177 octal are valid ascii).
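
The same check can be done without reading the dump by eye. A rough sketch: delete every printable ASCII byte (octal 040 to 176) and see whether anything is left over. `has_odd_bytes` is a made-up name, and a tab inside the string would also count as odd here.

```shell
# has_odd_bytes STRING: succeed (exit 0) if STRING contains any byte outside
# printable ASCII, i.e. outside octal 040-176.
has_odd_bytes() {
    leftover=$(printf '%s' "$1" | LC_ALL=C tr -d '\040-\176')
    [ -n "$leftover" ]
}

cmd='grep -P "(time|durationString|distanceString|pace)" lastWorkout.xml'
if has_odd_bytes "$cmd"; then echo "odd characters found"; else echo "plain ASCII"; fi
```

For the line as typed here it prints `plain ASCII`. To test the clipboard instead, the call would be `has_odd_bytes "$(pbpaste)"`; curly quotes picked up from a web page are the sort of thing it should flag.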
 

Jessica Lares

macrumors G3
Original poster
Oct 31, 2009
9,612
1,056
Near Dallas, Texas, USA
That's odd. What happens if you do 'grep -V'? You could also try replacing '-P' with '-E'. The -P option is for Perl regular expressions, but the odd thing is that it's mentioned in the usage message you get there.

grep (BSD grep) 2.5.1-FreeBSD

And yeah, the -E instead of the -P finally gives me the data.
 

subsonix

macrumors 68040
Feb 2, 2008
3,551
79
grep (BSD grep) 2.5.1-FreeBSD

And yeah, the -E instead of the -P finally gives me the data.

Interesting, I get "(GNU grep) 2.5.1". So even though both list the same "-P" flag, there seems to be an incompatibility.

But with that out of the way, let's go back and try the previous version again, with the -E modification this time.

Code:
for i in *.xml ; do grep -E "(time|durationString|distanceString|pace)" "$i" | sed 's/<[^>]*>//g' | tr '\n' ';' ; echo ; done > nike+.csv

Btw, you can add a second "tr" in there to remove leading tabs if you have them in the xml.
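
For instance, with `tr -d '\t'` slotted in before the newline-to-semicolon step. This is only a sketch wrapped in a hypothetical function (`xml_to_row`); it also matches the tags with their angle brackets, on the assumption that a bare word like `time` could otherwise match other tag names too.

```shell
# xml_to_row FILE: print the wanted fields of one workout file as a single
# semicolon-separated line. Uses only -E, which BSD and GNU grep both accept.
xml_to_row() {
    grep -E '<(time|durationString|distanceString|pace)>' "$1" |
        sed 's/<[^>]*>//g' |   # strip the XML tags
        tr -d '\t' |           # drop leading tabs
        tr '\n' ';'            # join the lines with ";"
    echo                       # end the row
}
```

The loop then becomes `for i in *.xml ; do xml_to_row "$i" ; done > nike+.csv`.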
 

Jessica Lares

macrumors G3
Original poster
Oct 31, 2009
9,612
1,056
Near Dallas, Texas, USA
This is what the csv looks like.

And this is from the first cell:

10Step Workout2013-01-07T17:45:54+00:00181074230'11"1.54210.96 mi117418831'30" / mi81021390072.6MD477LL1.0.2 (37A20067)DCYJHBDRF0GP2013-01-07T17:45:54+00:002013-01-07T18:16:05-06:000

The octal dump looked okay BTW. Nothing went over.
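
One possible explanation for that cell, offered as a guess since the file itself is not shown: if the whole XML document sits on a single line, `grep` prints that entire line and `sed` then strips every tag in it. Under that assumption, `grep -o` can pull out just the wanted elements, one per output line. `extract_fields` is a hypothetical name, and the sketch assumes the values contain no `<`.

```shell
# extract_fields FILE: print only the wanted element values, joined by ";",
# even when the whole file is a single line.
extract_fields() {
    grep -oE '<(time|durationString|distanceString|pace)>[^<]*' "$1" |
        sed 's/^<[^>]*>//' |   # drop the opening tag kept by grep -o
        tr '\n' ';'            # join the values with ";"
    echo                       # end the row
}
```

It drops into the same loop as before: `for i in *.xml ; do extract_fields "$i" ; done > nike+.csv`.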
 

Attachments

  • Screen Shot 2013-12-30 at 11.18.00 PM.png
    Screen Shot 2013-12-30 at 11.18.00 PM.png
    56.2 KB · Views: 126

subsonix

macrumors 68040
Feb 2, 2008
3,551
79
This is what the csv looks like.

And this is from the first cell:

I think you'll need to attach one of those xml files here then, if you can. This is what I get in Numbers, with three xml files (all the same) which I found online here: http://blog.mattmecham.com/2006/09/05/ipod-training-data-under-the-hood/. It seems to only be a part of the file btw.


Edit: Actually, you can also try to remove the redirection to the .csv file (> nike+.csv) and look what it looks like in the terminal, it should be something like this:

Code:
	2006-09-05T12:53:57+01:00;	58:58;	5.00 mi;	11:47 min/mi;
	2006-09-05T12:53:57+01:00;	58:58;	5.00 mi;	11:47 min/mi;
	2006-09-05T12:53:57+01:00;	58:58;	5.00 mi;	11:47 min/mi;
 

Attachments

  • Skärmavbild 2013-12-31 kl. 06.40.36.png
    Skärmavbild 2013-12-31 kl. 06.40.36.png
    8.9 KB · Views: 110

Jessica Lares

macrumors G3
Original poster
Oct 31, 2009
9,612
1,056
Near Dallas, Texas, USA