How to merge many XML files into one

Discussion in 'Mac Programming' started by mrindia, Jul 15, 2013.

  1. mrindia, Jul 15, 2013
    Last edited: Jul 17, 2013

    macrumors newbie

    Joined:
    May 30, 2013
    #1
    Hi: I got a small project to combine many XML files into one and convert the combined XML file in Excel using AppleScript. My XML files look like this:

    Code:
    <?xml version="1.0" encoding="UTF-8"?>
    <Metadataobject>
    	<from>horsten.jock@edienpartner-int.com</from>
    	<jobname>B3_IM09MBDUF</jobname>
    	<pages>2</pages>
    	<priority>3</priority>
    	<timezone>CEST</timezone>
    	<year>2013</year>
    	<month>7</month>
    	<day>15</day>
    	<hour>11</hour>
    </Metadataobject>
    and like this...

    Code:
    <?xml version="1.0" encoding="UTF-8"?>
    <Metadataobject>
    	<from>frank.k@mrtner-int.com</from>
    	<jobname>P1_FR1330G006007_Kate_van der Vaart</jobname>
    	<pages>2</pages>
    	<priority>1</priority>
    	<timezone>CEST</timezone>
    	<year>2013</year>
    	<month>7</month>
    	<day>12</day>
    	<hour>16</hour>
    </Metadataobject>
    I get many XML files like this. And I want them to be combined and shown like this:

    Code:
    <?xml version="1.0" encoding="UTF-8"?>
    <Metadataobject>
    	<job id="1">
    	<from>thck@rtner-int.com</from>
    	<jobname>B3_IM09MBDUF</jobname>
    	<pages>2</pages>
    	<priority>3</priority>
    	<timezone>CEST</timezone>
    	<year>2013</year>
    	<month>7</month>
    	<day>15</day>
    	<hour>11</hour>
    	</job>
    	<job id="1">
    	<from>flk@mrtner-int.com</from>
    	<jobname>P1_FR1330G006007_Kate_van der Vaart</jobname>
    	<pages>2</pages>
    	<priority>1</priority>
    	<timezone>CEST</timezone>
    	<year>2013</year>
    	<month>7</month>
    	<day>12</day>
    	<hour>16</hour>
    	</job>
    </Metadataobject>
    And finally the combined XML file converts in Excel sheet with column headings "Job ID", "From", "Job Name" and so on...

    Thanks
     
  2. r0k
    macrumors 68040

    r0k

    Joined:
    Mar 3, 2008
    Location:
    Detroit
    #2
    First of all, please go back and edit your post to remove "real" email addresses. We don't need those to help you but the spambots are happy to find those email addresses "in the clear."

    If I were doing this, I would do it in terminal, with something like this:

    I would cd to the folder with all the xml files,then I would...

    Code:
    cat *.xml | egrep -v "Metadataobject" | egrep -v "xml version" > new.xml
    cat firstfilename.xml | grep "xml version" > alldone.xml
    cat firstfilename.xml | grep "<Metadataobject>" >> alldone.xml
    cat new.xml >> alldone.xml
    cat firstfilename.xml | grep "</Metadataobject>" >> alldone.xml
    
    This is using standard Unix command line tools.
    The cat command simply sends all the files onto the pipe.
    The pipe command, | says take the output of the previous step and pass it as input to the next step.
    The egrep -v command says omit all lines containing Metadataobject (both the one before and after the stuff you want).
    The next egrep -v command gets rid of xml version lines
    The resulting file now has everything in one long block and you can prepend the header stuff before it and the /Metadata object line after it.

    The next lines are more of the same so you wind up with the file looking like your example. Of course you know the filenames so you would change firstfilename.xml in my example above to one of the files you are using.

    One thing to consider. Maybe you should keep the Metadataobject lines. You may only need to get rid of the "xml version..." headers, so the following command would work...

    Code:
    cat *.xml | egrep -v "xml version" > new.xml
    cat firstfilename.xml | grep "xml version" > alldone.xml
    cat new.xml >> alldone.xml
    Again you know the filenames so you would change firstfilename.xml in my example above to one of the files you are using.

    Hope this helps.
     
  3. jahala, Jul 17, 2013
    Last edited: Jul 17, 2013

    macrumors regular

    Joined:
    Feb 7, 2008
    #3
    Good first shot, but you failed to notice that he added <job> and </job> to around the contents of each file. The following works to get a combined xml file.

    head -n 2 firstfile.xml > test.res; for i in *.xml; do echo '<job id="1">' >> test.res; tail -n +3 $i | grep -v 'Metadataobject' >> test.res; echo '</job>' >> test.res; done; echo '</Metadataobject>' >> test.res

    The output file is test.res. I am not yet sure what to do to make it open in excel. I imagine it has something to do with the first line.
     

Share This Page