cstromme

macrumors regular
Original poster
Feb 26, 2007
162
0
I'm using libxml2 and running XPath queries to pull information out of some web pages. Everything has gone well so far, but I've run into a problem on one specific page.

This is the part of the page I'm trying to extract from:
Code:
<table width="252" border="0" cellspacing="0" cellpadding="5">
<tr>
<td width="116" align="left" valign="top">
<h3>Original tittel</h3><br />Coco avant Chanel<br />
<h3>Genre</h3><br />Drama<br />
<h3>Nasjonalitet</h3><br />FRA<br />
<h3>Sensur</h3><br />Tillatt for alle<br />
<h3>Regi</h3><br />Anne Fontaine<br />
<h3>Medvirkende</h3><br />Audrey Tautou, Benoit Poelvoorde, Emmanuelle Devos, Marie Gillain<br />
<h3>Lengde</h3><br />1 t. 50 min.<br />
<h3>Filmbyrå</h3><br />SF Norge<br />
<td width="116" align="left" valign="top">
<h3>Publikum mener </h3>
<table width="100%" border="0" cellspacing="0" cellpadding="0">
<tr>
<td width="20">
<img src="http://www.oslokino.no/template/static/gfx/hoyre_kolonne/terninger/5.gif" alt="Terningkast" width="16" height="16" vspace="4" /></td>
<td align="left" valign="middle"> </td>
</tr>
</table>

What I want to get at is the text that follows each <h3>, sitting between the <br /> tags, but this has proven to be quite difficult.

With the XPath query string:
Code:
@"//td[@width='116' and @align='left' and @valign='top']/h3";

I am able to get this:
Code:
(
        {
        nodeContent = Genre;
        nodeName = h3;
    },
        {
        nodeContent = Nasjonalitet;
        nodeName = h3;
    },
        {
        nodeContent = Regi;
        nodeName = h3;
    },
        {
        nodeContent = Produsent;
        nodeName = h3;
    },
        {
        nodeContent = Medvirkende;
        nodeName = h3;
    },
        {
        nodeContent = Musikk;
        nodeName = h3;
    },
        {
        nodeContent = Lengde;
        nodeName = h3;
    },
        {
        nodeContent = "Publikum mener";
        nodeName = h3;
    },
        {
        nodeContent = "Hva mener du?";
        nodeName = h3;
    }
)

But this is really more what I would like to have as my nodeName, and then the nodeContent would be, using Genre as example, Coco avant Chanel.

Anybody here that can help me out with this? I've been reading examples and XPath tutorials for hours now, and I still can't quite find a way to do this.
 
cstromme said:
But this is really more what I would like to have as my nodeName, and then the nodeContent would be, using Genre as example, Coco avant Chanel.
Based on the structure of that HTML, I would figure that the "content" of the node for Genre would be "Drama". "Coco avant Chanel" would seem to be the content for the 'Original tittel' node.

As for the XPath side, I'm sorry; I don't think I can help you with that.
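
For what it's worth, here is a rough sketch of the pairing described above: each <h3> label is matched with the first non-blank text that follows it. It is Python with only the standard library, not the Objective-C/libxml2 setup from the original post, and the SAMPLE markup and the pair_headings name are made up for illustration (a tidied, well-formed cut of the posted fragment). In full XPath 1.0, which libxml2 implements, evaluating something like following-sibling::text()[normalize-space()][1] relative to each h3 node should pick out the same text, though that is untested against the real page.

```python
import xml.etree.ElementTree as ET

# Hypothetical, tidied stand-in for the posted fragment. The real page is
# not well-formed XML, so it would need an HTML parser instead.
SAMPLE = """
<table width="252" border="0" cellspacing="0" cellpadding="5">
<tr>
<td width="116" align="left" valign="top">
<h3>Original tittel</h3><br />Coco avant Chanel<br />
<h3>Genre</h3><br />Drama<br />
<h3>Regi</h3><br />Anne Fontaine<br />
</td>
</tr>
</table>
"""


def pair_headings(xml_text):
    """Map each <h3> label to the first non-blank text that follows it."""
    root = ET.fromstring(xml_text.strip())
    pairs = {}
    # ElementTree supports only a subset of XPath, so the three attribute
    # tests are written as chained predicates rather than joined with "and".
    cells = root.findall(".//td[@width='116'][@align='left'][@valign='top']")
    for cell in cells:
        label = None
        for child in cell:
            if child.tag == "h3":
                label = (child.text or "").strip()
            # Text that comes after an element is stored as that element's tail.
            text = (child.tail or "").strip()
            if label is not None and text:
                pairs[label] = text
                label = None  # wait for the next <h3> before pairing again
    return pairs


if __name__ == "__main__":
    for name, value in pair_headings(SAMPLE).items():
        print(name, "=", value)
```

The value ends up in the tail of the <br /> right after each heading, which is why the loop checks every child's tail rather than only the <h3> itself.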
 