Need help extracting information from a web page with an XPath query

Discussion in 'iOS Programming' started by cstromme, Aug 9, 2009.

  1. cstromme macrumors regular

    Joined:
    Feb 26, 2007
    #1
    So I'm using libxml2 here and running XPath queries to extract information from some web pages. Everything's been going swell so far, but I've run into a problem on one specific page.

    This is the part of the page I'm trying to extract information from:
    Code:
    <table width="252" border="0" cellspacing="0" cellpadding="5">
    <tr>
    <td width="116" align="left" valign="top">
    <h3>Original tittel</h3><br />Coco avant Chanel<br />
    <h3>Genre</h3><br />Drama<br />
    <h3>Nasjonalitet</h3><br />FRA<br />
    <h3>Sensur</h3><br />Tillatt for alle<br />
    <h3>Regi</h3><br />Anne Fontaine<br />
    <h3>Medvirkende</h3><br />Audrey Tautou, Benoit Poelvoorde, Emmanuelle Devos, Marie Gillain<br />
    <h3>Lengde</h3><br />1 t. 50 min.<br />
    <h3>Filmbyrå</h3><br />SF Norge<br />
    <td width="116" align="left" valign="top">
    <h3>Publikum mener </h3>
    <table width="100%" border="0" cellspacing="0" cellpadding="0">
    <tr>
    <td width="20">
    <img src="http://www.oslokino.no/template/static/gfx/hoyre_kolonne/terninger/5.gif" alt="Terningkast" width="16" height="16" vspace="4" /></td>
    <td align="left" valign="middle"> </td>
    </tr>
    </table>
    What I want to get at is the text between the <br /> tags (the value that follows each <h3> heading), but this has proven to be quite difficult.
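Part of the difficulty is that <br /> is an empty element, so nothing is actually inside it: a value like Drama is a separate text node sitting between two <br /> elements, as a sibling of the <h3>. A quick way to see this is with Python's standard xml.etree.ElementTree, used here only as a stand-in for libxml2 and assuming the fragment is well-formed XHTML. ElementTree stores text that follows an element in that element's .tail:

```python
import xml.etree.ElementTree as ET

# One heading/value pair from the page, assumed to be well-formed XHTML
# (a real page may need an HTML-tolerant parser instead).
fragment = "<td><h3>Genre</h3><br />Drama<br /></td>"
td = ET.fromstring(fragment)
h3, first_br, second_br = list(td)

print(repr(h3.text))         # 'Genre'  (text inside the heading)
print(repr(h3.tail))         # None     (nothing between </h3> and <br />)
print(repr(first_br.tail))   # 'Drama'  (the value follows the first <br />)
print(repr(second_br.tail))  # None     (nothing before </td>)
```

So the value hangs off the first <br />, not the heading, which is why selecting h3 on its own only ever returns the heading names.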

    With the XPath query string:
    Code:
    @"//td[@width='116' and @align='left' and @valign='top']/h3";
    I can get this:
    Code:
    (
            {
            nodeContent = Genre;
            nodeName = h3;
        },
            {
            nodeContent = Nasjonalitet;
            nodeName = h3;
        },
            {
            nodeContent = Regi;
            nodeName = h3;
        },
            {
            nodeContent = Produsent;
            nodeName = h3;
        },
            {
            nodeContent = Medvirkende;
            nodeName = h3;
        },
            {
            nodeContent = Musikk;
            nodeName = h3;
        },
            {
            nodeContent = Lengde;
            nodeName = h3;
        },
            {
            nodeContent = "Publikum mener";
            nodeName = h3;
        },
            {
            nodeContent = "Hva mener du?";
            nodeName = h3;
        }
    )
    
    But these headings are really what I would like as my nodeName, and the nodeContent would then be, using Genre as an example, Coco avant Chanel.
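Going by the HTML above, each heading pairs with the text right after its first <br /> (Genre with Drama, Original tittel with Coco avant Chanel). In XPath 1.0, which libxml2 implements, that text can be reached from each h3 node with the relative step following-sibling::text()[1], at least for markup laid out like the snippet. Python's ElementTree supports only a small XPath subset with no following-sibling axis and no "and" in predicates, so the sketch below (a hypothetical stand-in, not libxml2 code) chains the attribute predicates instead and walks the siblings by hand. PAGE is a trimmed, well-formed copy of the cell; a real page would likely need an HTML-tolerant parser first.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed copy of the first cell, assumed to be well-formed XHTML.
PAGE = """
<table>
<tr>
<td width="116" align="left" valign="top">
<h3>Original tittel</h3><br />Coco avant Chanel<br />
<h3>Genre</h3><br />Drama<br />
<h3>Nasjonalitet</h3><br />FRA<br />
<h3>Regi</h3><br />Anne Fontaine<br />
</td>
</tr>
</table>
"""


def heading_values(root):
    """Map each <h3> heading to the text that follows the <br /> after it."""
    result = {}
    # ElementTree's XPath subset has no 'and', so the attribute tests are chained.
    path = ".//td[@width='116'][@align='left'][@valign='top']"
    for td in root.iterfind(path):
        children = list(td)  # element children only, in document order
        for heading, nxt in zip(children, children[1:]):
            if heading.tag != "h3" or nxt.tag != "br":
                continue
            # Text after <br /> is stored as that element's .tail.
            value = (nxt.tail or "").strip()
            if value:
                result[(heading.text or "").strip()] = value
    return result


values = heading_values(ET.fromstring(PAGE.strip()))
print(values["Genre"])            # Drama
print(values["Original tittel"])  # Coco avant Chanel
```

Pairing by position like this assumes every heading is immediately followed by a <br /> and one line of text, as in the snippet; a heading with no value after it is simply skipped.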

    Can anybody here help me out with this? I've been reading XPath examples and tutorials for hours now, and I still can't find a way to do it.
     
  2. dejo Moderator


    Joined:
    Sep 2, 2004
    Location:
    The Centennial State
    #2
    Based on the structure of that HTML, I would expect the "content" of the Genre node to be "Drama". "Coco avant Chanel" looks like the content for the "Original tittel" node.

    As for the XPath side, I'm sorry; I don't think I can help you with that.
     
  3. cstromme thread starter macrumors regular

    Joined:
    Feb 26, 2007
