Disclaimer: Consider me a beginner in XSL.
I want to data-mine some information from an HTML page I have. It's a page from a web forum (not macrumors), and I want to create an XML representation of the posts in a thread. Basically, I want to strip out all the nonessential HTML and extract just the username, date posted, and message.
I have used the tidy utility to make sure the page is well-formed XHTML. What I need help with is setting up the XSL file, particularly the XPath used to match elements within the HTML.
As you can imagine, a forum page has many nested tables. In the example below, I want to pick out the <tr> elements of the innermost table. Any suggestions on how to do that?
Code:
<html>
<body>
<center>
<table>
<tr>
<td>
<table>
<tr>
<td>
<table>
<tr></tr>
<tr></tr>
<tr></tr>
<tr></tr>
.....
</table>
</td>
</tr>
</table>
</td>
</tr>
</table>
</center>
</body>
</html>
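To make the goal concrete, here is a minimal sketch of the selection I am after, written in Python with the standard library rather than XSL. It assumes "innermost" means a <table> that contains no other <table>. The sample markup, the "post N" cell text, and the function name innermost_rows are made up for illustration. In XPath 1.0 the same idea would be roughly //table[not(.//table)]/tr, but I have not tested that against the real page, and Python's ElementTree only supports a limited XPath subset without not(), so the check is done by hand. If tidy added an XHTML namespace to the root element, the tag names would need that namespace as well.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample mirroring the nesting shown above; the real page
# (and any XHTML namespace that tidy adds) is not reproduced here.
SAMPLE = """<html><body><center>
<table><tr><td>
  <table><tr><td>
    <table>
      <tr><td>post 1</td></tr>
      <tr><td>post 2</td></tr>
      <tr><td>post 3</td></tr>
    </table>
  </td></tr></table>
</td></tr></table>
</center></body></html>"""


def innermost_rows(root):
    """Return the direct <tr> children of every <table> with no nested <table>."""
    rows = []
    for table in root.iter("table"):
        # iter() yields the element itself first, so a count of 1 means
        # this table has no descendant <table>.
        if sum(1 for _ in table.iter("table")) == 1:
            rows.extend(table.findall("tr"))
    return rows


root = ET.fromstring(SAMPLE)
rows = innermost_rows(root)
texts = [row.findtext("td") for row in rows]
print(texts)  # ['post 1', 'post 2', 'post 3']
```

Only the rows of the deepest table come back; the <tr> elements of the two outer tables are skipped because those tables each contain another table.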