
Thethuthinang

macrumors member
Original poster
Jan 3, 2011
39
0
I am trying to search the DOMDocument for a webpage (Wikipedia). I have subclassed WebView and have obtained a DOMDocument object by using:
Code:
[[self mainFrame] DOMDocument]

However, I am unable to search the DOM tree. In particular, the code:
Code:
[[[[self mainFrame] DOMDocument] childNodes] length]

returns 1, which is not the number of nodes I am expecting. Searching that child node:
Code:
[[[[self mainFrame] DOMDocument] childNodes] item:0]
brings me to a dead end immediately.

Similar code inserted into the Xcode example project DOMTreeView returns 2.

Why the discrepancy?
 

chown33

Moderator
Staff member
Aug 9, 2009
10,750
8,422
A sea of green
Instead of blindly imposing your expectations on the nodes, you should add some diagnostic code that traverses the nodes and tells you what they are. Then you'll be able to tell us what you actually get as a DOMDocument.

You might also want to familiarize yourself with the Document Object Model:
http://www.w3schools.com/htmldom/default.asp

Of note: the document node has 1 child, the root element.

You should also tell us where the DOMDocument was loaded from. If there was an error of some kind, the DOM you actually get may not be what you expect.
 

Thethuthinang

macrumors member
Original poster
Jan 3, 2011
39
0
chown33 said:
Instead of blindly imposing your expectations on the nodes, you should add some diagnostic code that traverses the nodes and tells you what they are. Then you'll be able to tell us what you actually get as a DOMDocument.

You might also want to familiarize yourself with the Document Object Model:
http://www.w3schools.com/htmldom/default.asp

Oh, I ran diagnostics. Why do you assume I didn't? :(

chown33 said:
Of note: the document node has 1 child, the root element.

Here is the start of the tree displayed by DOMTreeView:

V #document
........html
......V HTML
..........> HEAD
..........> BODY

Does this show that the document node has two children? The only one (of the children of "#document") that I can access in my program is the one labeled above as "html".

chown33 said:
You should also tell us where the DOMDocument was loaded from. If there was an error of some kind, the DOM you actually get may not be what you expect.

The code I use to get the starting node is:
Code:
[[self mainFrame] loadRequest:[NSURLRequest requestWithURL:[NSURL URLWithString:@"http://en.wikipedia.org/wiki/Main_Page"]]];
node = [[self mainFrame] DOMDocument];
 

chown33

Moderator
Staff member
Aug 9, 2009
10,750
8,422
A sea of green
Thethuthinang said:
Oh, I ran diagnostics. Why do you assume I didn't? :(
For the same reason I would assume anything else: You didn't say you had done so, and you didn't post any results that suggested you did.

We can't read your mind.
We can't see your screen.
All we know about what you did is what you tell us.
All we can see about your code is what you post.


Thethuthinang said:
Here is the start of the tree displayed by DOMTreeView:

V #document
........html
......V HTML
..........> HEAD
..........> BODY

Does this show that the document node has two children?
I don't know, I haven't used DOMTreeView. Google doesn't turn up much that's useful in explaining it, either.

If you don't know how to use it, or can't interpret its output, then you should probably invest a little time and write your own diagnostics. It shouldn't be that hard. DOMNode has methods that tell you whether a node has children, and what the first child node is. Having the first child node, you can get the next sibling node, too.
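A diagnostic of the kind described above could look roughly like the following. This is only a sketch: `DumpDOMNode` and its `maxDepth` parameter are hypothetical names, not part of WebKit. The `DOMNode` calls it uses (`nodeName`, `nodeType`, `firstChild`, `nextSibling`) belong to the legacy WebKit Objective-C DOM API. Logging `nodeType` next to `nodeName` makes it possible to tell a document type node (`DOM_DOCUMENT_TYPE_NODE`) from an element (`DOM_ELEMENT_NODE`) when two nodes have similar names.

```objc
#import <WebKit/WebKit.h>

// Hypothetical helper (not a WebKit API): logs a node's name and type,
// then recurses into its children until maxDepth is reached.
static void DumpDOMNode(DOMNode *node, NSUInteger depth, NSUInteger maxDepth)
{
    if (node == nil || depth > maxDepth) {
        return;
    }

    // Build an indent of two spaces per level of depth.
    NSString *indent = [@"" stringByPaddingToLength:depth * 2
                                         withString:@" "
                                    startingAtIndex:0];
    NSLog(@"%@%@ (nodeType %d)", indent, [node nodeName], (int)[node nodeType]);

    // Walk the children using first-child / next-sibling links.
    for (DOMNode *child = [node firstChild]; child != nil; child = [child nextSibling]) {
        DumpDOMNode(child, depth + 1, maxDepth);
    }
}
```

Calling `DumpDOMNode([[webView mainFrame] DOMDocument], 0, 2)` once the page has loaded should log the document node and the two levels beneath it, which is enough to see how many children the document node really has and what kind each one is.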

Again, I refer you to this link:
http://www.w3schools.com/htmldom/default.asp
The diagram there clearly shows one child node of the document node. That node (called the root element, with name html) has two child nodes, head and body.

Thethuthinang said:
The code I use to get the starting node is:
Code:
[[self mainFrame] loadRequest:[NSURLRequest requestWithURL:[NSURL URLWithString:@"http://en.wikipedia.org/wiki/Main_Page"]]];
node = [[self mainFrame] DOMDocument];
I used Safari to get that page. From Safari's Develop menu I then chose Show Web Inspector. This clearly shows an HTML root element with two children: head and body.

All the evidence I find, plus my experience using JavaScript to access DOM nodes, strongly suggests there is one child node of the document, which is the root HTML element. The root element has two children: head and body. If that doesn't match your expectation, then you might want to confirm your expectation is correct by trying to obtain evidence that supports it. If the evidence suggests a different structure, then you might need to adjust your expectation.
 

Thethuthinang

macrumors member
Original poster
Jan 3, 2011
39
0
chown33 said:
All we know about what you did is what you tell us.
All we can see about your code is what you post.

Well, I did explain some of my diagnostics in my original post. I'm not sure why you didn't see it as diagnostics.

I found a solution to the problem. Rather than subclassing WebView, I made my class a delegate for my WebView and used a delegate method that was called when everything was properly loaded. Now it shows that "#document" has two children and is now in agreement with the DOMTreeView example (granted, this may be weird given what http://www.w3schools.com/htmldom/default.asp leads us to expect.)
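The post does not say which delegate method was used. One plausible candidate, shown here purely as an assumption, is the frame load delegate's `webView:didFinishLoadForFrame:`. The class name `PageInspector` and the method `inspectURL:inWebView:` are hypothetical names chosen for illustration.

```objc
#import <WebKit/WebKit.h>

// Hypothetical delegate class; the original post does not show its code.
@interface PageInspector : NSObject
- (void)inspectURL:(NSURL *)url inWebView:(WebView *)webView;
@end

@implementation PageInspector

- (void)inspectURL:(NSURL *)url inWebView:(WebView *)webView
{
    // The delegate is not retained by the WebView, so the caller must keep
    // this object alive until the load finishes. The cast to id avoids a
    // warning on SDKs where WebFrameLoadDelegate is a formal protocol.
    [webView setFrameLoadDelegate:(id)self];
    [[webView mainFrame] loadRequest:[NSURLRequest requestWithURL:url]];
    // The DOM is NOT inspected here: loadRequest: returns before the page arrives.
}

// Called by WebKit when a frame has finished loading.
- (void)webView:(WebView *)sender didFinishLoadForFrame:(WebFrame *)frame
{
    // Ignore subframes; only the main frame's document is of interest.
    if (frame != [sender mainFrame]) {
        return;
    }

    DOMDocument *document = [frame DOMDocument];
    NSLog(@"#document has %u child node(s)", [[document childNodes] length]);
}

@end
```

The point of the design is the timing: the document is read inside the callback rather than on the line after `loadRequest:`, so the node count reflects the fully loaded page.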

Thanks for your implicit suggestion that I check that things were loaded properly. I still don't understand why I was previously able to get the document node, but not both of its children.
 

chown33

Moderator
Staff member
Aug 9, 2009
10,750
8,422
A sea of green
Thethuthinang said:
Well, I did explain some of my diagnostics in my original post. I'm not sure why you didn't see it as diagnostics.
Because you showed hardly any code, and you didn't show any output at all. All you showed was isolated lines of code, along with a summary description of output. That's not the same thing as posting the actual code that produces diagnostic output, along with the actual output produced.

Thethuthinang said:
Thanks for your implicit suggestion that I check that things were loaded properly. I still don't understand why I was previously able to get the document node, but not both of its children.
Without seeing the malfunctioning code in its original execution context, there's no way to know. My first guess would be that you were simply doing the action too soon, before the entire set of nodes was available.

It's often not good enough to say "My expectation is X". One must also say when. As in, "My expectation is X at this point in time," and then identify what point in time that is. This is especially true when dealing with networked applications that deliver data asynchronously or over extended periods of time (i.e. longer than microseconds). If you look for something before it's been delivered, it won't be there. Network speed is always finite.
 