
nashyo

macrumors 6502
Original poster
Oct 1, 2010
299
0
Bristol
Every result appears twice in the output and I'm having trouble figuring out why. I'm new to XPath and HTML parsing. Can someone help please? This is driving me nuts.

(TFHpple and TFHppleElement are third-party classes for parsing HTML.)
Code:
- (IBAction)searchNavBarButtonPressed:(id)sender 
{
    NSData *data = [[NSData alloc] initWithContentsOfURL:[NSURL URLWithString:@"http://www.riglocums.com/doctor-jobs/anaesthetics-153/"]];
    
    TFHpple *xpathParser_ = [[TFHpple alloc] initWithHTMLData:data];
    
    NSArray *elements = [xpathParser_ searchWithXPathQuery:@"//table/tr/td/p[position()=1]"]; //the search results table
        
    NSMutableArray *mutableArray1 = [[NSMutableArray alloc] init];
    if ([elements count] > 0 ) {
        NSString *content;
        for (TFHppleElement *element in elements) {
            TFHppleElement *firstChild = [element firstChild];
            NSArray *nodeChildArray = [firstChild children];
            if ([nodeChildArray count] > 0) {
                content = [[nodeChildArray objectAtIndex:0] content];
            }
            [mutableArray1 addObject:content];
        }
    }
    NSLog(@"%@", mutableArray1);
}

Console
2012-07-03 08:40:03.620 Rig Locums[1164:f803] (
"Consultant Intensive Treatment Unit - 6 months - North England",
"Consultant Intensive Treatment Unit - 6 months - North England",
"Consultant Paediatric Intensivist",
"Consultant Paediatric Intensivist",
"Staff Grade - Anaesthetist - 5 months - North England",
"Staff Grade - Anaesthetist - 5 months - North England",
"SpR Anaesthetics - Midlands - Ongoing",
"SpR Anaesthetics - Midlands - Ongoing",
"Consultant Anaesthetist with an interest in Chronic Pain - South West England - Ongoing",
"Consultant Anaesthetist with an interest in Chronic Pain - South West England - Ongoing",
"Consultant Anaesthetist interested in Cardiothoracic Anaesthesia - 1 month - East England",
"Consultant Anaesthetist interested in Cardiothoracic Anaesthesia - 1 month - East England",
"Consultant Anaesthetist + ITU - North England - 9 months",
"Consultant Anaesthetist + ITU - North England - 9 months",
"Consultant Anaesthetist 2x - North England",
"Consultant Anaesthetist 2x - North England",
"Staff Grade Anaesthetist - 6 month - South England",
"Staff Grade Anaesthetist - 6 month - South England",
"Senior House Officer - Anaesthetics - 3 months - London",
"Senior House Officer - Anaesthetics - 3 months - London"
)
 

CodeBreaker

macrumors 6502
Nov 5, 2010
494
1
Sea of Tranquility
Don't know about your libraries, but you should do it this way:

Code:
if ([elements count] > 0 ) {
        NSString *content;
        for (TFHppleElement *element in elements) {
            TFHppleElement *firstChild = [element firstChild];
            NSArray *nodeChildArray = [firstChild children];
            if ([nodeChildArray count] > 0) {
                content = [[nodeChildArray objectAtIndex:0] content];
                if (content && ![mutableArray1 containsObject:content]) {
                     [mutableArray1 addObject:content];
                }
            }
           
        }
    }

This should remove the duplicates.
 

nashyo

macrumors 6502
Original poster
Oct 1, 2010
299
0
Bristol
Thanks.
This will certainly remove the duplicates, but two separate listings could quite plausibly share the same title, so I can't simply filter out repeated strings.
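To illustrate the concern with made-up data: a containsObject: filter cannot tell a parsing artefact from two genuine listings that happen to share a title. This is only a sketch; the titles and the UniqueTitles helper are hypothetical and not part of TFHpple.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: keeps only the first occurrence of each title.
static NSArray *UniqueTitles(NSArray *titles)
{
    NSMutableArray *unique = [NSMutableArray array];
    for (NSString *title in titles) {
        if (![unique containsObject:title]) {
            [unique addObject:title];
        }
    }
    return unique;
}

int main(void)
{
    @autoreleasepool {
        // Made-up data: the first two entries are meant to be two separate
        // listings that share a title, not a parsing duplicate.
        NSArray *listings = @[ @"Consultant Anaesthetist",
                               @"Consultant Anaesthetist",
                               @"SpR Anaesthetics" ];
        NSArray *kept = UniqueTitles(listings);
        // Logs "2 of 3 listings kept": one genuine listing is lost.
        NSLog(@"%lu of %lu listings kept",
              (unsigned long)[kept count], (unsigned long)[listings count]);
    }
    return 0;
}
```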

----------

TFHppleElement.h
Code:
- (id) initWithNode:(NSDictionary *) theNode;

+ (TFHppleElement *) hppleElementWithNode:(NSDictionary *) theNode;

// Returns this tag's innerHTML content.
@property (nonatomic, copy, readonly) NSString *content;

// Returns the name of the current tag, such as "h3".
@property (nonatomic, copy, readonly) NSString *tagName;

// Returns tag attributes with name as key and content as value.
//   href  = 'http://peepcode.com'
//   class = 'highlight'
@property (nonatomic, strong, readonly) NSDictionary *attributes;

// Returns the children of a given node
@property (nonatomic, strong, readonly) NSArray *children;

// Returns the first child of a given node
@property (nonatomic, strong, readonly) TFHppleElement *firstChild;

// the parent of a node
@property (nonatomic, unsafe_unretained, readonly) TFHppleElement *parent;

// Provides easy access to the content of a specific attribute, 
// such as 'href' or 'class'.
- (NSString *) objectForKey:(NSString *) theKey;
 

nashyo

macrumors 6502
Original poster
Oct 1, 2010
299
0
Bristol
Solved it.

'content' needed to be reset. When a node had no content, the value left over from the previous iteration was added to the array a second time.

Setting content = nil inside the loop fixed my issue.

Code:
- (IBAction)searchNavBarButtonPressed:(id)sender 
{
    NSData *data = [[NSData alloc] initWithContentsOfURL:[NSURL URLWithString:@"http://www.riglocums.com/doctor-jobs/anaesthetics-153/"]];
    
    TFHpple *xpathParser_ = [[TFHpple alloc] initWithHTMLData:data];
    
    NSArray *elements = [xpathParser_ searchWithXPathQuery:@"//table/tr/td/p[position()=1]"];
    
    //NSLog(@"%@", [elements objectAtIndex:0]);
    
    NSMutableArray *mutableArray1 = [[NSMutableArray alloc] init];
    if ([elements count] > 0 ) {
        NSString *content;
        for (TFHppleElement *element in elements) {
            TFHppleElement *firstChild = [element firstChild];
            //NSLog(@"FirstChild: %@", firstChild);
            NSArray *nodeChildArray = [firstChild children];
            if ([nodeChildArray count] > 0) {
                content = [[nodeChildArray objectAtIndex:0] content];
            }
            if (content) {
                [mutableArray1 addObject:content];
                content = nil; // reset so the next iteration cannot reuse this value
            }
        }
    }
      
    NSLog(@"%@", mutableArray1);
}
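The carry-over can also be avoided by scoping the variable to the loop body. A minimal sketch, assuming plain NSArray values stand in for each node's children (the rows data and the CollectTitles helper are made up for illustration and are not part of TFHpple); because 'content' is declared inside the loop, every iteration starts from nil:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: each inner array stands in for [firstChild children].
// An empty inner array models a node with no usable content.
static NSArray *CollectTitles(NSArray *rows)
{
    NSMutableArray *titles = [NSMutableArray array];
    for (NSArray *children in rows) {
        // Declared per iteration, so a value from the previous row cannot leak in.
        NSString *content = nil;
        if ([children count] > 0) {
            content = [children objectAtIndex:0];
        }
        if (content) {
            [titles addObject:content];
        }
    }
    return titles;
}

int main(void)
{
    @autoreleasepool {
        // Made-up data: every real row is followed by an empty one, which is
        // the pattern that would repeat each title if 'content' were stale.
        NSArray *rows = @[ @[ @"Job A" ], @[], @[ @"Job B" ], @[] ];
        NSLog(@"%@", CollectTitles(rows)); // logs Job A and Job B once each
    }
    return 0;
}
```

With the declaration outside the loop and no reset, the same input would add each title twice, which matches the duplicated console output in the first post.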
 