nashyo

macrumors 6502
Original poster
Every result appears twice in the output, and I'm having trouble figuring out why. I'm new to XPath and HTML parsing. Can someone help, please? This is driving me nuts.

(TFHpple and TFHppleElement are third-party classes for parsing HTML.)
Code:
- (IBAction)searchNavBarButtonPressed:(id)sender 
{
    NSData *data = [[NSData alloc] initWithContentsOfURL:[NSURL URLWithString:@"http://www.riglocums.com/doctor-jobs/anaesthetics-153/"]];
    
    TFHpple *xpathParser_ = [[TFHpple alloc] initWithHTMLData:data];
    
    NSArray *elements = [xpathParser_ searchWithXPathQuery:@"//table/tr/td/p[position()=1]"]; //the search results table
        
    NSMutableArray *mutableArray1 = [[NSMutableArray alloc] init];
    if ([elements count] > 0 ) {
        NSString *content;
        for (TFHppleElement *element in elements) {
            TFHppleElement *firstChild = [element firstChild];
            NSArray *nodeChildArray = [firstChild children];
            if ([nodeChildArray count] > 0) {
                content = [[nodeChildArray objectAtIndex:0] content];
            }
            [mutableArray1 addObject:content];
        }
    }
    NSLog(@"%@", mutableArray1);
}

Console
2012-07-03 08:40:03.620 Rig Locums[1164:f803] (
"Consultant Intensive Treatment Unit - 6 months - North England",
"Consultant Intensive Treatment Unit - 6 months - North England",
"Consultant Paediatric Intensivist",
"Consultant Paediatric Intensivist",
"Staff Grade - Anaesthetist - 5 months - North England",
"Staff Grade - Anaesthetist - 5 months - North England",
"SpR Anaesthetics - Midlands - Ongoing",
"SpR Anaesthetics - Midlands - Ongoing",
"Consultant Anaesthetist with an interest in Chronic Pain - South West England - Ongoing",
"Consultant Anaesthetist with an interest in Chronic Pain - South West England - Ongoing",
"Consultant Anaesthetist interested in Cardiothoracic Anaesthesia - 1 month - East England",
"Consultant Anaesthetist interested in Cardiothoracic Anaesthesia - 1 month - East England",
"Consultant Anaesthetist + ITU - North England - 9 months",
"Consultant Anaesthetist + ITU - North England - 9 months",
"Consultant Anaesthetist 2x - North England",
"Consultant Anaesthetist 2x - North England",
"Staff Grade Anaesthetist - 6 month - South England",
"Staff Grade Anaesthetist - 6 month - South England",
"Senior House Officer - Anaesthetics - 3 months - London",
"Senior House Officer - Anaesthetics - 3 months - London"
)
 
I don't know those libraries, but you could skip any string that is already in the array:

Code:
if ([elements count] > 0 ) {
        NSString *content;
        for (TFHppleElement *element in elements) {
            TFHppleElement *firstChild = [element firstChild];
            NSArray *nodeChildArray = [firstChild children];
            if ([nodeChildArray count] > 0) {
                content = [[nodeChildArray objectAtIndex:0] content];
                if (content && ![mutableArray1 containsObject:content]) {
                     [mutableArray1 addObject:content];
                }
            }
           
        }
    }

This should remove the duplicates.
 
Thanks.
That will certainly remove the duplicates, but it is quite likely that two genuinely identical titles will appear in the search results, so I can't simply drop repeated strings.

----------

TFHppleElement.h
Code:
- (id) initWithNode:(NSDictionary *) theNode;

+ (TFHppleElement *) hppleElementWithNode:(NSDictionary *) theNode;

// Returns this tag's innerHTML content.
@property (nonatomic, copy, readonly) NSString *content;

// Returns the name of the current tag, such as "h3".
@property (nonatomic, copy, readonly) NSString *tagName;

// Returns tag attributes with name as key and content as value.
//   href  = 'http://peepcode.com'
//   class = 'highlight'
@property (nonatomic, strong, readonly) NSDictionary *attributes;

// Returns the children of a given node
@property (nonatomic, strong, readonly) NSArray *children;

// Returns the first child of a given node
@property (nonatomic, strong, readonly) TFHppleElement *firstChild;

// the parent of a node
@property (nonatomic, unsafe_unretained, readonly) TFHppleElement *parent;

// Provides easy access to the content of a specific attribute, 
// such as 'href' or 'class'.
- (NSString *) objectForKey:(NSString *) theKey;
 
Solved it.

'content' needed to be reset. When an element had no content, 'content' still held the value from the previous iteration, and that stale value was added to the array again.

Setting content = nil inside the loop fixed my issue.

Code:
- (IBAction)searchNavBarButtonPressed:(id)sender 
{
    NSData *data = [[NSData alloc] initWithContentsOfURL:[NSURL URLWithString:@"http://www.riglocums.com/doctor-jobs/anaesthetics-153/"]];
    
    TFHpple *xpathParser_ = [[TFHpple alloc] initWithHTMLData:data];
    
    NSArray *elements = [xpathParser_ searchWithXPathQuery:@"//table/tr/td/p[position()=1]"];
    
    //NSLog(@"%@", [elements objectAtIndex:0]);
    
    NSMutableArray *mutableArray1 = [[NSMutableArray alloc] init];
    if ([elements count] > 0 ) {
        NSString *content;
        for (TFHppleElement *element in elements) {
            TFHppleElement *firstChild = [element firstChild];
            //NSLog(@"FirstChild: %@", firstChild);
            NSArray *nodeChildArray = [firstChild children];
            if ([nodeChildArray count] > 0) {
                content = [[nodeChildArray objectAtIndex:0] content];
            }
            if (content) {
                [mutableArray1 addObject:content];
                content = nil;
            }
        }
    }
      
    NSLog(@"%@", mutableArray1);
}
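
For anyone who finds this later: the same fix can probably be written by declaring the variable inside the loop body, so a stale value cannot carry over between iterations. Below is a minimal sketch of that idea using only Foundation. The function name FirstTitles and the nested arrays standing in for each element's children are made up for illustration; they are not part of TFHpple.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of TFHpple): each inner array stands in
// for the children of one matched element.
static NSArray *FirstTitles(NSArray *rows)
{
    NSMutableArray *titles = [NSMutableArray array];
    for (NSArray *children in rows) {
        // Declared and set to nil inside the loop, so every pass starts
        // without a value left over from the previous element.
        NSString *content = nil;
        if ([children count] > 0) {
            content = [children objectAtIndex:0];
        }
        // Elements without children are skipped instead of re-adding
        // the previous title.
        if (content) {
            [titles addObject:content];
        }
    }
    return titles;
}
```

Unlike the containsObject: check, this keeps two jobs that really do share a title, because it only skips elements that have no content at all.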