Searching with XPath queries

Discussion in 'iOS Programming' started by nashyo, Jul 2, 2012.

  1. nashyo, Jul 2, 2012
    Last edited: Jul 3, 2012

    nashyo macrumors 6502

    #1
    The results appear in duplicate and I'm having trouble figuring out why. I'm new to XPath and text/html. Can someone help please? This is driving me nuts.

    (TFHpple and TFHppleElement are third-party classes for parsing HTML.)
    Code:
    - (IBAction)searchNavBarButtonPressed:(id)sender 
    {
        NSData *data = [[NSData alloc] initWithContentsOfURL:[NSURL URLWithString:@"http://www.riglocums.com/doctor-jobs/anaesthetics-153/"]];
        
        TFHpple *xpathParser_ = [[TFHpple alloc] initWithHTMLData:data];
        
        NSArray *elements = [xpathParser_ searchWithXPathQuery:@"//table/tr/td/p[position()=1]"]; //the search results table
            
        NSMutableArray *mutableArray1 = [[NSMutableArray alloc] init];
        if ([elements count] > 0 ) {
            NSString *content;
            for (TFHppleElement *element in elements) {
                TFHppleElement *firstChild = [element firstChild];
                NSArray *nodeChildArray = [firstChild children];
                if ([nodeChildArray count] > 0) {
                    content = [[nodeChildArray objectAtIndex:0] content];
                }
                [mutableArray1 addObject:content];
            }
        }
        NSLog(@"%@", mutableArray1);
    }
    Console:
    2012-07-03 08:40:03.620 Rig Locums[1164:f803] (
    "Consultant Intensive Treatment Unit - 6 months - North England",
    "Consultant Intensive Treatment Unit - 6 months - North England",
    "Consultant Paediatric Intensivist",
    "Consultant Paediatric Intensivist",
    "Staff Grade - Anaesthetist - 5 months - North England",
    "Staff Grade - Anaesthetist - 5 months - North England",
    "SpR Anaesthetics - Midlands - Ongoing",
    "SpR Anaesthetics - Midlands - Ongoing",
    "Consultant Anaesthetist with an interest in Chronic Pain - South West England - Ongoing",
    "Consultant Anaesthetist with an interest in Chronic Pain - South West England - Ongoing",
    "Consultant Anaesthetist interested in Cardiothoracic Anaesthesia - 1 month - East England",
    "Consultant Anaesthetist interested in Cardiothoracic Anaesthesia - 1 month - East England",
    "Consultant Anaesthetist + ITU - North England - 9 months",
    "Consultant Anaesthetist + ITU - North England - 9 months",
    "Consultant Anaesthetist 2x - North England",
    "Consultant Anaesthetist 2x - North England",
    "Staff Grade Anaesthetist - 6 month - South England",
    "Staff Grade Anaesthetist - 6 month - South England",
    "Senior House Officer - Anaesthetics - 3 months - London",
    "Senior House Officer - Anaesthetics - 3 months - London"
    )
     
  2. CodeBreaker macrumors 6502

    #2
    Don't know about your libraries, but you should do it this way:

    Code:
    if ([elements count] > 0) {
        NSString *content;
        for (TFHppleElement *element in elements) {
            TFHppleElement *firstChild = [element firstChild];
            NSArray *nodeChildArray = [firstChild children];
            if ([nodeChildArray count] > 0) {
                content = [[nodeChildArray objectAtIndex:0] content];
                if (content && ![mutableArray1 containsObject:content]) {
                    [mutableArray1 addObject:content];
                }
            }
        }
    }
     
    This should remove the duplicates.
     
  3. nashyo thread starter macrumors 6502

    #3
    Thanks.
    This will certainly remove the duplicates, but two genuinely identical job titles could well appear in the search results, so I can't filter on uniqueness.
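
    To illustrate the concern, here is a minimal Foundation-only sketch (no TFHpple involved). The helper name UniqueTitles and the sample titles in the note below are made up for the example; the helper applies the same containsObject: filter suggested above.

    ```objective-c
    #import <Foundation/Foundation.h>

    // Hypothetical helper (not part of TFHpple): keeps only the first
    // occurrence of each title, as the containsObject: check does.
    static NSArray *UniqueTitles(NSArray *titles)
    {
        NSMutableArray *result = [NSMutableArray array];
        for (NSString *title in titles) {
            // containsObject: compares with isEqual:, so two separate
            // listings with the same text are treated as one.
            if (![result containsObject:title]) {
                [result addObject:title];
            }
        }
        return result;
    }
    ```

    If two different jobs are both titled "Consultant Anaesthetist" and a third is "SpR Anaesthetics", passing those three titles returns only two entries, so one real listing is lost.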

    ----------

    TFHppleElement.h
    Code:
    - (id) initWithNode:(NSDictionary *) theNode;
    
    + (TFHppleElement *) hppleElementWithNode:(NSDictionary *) theNode;
    
    // Returns this tag's innerHTML content.
    @property (nonatomic, copy, readonly) NSString *content;
    
    // Returns the name of the current tag, such as "h3".
    @property (nonatomic, copy, readonly) NSString *tagName;
    
    // Returns tag attributes with name as key and content as value.
    //   href  = 'http://peepcode.com'
    //   class = 'highlight'
    @property (nonatomic, strong, readonly) NSDictionary *attributes;
    
    // Returns the children of a given node
    @property (nonatomic, strong, readonly) NSArray *children;
    
    // Returns the first child of a given node
    @property (nonatomic, strong, readonly) TFHppleElement *firstChild;
    
    // the parent of a node
    @property (nonatomic, unsafe_unretained, readonly) TFHppleElement *parent;
    
    // Provides easy access to the content of a specific attribute, 
    // such as 'href' or 'class'.
    - (NSString *) objectForKey:(NSString *) theKey;
     
  4. nashyo thread starter macrumors 6502

    #4
    Solved it.

    'content' needed to be reset. It is declared outside the loop, so if an element had no child content, the value left over from the previous iteration was added to the array again.

    Setting content = nil inside the loop fixed my issue.

    Code:
    - (IBAction)searchNavBarButtonPressed:(id)sender 
    {
        NSData *data = [[NSData alloc] initWithContentsOfURL:[NSURL URLWithString:@"http://www.riglocums.com/doctor-jobs/anaesthetics-153/"]];
        
        TFHpple *xpathParser_ = [[TFHpple alloc] initWithHTMLData:data];
        
        NSArray *elements = [xpathParser_ searchWithXPathQuery:@"//table/tr/td/p[position()=1]"];
        
        //NSLog(@"%@", [elements objectAtIndex:0]);
        
        NSMutableArray *mutableArray1 = [[NSMutableArray alloc] init];
        if ([elements count] > 0 ) {
            NSString *content;
            for (TFHppleElement *element in elements) {
                TFHppleElement *firstChild = [element firstChild];
                //NSLog(@"FirstChild: %@", firstChild);
                NSArray *nodeChildArray = [firstChild children];
                if ([nodeChildArray count] > 0) {
                    content = [[nodeChildArray objectAtIndex:0] content];
                }
                if (content) {
                    [mutableArray1 addObject:content];
                    // Reset so a stale value is not added again on the next iteration.
                    content = nil;
                }
            }
        }
          
        NSLog(@"%@", mutableArray1);
    }
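
    A slightly tidier variant of the same fix, as a sketch: declare content inside the loop so every iteration starts from nil and nothing can carry over from the previous element. To keep it runnable with Foundation alone, the example uses plain nested NSArrays in place of TFHpple elements; that substitution, the helper name FirstItems, and the input shape are assumptions for illustration only.

    ```objective-c
    #import <Foundation/Foundation.h>

    // Hypothetical stand-in for the parsing loop: each inner array plays
    // the role of one element's children.
    static NSArray *FirstItems(NSArray *rows)
    {
        NSMutableArray *result = [NSMutableArray array];
        for (NSArray *row in rows) {
            // Scoped to the loop body and set to nil explicitly, so a row
            // without children cannot reuse the previous row's value.
            NSString *content = nil;
            if ([row count] > 0) {
                content = [row objectAtIndex:0];
            }
            // addObject: must not receive nil, so skip empty rows.
            if (content) {
                [result addObject:content];
            }
        }
        return result;
    }
    ```

    With three rows where the middle one is empty, only the first items of the first and third rows are returned, each exactly once.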
     
