Parse <img from html-string

Discussion in 'iOS Programming' started by DennisBlah, Jul 8, 2014.

  1. DennisBlah, Jul 8, 2014
    Last edited: Jul 9, 2014

    DennisBlah macrumors 6502

    Joined:
    Dec 5, 2013
    Location:
    The Netherlands
    #1
    Hey all,

    I'm trying to store a few pages locally so I can still load them in my web view when I don't have a network connection.

    Saving the pages works fine; the only problem I'm facing is with images.
    I'm trying to get the content between <img src=\" and \"
    so I can store each image locally as well and rewrite the HTML to point at the local copy.

    I can find the locations of every <img src=\"
    and of course I can find the locations of every \"
    and then see which \" comes first after each <img src=\"

    The distance between the two gives me the length, so I can take the substring, download the image, store it, and rewrite the src to where I'm storing it locally.

    However, I'm sure this can be done a lot more easily.

    Here is my code for now (not fully done yet):
    Code:
        // Locations just past each <img src=" (where each URL starts)
        NSMutableArray *startRanges = [NSMutableArray array];
        // Locations of every double quote in the page
        NSMutableArray *endRanges = [NSMutableArray array];
        NSUInteger length = [pageHTMLstring length];
        NSRange range = NSMakeRange(0, length);
        while(range.location != NSNotFound)
        {
            range = [pageHTMLstring rangeOfString: @"<img src=\"" options:0 range:range];
            if(range.location != NSNotFound)
            {
                // Continue searching after this match; its end is the URL start
                range = NSMakeRange(range.location + range.length, length - (range.location + range.length));
                [startRanges addObject: @(range.location)];
            }
        }

        range = NSMakeRange(0, length);
        while(range.location != NSNotFound)
        {
            range = [pageHTMLstring rangeOfString: @"\"" options:0 range:range];
            if(range.location != NSNotFound)
            {
                // Store the quote itself, then continue searching after it
                [endRanges addObject: @(range.location)];
                range = NSMakeRange(range.location + range.length, length - (range.location + range.length));
            }
        }

        for(NSUInteger a=0; a<[startRanges count]; a++) {
            NSUInteger start = [[startRanges objectAtIndex: a] unsignedIntegerValue];
            NSUInteger stop = NSNotFound;
            // The first quote at or after the URL start is the closing quote
            for(NSUInteger b=0; b<[endRanges count]; b++) {
                if([[endRanges objectAtIndex: b] unsignedIntegerValue] >= start) {
                    stop = [[endRanges objectAtIndex: b] unsignedIntegerValue];
                    break;
                }
            }
            // Skip an attribute that has no closing quote
            if(stop == NSNotFound) continue;

            NSString *test = [pageHTMLstring substringWithRange:NSMakeRange(start, stop - start)];
            NSLog(@"The image: %@",test);
        }
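
For comparison, here is a minimal sketch of the same idea done in a single pass with NSScanner instead of two location arrays. It is only an illustration: the helper name ImageSourcesByScanning is made up for this example, and it assumes every image is written exactly as <img src="..."> with double quotes and src as the first attribute.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original post): collects the text between
// <img src=" and the next double quote, in document order.
static NSArray *ImageSourcesByScanning(NSString *html) {
    NSMutableArray *sources = [NSMutableArray array];
    NSScanner *scanner = [NSScanner scannerWithString:html];
    // Do not silently skip whitespace, so URLs come back exactly as written
    scanner.charactersToBeSkipped = nil;
    NSString *marker = @"<img src=\"";

    while (![scanner isAtEnd]) {
        // Move to the next marker, then step over it; stop if there is none
        [scanner scanUpToString:marker intoString:NULL];
        if (![scanner scanString:marker intoString:NULL]) {
            break;
        }
        // Read up to the closing quote; only accept it if the quote is really there
        NSString *src = nil;
        if ([scanner scanUpToString:@"\"" intoString:&src] &&
            [scanner scanString:@"\"" intoString:NULL]) {
            [sources addObject:src];
        }
    }
    return sources;
}
```

Calling ImageSourcesByScanning(pageHTMLstring) would return the URLs as an array of NSString, ready for downloading. Tags with single quotes or with other attributes before src are not picked up by this sketch.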
    
    
     
  2. DennisBlah, Jul 8, 2014
    Last edited: Jul 8, 2014

    DennisBlah thread starter macrumors 6502

    #2
    I found the following code, which is great for replacing things between < > tags:
    Code:
    NSRange r;
    NSString *s = @"my<remove>output";
    while ((r = [s rangeOfString:@"<[^>]+>" options:NSRegularExpressionSearch]).location != NSNotFound)
        s = [s stringByReplacingCharactersInRange:r withString:@""];
    
    So I tried altering <[^>]+> to <img[^>]+> and logging
    [s substringWithRange: r]

    That works great and is exactly what I'm looking for. However, it never gets out of the loop.
    (I could replace each match with unique characters such as §§img1§§, with a counter in place of the 1,
    and afterwards replace those placeholders with the new image, but still...)
    Is there anyone who can push me in the right direction? This approach is a lot cleaner and faster.

    Right now I have a piece of code of around 65 lines, which loads NSData from a URL, converts the NSData to a string, does the search and replace,
    downloads the images to a folder in Documents, replaces each <img src= with the correct new name and location for the image, and then saves it.

    The result is satisfying, but I know this can be done a lot better...
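
A possible explanation for the endless loop, offered as an assumption: if the loop body only logs the match and no longer removes it from s, every search starts at the beginning again and finds the same first tag. A minimal sketch of one way around that is to pass a search range that begins just after the previous match. The helper name ImageTagsInHTML is made up for this example.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original post): returns every <img ...> tag
// found by the regular expression, without modifying the string.
static NSArray *ImageTagsInHTML(NSString *s) {
    NSMutableArray *tags = [NSMutableArray array];
    NSRange search = NSMakeRange(0, s.length);
    NSRange r;
    while ((r = [s rangeOfString:@"<img[^>]+>"
                         options:NSRegularExpressionSearch
                           range:search]).location != NSNotFound) {
        [tags addObject:[s substringWithRange:r]];
        // Restart the search just past this match so the loop always advances
        NSUInteger next = NSMaxRange(r);
        search = NSMakeRange(next, s.length - next);
    }
    return tags;
}
```

Because the string is never changed, the ranges stay valid and no placeholder characters are needed while collecting the tags.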
     
  3. DennisBlah thread starter macrumors 6502

    #3
    I have changed the approach to pick out any .jpg, .png and .gif, whether it is quoted with a " or a '.

    Code:
        NSMutableArray *theImages = [NSMutableArray array];

        // Locations of the character directly after each .jpg, .png or .gif
        NSMutableArray *imgRanges = [NSMutableArray array];
        NSUInteger length = [theHTML length];
        NSRange range;
        for(NSString *extension in @[@".jpg", @".png", @".gif"]) {
            range = NSMakeRange(0, length);
            while(range.location != NSNotFound)
            {
                range = [theHTML rangeOfString: extension options:0 range:range];
                if(range.location != NSNotFound)
                {
                    range = NSMakeRange(range.location + range.length, length - (range.location + range.length));
                    // Skip an extension at the very end of the string: nothing follows it
                    if(range.location < length) {
                        [imgRanges addObject: @(range.location)];
                    }
                }
            }
        }

        NSMutableArray *endChars = [NSMutableArray array];
        NSMutableArray *endCharLocs = [NSMutableArray array];
        for(NSUInteger a=0; a<[imgRanges count]; a++) {
            NSUInteger loc = [[imgRanges objectAtIndex: a] unsignedIntegerValue];
            //The character (expected to be the closing " or ')
            [endChars addObject: [theHTML substringWithRange:NSMakeRange(loc, 1)]];
            //The location of character
            [endCharLocs addObject: @(loc)];
        }

        NSMutableArray *startCharLocs = [NSMutableArray array];
        for(NSUInteger a=0; a<[imgRanges count]; a++) {
            //Checking for this character
            NSString *checkChar = [endChars objectAtIndex: a];
            NSUInteger endLoc = [[endCharLocs objectAtIndex: a] unsignedIntegerValue];
            //Temporary array for locations
            NSMutableArray *tempCharLocs = [NSMutableArray array];

            range = NSMakeRange(0, length);
            BOOL stopLoop = NO;
            while(range.location != NSNotFound && !stopLoop) {
                range = [theHTML rangeOfString: checkChar options:0 range:range];
                if(range.location != NSNotFound) {
                    range = NSMakeRange(range.location + range.length, length - (range.location + range.length));
                    [tempCharLocs addObject: @(range.location)];
                    //Stop once we are past the closing character
                    if(range.location > endLoc) {
                        stopLoop = YES;
                    }
                }
            }
            //The second to last entry is just past the opening character;
            //fall back to endLoc (an empty result) if there is no opening character
            [startCharLocs addObject: [tempCharLocs count] >= 2 ? [tempCharLocs objectAtIndex: [tempCharLocs count] - 2] : @(endLoc)];
        }

        //Add the full images
        for(NSUInteger a=0; a<[imgRanges count]; a++) {
            NSUInteger start = [[startCharLocs objectAtIndex: a] unsignedIntegerValue];
            NSUInteger end = [[endCharLocs objectAtIndex: a] unsignedIntegerValue];
            [theImages addObject: [theHTML substringWithRange:NSMakeRange(start, end - start)]];
        }
    
    This is just a part of my code.
    After this I loop through the theImages array and call stringByReplacingOccurrencesOfString: on the theHTML string for each image.

    Is there anyone who can fine-tune this? I think it can be done a lot better.

    Thanks!
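
One possible way to shorten this, sketched here under assumptions rather than as a definitive answer: NSRegularExpression can capture the src value of each <img> tag directly, whichever quote character is used. The helper name ImageSourcesInHTML and the pattern are made up for this example; the pattern does not filter by file extension and a regular expression is not a full HTML parser, so unusual markup (unquoted attributes, attributes such as data-src) may be missed or mismatched.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original post): returns the src value of
// every <img> tag, accepting either " or ' as the quote character.
static NSArray *ImageSourcesInHTML(NSString *html) {
    // Group 1 is the opening quote, group 2 the URL, \1 the matching closing quote
    NSString *pattern = @"<img\\b[^>]*?\\bsrc\\s*=\\s*([\"'])(.*?)\\1";
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:NSRegularExpressionCaseInsensitive
                                                    error:&error];
    if (regex == nil) {
        NSLog(@"Invalid pattern: %@", error);
        return @[];
    }

    NSMutableArray *sources = [NSMutableArray array];
    NSArray *matches = [regex matchesInString:html
                                      options:0
                                        range:NSMakeRange(0, html.length)];
    for (NSTextCheckingResult *match in matches) {
        // Range 2 covers only the URL between the quotes
        [sources addObject:[html substringWithRange:[match rangeAtIndex:2]]];
    }
    return sources;
}
```

The returned array could take the place of theImages, so the later stringByReplacingOccurrencesOfString: loop would stay the same.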
     
