ArtOfWarfare

macrumors G3
Original poster
Nov 26, 2007
9,558
6,058
I've written a method which can extract all the img tags from an HTML file, get the width and height attributes when they're given and calculate the area from that.

Unfortunately, I don't control the HTML files and sometimes I end up with img tags without the width and/or height attributes. In those cases, the dimensions will simply be the dimensions of the image file. How should I determine the dimensions of the image? I'd like this method to:
1 - Consume as little data as possible for each image, since it may be running on a cellular network with multiple requests per minute for hours on end.
2 - Be lightweight (it runs in the background)

Is the best method really going to be to download the image, load it into a UIImage, and get the dimensions from that, or is there some shortcut I can take where I only send something like a HEAD request?
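
For readers following along, here is a minimal sketch of the extraction step described in this post: pull every img tag out of the HTML, read width and height when both are given, and compute the area. It is written in Python with the standard library rather than the Objective-C the app itself would use, and the names ImgAreaParser, image_areas and _to_int are hypothetical, not taken from the original code.

```python
from html.parser import HTMLParser


def _to_int(value):
    """Return the attribute as an int, or None if it is missing or non-numeric (e.g. "50%")."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


class ImgAreaParser(HTMLParser):
    """Collect (src, width, height, area) for every <img> tag seen."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr = dict(attrs)
        width = _to_int(attr.get("width"))
        height = _to_int(attr.get("height"))
        # The area is only known when both attributes are present and numeric.
        known = width is not None and height is not None
        area = width * height if known else None
        self.images.append((attr.get("src"), width, height, area))


def image_areas(html):
    """Parse an HTML string and return one tuple per <img> tag, in document order."""
    parser = ImgAreaParser()
    parser.feed(html)
    return parser.images
```

Entries whose area comes back as None are the ones the question is about: the tag gave no usable width and/or height, so the size has to come from somewhere else.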
 

xArtx

macrumors 6502a
Mar 30, 2012
764
1
I don't know if it can happen on this platform,
but the image on the page had to be downloaded by your browser anyway.
My first thought is to download the images and retrieve their dimensions from the copy in RAM,
then modify the source link to retrieve the image locally
instead of from the network again.
 

ArtOfWarfare

I don't know if it can happen on this platform,
but the image on the page had to be downloaded by your browser anyway.
My first thought is to download the images and retrieve their dimensions from the copy in RAM,
then modify the source link to retrieve the image locally
instead of from the network again.

Unless I'm mistaken, no, the image hasn't already been downloaded. I used NSString's stringWithContentsOfURL:encoding:error: to download the main HTML file. So all it contains is the <img src="[...]"> tag - it hasn't had to download the image yet.

I guess I've probably already done all the optimization I can do by scanning for the width and height attributes... I'll just have to download the image to get its size in other cases.
 

xArtx

Unless I'm mistaken, no, the image hasn't already been downloaded. I used NSString's stringWithContentsOfURL:encoding:error: to download the main HTML file. So all it contains is the <img src="[...]"> tag - it hasn't had to download the image yet.

I guess I've probably already done all the optimization I can do by scanning for the width and height attributes... I'll just have to download the image to get its size in other cases.

Again, I don't know if this platform will do it,
but what I meant is scanning that HTML file initially,
and downloading all of the images from the source tags into RAM,
whether the size is in the tags or not.
Then display them from RAM when it comes time to parse the HTML.

The browser does have to download them at some time or another, or they would never be displayed.
I don't see how it would hurt to download them all at the beginning,
then get the dimensions from the source tags if you can.
 

ArtOfWarfare

Again, I don't know if this platform will do it,
but what I meant is scanning that HTML file initially,
and downloading all of the images from the source tags into RAM,
whether the size is in the tags or not.
Then display them from RAM when it comes time to parse the HTML.

The browser does have to download them at some time or another, or they would never be displayed.
I don't see how it would hurt to download them all at the beginning,
then get the dimensions from the source tags if you can.

Ah, that's where you're mistaken. This isn't a browser and it never displays the images. This is more of a scraping application that attempts to determine what the most important parts of a webpage are. To determine what the important images of a page are, it just compares their sizes.
 

xArtx

Ah, that's where you're mistaken. This isn't a browser and it never displays the images. This is more of a scraping application that attempts to determine what the most important parts of a webpage are. To determine what the important images of a page are, it just compares their sizes.

Ok, I understand. Very clever :)

Could you do what I described with iOS if you were just looking to display a page?

EDIT: again, just an idea,
but can you look at the file size in bytes before you actually download it?
If you get the size in bytes and the file extension, you could estimate the significance of the image that way
in case the dimensions are missing.
 

ArtOfWarfare

Ok, I understand. Very clever :)

Could you do what I described with iOS if you were just looking to display a page?

EDIT: again, just an idea,
but can you look at the file size in bytes before you actually download it?
If you get the size in bytes and the file extension, you could estimate the significance of the image that way
in case the dimensions are missing.

If that's doable, it sounds like a great idea.

I haven't done much web programming, but a quick Google says that if I send a HEAD request, I'll get a response including a Content-Length header, which will tell me the file size. I'll just have to find out how to send a HEAD request...

Further Googling says I need to use NSURLConnection coupled with NSURLRequest... This blog talks about how to set up NSURLRequest to only get the HEAD:

http://sutes.co.uk/2009/12/nsurlconnection-using-head-met.html

If HEAD doesn't work I may just fall back on downloading the image.
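
As a rough illustration of the HEAD idea in this post, the sketch below builds a HEAD request and reads Content-Length from the response headers. It uses Python's urllib instead of NSURLConnection/NSURLRequest, and the function names (build_head_request, parse_content_length, remote_size) are made up for the example. One caveat worth keeping in mind: a server is not obliged to send Content-Length, so the result can be missing.

```python
import urllib.request


def build_head_request(url):
    """Build a request that asks for the response headers only (no body)."""
    return urllib.request.Request(url, method="HEAD")


def parse_content_length(headers):
    """Return Content-Length as an int, or None if it is absent or malformed."""
    value = headers.get("Content-Length")
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def remote_size(url, timeout=10):
    """Send the HEAD request and return the reported size in bytes, if any."""
    request = build_head_request(url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_content_length(response.headers)
```

Calling remote_size with an image URL would return the size in bytes when the server reports one, and None otherwise, in which case falling back to downloading the image is still an option.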
 

PhoneyDeveloper

macrumors 68040
Sep 2, 2008
3,114
93
The most efficient way to get the dimensions of an image is to use ImageIO and query for the dimensions. This will most likely read only the header metadata that includes the dimensions (EXIF or the format's own header) and won't decode the image data. For modest-size images this may not make a lot of difference, but it should be faster than creating a UIImage.
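
To illustrate why the dimensions can be read without decoding any pixel data, here is a small sketch that pulls width and height straight out of the first bytes of a PNG or GIF file. It is plain Python, not ImageIO, and it says nothing about how ImageIO works internally; dimensions_from_header is a hypothetical name. JPEG is left out because its size sits in a marker segment that has to be searched for. In principle only the first couple of dozen bytes are needed, so a partial download (for example an HTTP Range request, assuming the server honors it) could feed this function.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def dimensions_from_header(data):
    """Return (width, height) from the leading bytes of a PNG or GIF, else None."""
    # PNG: 8-byte signature, 4-byte chunk length, b"IHDR", then
    # width and height as big-endian 32-bit unsigned integers.
    if len(data) >= 24 and data[:8] == PNG_SIGNATURE and data[12:16] == b"IHDR":
        return struct.unpack(">II", data[16:24])
    # GIF: 6-byte signature, then width and height as
    # little-endian 16-bit unsigned integers.
    if len(data) >= 10 and data[:6] in (b"GIF87a", b"GIF89a"):
        return struct.unpack("<HH", data[6:10])
    return None
```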
 

ArtOfWarfare

The most efficient way to get the dimensions of an image is to use ImageIO and query for the dimensions. This will most likely read only the header metadata that includes the dimensions (EXIF or the format's own header) and won't decode the image data. For modest-size images this may not make a lot of difference, but it should be faster than creating a UIImage.

I'm looking to get the size of the image without downloading it, which I would assume your method requires.

I'm working on figuring out how viable xArtx's idea is. I've been using the following in bash on a variety of image URLs and checking out the Content-Length in the response:

Code:
curl -I <image URL here>

This sends a HEAD request and prints the response headers. It appears much quicker and more lightweight than downloading the entire image.
 

subsonix

macrumors 68040
Feb 2, 2008
3,551
79
This sends a HEAD request and prints the response headers. It appears much quicker and more lightweight than downloading the entire image.

Yes, but how can you get image dimensions from image file size in a reliable way? It seems you would be measuring how heavily the image is compressed, the number of channels, the bit depth and so on.

Edit: Come to think of it, even if you know it's uncompressed, what are the dimensions of a 32 KB image (excluding metadata)? One possibility is 8000 x 1, if each pixel is 32 bits including an alpha channel, but there are many other possible dimensions for that file size.
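
The arithmetic behind that example can be checked with a short sketch: 32,000 bytes at 4 bytes per pixel is 8000 pixels, and every factor pair of 8000 is a valid width x height. This assumes "32 KB" means 32,000 bytes of raw pixel data, as the 8000 x 1 figure implies; possible_dimensions is a hypothetical helper name.

```python
def possible_dimensions(file_bytes, bytes_per_pixel=4):
    """List every (width, height) whose uncompressed size equals file_bytes."""
    pixels, remainder = divmod(file_bytes, bytes_per_pixel)
    if remainder:
        # The byte count is not a whole number of pixels.
        return []
    return [(w, pixels // w) for w in range(1, pixels + 1) if pixels % w == 0]


candidates = possible_dimensions(32_000)
print(len(candidates))  # 28 candidate sizes, including (8000, 1) and (100, 80)
```

So even in the best case of an uncompressed image with a known pixel format, the file size alone leaves 28 candidate dimensions.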
 

xArtx

Yes, but how can you get image dimensions from image file size in a reliable way? It seems you would be measuring how heavily the image is compressed, the number of channels, the bit depth and so on.

Edit: Come to think of it, even if you know it's uncompressed, what are the dimensions of a 32 KB image (excluding metadata)? One possibility is 8000 x 1, if each pixel is 32 bits including an alpha channel, but there are many other possible dimensions for that file size.

I think the file size would be better as long as you know the extension too.
A site may have a banner of huge dimensions, but the detail may be in an image
of smaller dimensions, which would qualify as more important.
 

subsonix

I think the file size would be better as long as you know the extension too.

I don't get why it would be better than having the actual dimensions, but regardless of that, a file extension will not tell you how heavily an image is compressed, or anything about the alpha channel, the bit depth, or the dimensions.

A site may have a banner of huge dimensions, but the detail may be in an image
of smaller dimensions, which would qualify as more important.

Maybe, provided that you know what the dimensions are, of course, which takes us back to the original problem.
 

ArtOfWarfare

For now, I'm just using area.

I was worried about compression being a major factor in size, but looking around the web, images I'm not interested in tend to be under 30 K while images I'm interested in tend to be over 40 K.
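
The observation above could be turned into a rough filter along these lines. This is only a sketch: it assumes "30 K" and "40 K" mean 30,000 and 40,000 bytes, it treats the gap between them as undecided, and the names LOW_BYTES, HIGH_BYTES and classify_by_size are hypothetical.

```python
LOW_BYTES = 30_000   # assumption: "30 K" read as 30,000 bytes
HIGH_BYTES = 40_000  # assumption: "40 K" read as 40,000 bytes


def classify_by_size(content_length):
    """Bucket an image by its reported file size in bytes."""
    if content_length is None:
        return "unknown"    # the server did not report a size
    if content_length < LOW_BYTES:
        return "skip"       # tends to be an uninteresting image
    if content_length > HIGH_BYTES:
        return "keep"       # tends to be an interesting image
    return "uncertain"      # falls in the 30 K to 40 K gap
```

Images that come back "unknown" or "uncertain" are the ones where downloading the file, or at least its header, may still be worth the data.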
 

subsonix

For now, I'm just using area.

I was worried about compression being a major factor in size, but looking around the web, images I'm not interested in tend to be under 30 K while images I'm interested in tend to be over 40 K.

Yeah, maybe more important images are less compressed as well. That makes sense, at least if people make rational decisions about compressing images for the web. You should probably just try it and see how accurate it is; it should be interesting.
 

xArtx

Even if you downloaded every image and scanned a horizontal and a vertical row to
count colours, you could still make a mistake, because a simple image
could still be something that was posted on a news site that day.
It sounds like one of those problems that are complicated for a computer,
and while we're talking about it, OP is collecting the data that can qualify one method over another.
 