Which would be faster?

Discussion in 'iOS Programming' started by KnightWRX, Dec 13, 2010.

  1. KnightWRX macrumors Pentium

    KnightWRX

    Joined:
    Jan 28, 2009
    Location:
    Quebec, Canada
    #1
    With the end of the year coming, it's nice to take a break from questions that matter and just spend a few minutes over-optimizing a solution.

    Setup:

    I have some RGBA pixel data stored in an array of unsigned chars (basically, 8 bits per array member, so a quarter of a pixel each). Now, my palette colors are set as 32-bit hex values, and I want to assign them to my array. I have come up with two methods of extracting the different components: a BIT SHIFT/CAST method and a BIT AND/SHIFT method:

    Code:
    -(void) setRGBAPixelAtX: (NSInteger) x y: (NSInteger) y color: (uint32_t) pixel
    {
        NSInteger index = (width * bytesPerChannel * (y - 1)) + ((x - 1) * bytesPerChannel);

        /* The SHIFT/CAST method */

        imagePixels[index]   = (unsigned char) (pixel >> 24);
        imagePixels[index+1] = (unsigned char) (pixel >> 16);
        imagePixels[index+2] = (unsigned char) (pixel >> 8);
        imagePixels[index+3] = (unsigned char) pixel;

        /* OR the AND/SHIFT method */

        imagePixels[index]   = (pixel & 0xff000000) >> 24;
        imagePixels[index+1] = (pixel & 0x00ff0000) >> 16;
        imagePixels[index+2] = (pixel & 0x0000ff00) >> 8;
        imagePixels[index+3] = pixel & 0x000000ff;
    }
    I'm sure there are about three dozen other ways of doing this, some that might even be faster than using bit operations. Anyway, how would you guys have done this? What would be faster on iOS's ARM architecture? Does it even matter (of course not)?
     
  2. ianray macrumors 6502

    Joined:
    Jun 22, 2010
    Location:
    @
    #2
    The 'AND/SHIFT' method will typically be less efficient, in un-optimized code, because the compiler has to generate code to do the masking.

    I split the code into two alternative functions, built it in a new iOS 4.2 iPhone View-Based-App, and chose Build -> Show Assembly Code. In Debug mode, I saw what I expected (as above). In Release mode, the generated code is unsurprisingly identical.

    The single best optimization would be to store 32-bit data directly into imagePixels, but this would obviously require the same endianness and either a uint32_t array or some casting... :D

    (And, as you note, there are plenty of other things to try -- like inlining...)
     
  3. KnightWRX thread starter macrumors Pentium

    KnightWRX

    Joined:
    Jan 28, 2009
    Location:
    Quebec, Canada
    #3
    For endianness, correct me if I'm wrong, but the way I obtain the pixel data in the first place is using Quartz, writing my bunch of pixels (which can be either an image or a palette) from PNG format into a Quartz context. As such, I believe kCGImageAlphaPremultipliedLast | kCGBitmapByteOrder32Big ensures that I always have RGBA represented in memory as 0xRRGGBBAA, which is how I can manipulate my pixel data through unsigned chars for direct sub-pixel access.

    I think you're right though: a 32-bit type would be best to store a single pixel, but I wouldn't want to lose the 8-bit type for component access.

    I think I'll try rewriting my class using a union of a 4 member unsigned char array and a uint32_t.
     
  4. KnightWRX thread starter macrumors Pentium

    KnightWRX

    Joined:
    Jan 28, 2009
    Location:
    Quebec, Canada
    #4
    Update on the endian thing: you were right, there is an issue. I rewrote the whole thing with a union, adding the following definitions:

    Code:
    
    #define COMPONENT_RED    0x00
    #define COMPONENT_GREEN  0x01
    #define COMPONENT_BLUE   0x02
    #define COMPONENT_ALPHA  0x03

    typedef union
    {
        unsigned char components[4];
        uint32_t pixel;
    } PixelData;
    I defined imagePixels to be of type (PixelData *) and filled it up from an image without even modifying any other code; quite the easy surgery. I then rewrote my XY functions so they no longer have to do a bytesPerChannel jump, and let it rip. Worked like a charm, except for this little quirk:

    Code:
    NSLog(@"Pixel at %dx%d Hex RGBA : %x - R: %x G: %x B: %x A:%x",
          x,
          y,
          imagePixels[index].pixel,
          imagePixels[index].components[COMPONENT_RED],
          imagePixels[index].components[COMPONENT_GREEN],
          imagePixels[index].components[COMPONENT_BLUE],
          imagePixels[index].components[COMPONENT_ALPHA]);
    The in-memory browser and the component access through the unsigned char array are as I thought: Quartz did a fine job making it 0xRRGGBBAA in memory. However, if I use my uint32_t member to access or set the value, the byte order is reversed. The NSLog code above results in the following output:

    Code:
    2010-12-14 09:09:34.656 App[8973:207] Pixel at 1x1 Hex RGBA : ff555555 - R: 55 G: 55 B: 55 A:ff
    So if I want to manipulate the 32-bit value, I need to treat it as an ABGR pixel format. Bummer, that's a whole bunch of #defines to rewrite.
     
  5. ianray macrumors 6502

    Joined:
    Jun 22, 2010
    Location:
    @
    #5
    On a little-endian machine, yes.

    Is it possible to use native endianness? That will get you the best performance (though with a potential cost if network/disk protocols use a different endianness).
     
