Which would be faster?

Discussion in 'iOS Programming' started by KnightWRX, Dec 13, 2010.

  1. KnightWRX macrumors Pentium

    KnightWRX

    Joined:
    Jan 28, 2009
    Location:
    Quebec, Canada
    #1
    With the end of the year coming, it's nice to take a break from questions that matter and just spend a few minutes over-optimizing a solution.

    Setup:

    I have some RGBA pixel data stored in an array of unsigned chars (basically, 8 bits per array member, so a quarter of a pixel each). Now, my palette colors are set as 32-bit hex values, and I want to assign them to my array. I have come up with two methods of extracting the different components: a BIT SHIFT/CAST method and a BIT AND/SHIFT method:

    Code:
    -(void) setRGBAPixelAtX: (NSInteger) x y: (NSInteger) y color: (uint32_t) pixel
    {
        NSInteger index = (width * bytesPerChannel * (y - 1)) + ((x - 1) * bytesPerChannel);

        /* The SHIFT/CAST method */

        imagePixels[index]   = (unsigned char) (pixel >> 24);
        imagePixels[index+1] = (unsigned char) (pixel >> 16);
        imagePixels[index+2] = (unsigned char) (pixel >> 8);
        imagePixels[index+3] = (unsigned char) pixel;

        /* OR the AND/SHIFT method */

        imagePixels[index]   = (pixel & 0xff000000) >> 24;
        imagePixels[index+1] = (pixel & 0x00ff0000) >> 16;
        imagePixels[index+2] = (pixel & 0x0000ff00) >> 8;
        imagePixels[index+3] = pixel & 0x000000ff;
    }
    I'm sure there are about three dozen other ways of doing this, some that might even be faster than using bit operations. Anyway, how would you guys have done this? What would be faster on iOS's ARM architecture? Does it even matter (of course not)?
     
  2. ianray macrumors 6502

    Joined:
    Jun 22, 2010
    Location:
    @
    #2
    The 'AND/SHIFT' method will typically be less efficient, in un-optimized code, because the compiler has to generate code to do the masking.

    I split the code into two alternative functions, built it in a new iOS 4.2 iPhone View-Based-App, and chose Build -> Show Assembly Code. In Debug mode, I saw what I expected (as above). In Release mode, the generated code is unsurprisingly identical.

    The single best optimization would be to store 32-bit data directly into imagePixels, but this would obviously require the same endianness and either a uint32_t array or some casting... :D

    (And, as you note, there are plenty of other things to try -- like inlining...)
     
  3. KnightWRX thread starter macrumors Pentium

    KnightWRX

    Joined:
    Jan 28, 2009
    Location:
    Quebec, Canada
    #3
    For endianness, correct me if I'm wrong, but the way I obtain the pixel data in the first place is using Quartz, writing my bunch of pixels (which can be either an image or a palette) from PNG format into a Quartz context. As such, I believe kCGImageAlphaPremultipliedLast | kCGBitmapByteOrder32Big ensures that I always have RGBA represented in memory as 0xRRGGBBAA, which is how I can manipulate my pixel data through unsigned chars for direct sub-pixel access.

    I think you're right though: a 32-bit type would be best to store a single pixel, but I wouldn't want to lose the 8-bit type for component access.

    I think I'll try rewriting my class using a union of a 4 member unsigned char array and a uint32_t.
     
  4. KnightWRX thread starter macrumors Pentium

    KnightWRX

    Joined:
    Jan 28, 2009
    Location:
    Quebec, Canada
    #4
    Update on the endian thing: you were right, there is an issue. I rewrote the whole thing with a union, adding the following definitions:

    Code:
    
    #define COMPONENT_RED    0x00
    #define COMPONENT_GREEN  0x01
    #define COMPONENT_BLUE   0x02
    #define COMPONENT_ALPHA  0x03

    typedef union
    {
        unsigned char components[4];
        uint32_t pixel;
    } PixelData;
    I defined imagePixels to be of type (PixelData *) and filled it up from an image without even modifying any other code; quite the easy surgery. I then rewrote my XY functions so they no longer have to do a bytesPerChannel jump, and let it rip. Worked like a charm, except for this little quirk:

    Code:
    NSLog(@"Pixel at %dx%d Hex RGBA : %x - R: %x G: %x B: %x A:%x",
          x,
          y,
          imagePixels[index].pixel,
          imagePixels[index].components[COMPONENT_RED],
          imagePixels[index].components[COMPONENT_GREEN],
          imagePixels[index].components[COMPONENT_BLUE],
          imagePixels[index].components[COMPONENT_ALPHA]);
    The in-memory browser and the component access through the unsigned char array are as I thought: Quartz did a fine job making it 0xRRGGBBAA in memory. However, if I use my uint32_t member to access or set the value, the byte order is reversed. The NSLog code above results in the following output:

    Code:
    2010-12-14 09:09:34.656 App[8973:207] Pixel at 1x1 Hex RGBA : ff555555 - R: 55 G: 55 B: 55 A:ff
    So if I want to manipulate the 32-bit value, I need to treat it as an ABGR pixel format. Bummer, that's a whole bunch of #defines to rewrite.
     
  5. ianray macrumors 6502

    Joined:
    Jun 22, 2010
    Location:
    @
    #5
    On a little-endian machine, yes.

    Is it possible to use native endianness? That will get you the best performance (though with a potential cost if network/disk protocols use a different endianness).
     
