No, what is depressing is that no one sees the glaring mistake except Lloyd and me. I am not as charitable as Lloyd, though; I think I gave you guys enough help. If you still can't figure it out, you'll be making 5 figures for the rest of your life.

That's one of the most depressing replies I've seen on this forum.

"I have no idea what you said, but you're wrong" is so aggressively ignorant that I actually hope you're just trying to provoke him, and not actually serious.


----------

I would optimize code that is correct first lol. That is what is so freaking hilarious: the guy wastes all this effort for nothing.

1. He measured. That's the most important thing of all. If you don't measure, you don't know whether the code you want to make faster is actually worth spending your time on, and you don't know whether the time you spent to make it faster is actually making it faster, so it's just a waste of time. So A says "I write the code this way because it runs faster". B says "I checked, it doesn't".

2. He looked at the assembler code. _If_ you go down to that level, then writing unreadable code in the hope that it runs faster, without actually checking, is again just a waste of time. He looked at the code, and there was no benefit from writing a loop that was harder to understand. Actually, that loop performed two branches, which are most likely to hurt performance. So A says "I write the code this way because the assembler code is better". B says "I checked, it isn't".
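To make point 1 concrete, here is a minimal sketch of what "measuring" can look like in plain C. The names sum_plain and time_sum are hypothetical and only stand in for whatever loop is being argued about; clock() reports CPU time, which is a rough but portable yardstick.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical routine under test: sums an array the straightforward way. */
long sum_plain(const int *p, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += p[i];
    return total;
}

/* Calls fn(p, n) `reps` times and returns the CPU seconds spent.
   The last return value is stored in *result so that the compiler
   cannot discard the calls as unused work. */
double time_sum(long (*fn)(const int *, size_t),
                const int *p, size_t n, int reps, long *result)
{
    assert(fn != NULL && result != NULL && reps > 0);

    volatile long sink = 0;
    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        sink = fn(p, n);
    clock_t end = clock();

    *result = sink;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Pass each candidate version of the loop to time_sum with the same data and the same reps, check that the results match, and only then compare the times.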

Micro-optimisations on that level rarely ever pay. Here's what might pay: Use vectors.

Code:
typedef unsigned char vec_uint8 __attribute__((__vector_size__(16)));
typedef int vec_int32 __attribute__((__vector_size__(16)));
typedef float vec_float __attribute__((__vector_size__(16)));

This declares types that are vectors of 16 bytes, 4 ints, or 4 floats. The compilers for both Mac OS X and iOS fully support these types, so you can do up to 16 operations (with the byte vectors) in a single instruction. That can give you a huge speedup.
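As a rough sketch of how such a type gets used, the function below adds two float arrays four elements at a time. It assumes a compiler with the GCC/Clang vector extensions; add_floats is a made-up name, and memcpy is used for loads and stores so the sketch makes no assumption about how the arrays are aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef float vec_float __attribute__((__vector_size__(16)));

/* Hypothetical helper: dst[i] = a[i] + b[i] for i in [0, n). */
void add_floats(float *dst, const float *a, const float *b, size_t n)
{
    assert(dst != NULL && a != NULL && b != NULL);

    size_t i = 0;

    /* Main loop: one vector add handles four floats. */
    for (; i + 4 <= n; i += 4) {
        vec_float va, vb, vsum;
        memcpy(&va, a + i, sizeof va);
        memcpy(&vb, b + i, sizeof vb);
        vsum = va + vb;              /* elementwise add on all four lanes */
        memcpy(dst + i, &vsum, sizeof vsum);
    }

    /* Scalar tail for the last 0 to 3 elements. */
    for (; i < n; i++)
        dst[i] = a[i] + b[i];
}
```

Whether this beats the plain loop depends on the compiler and the target, since optimizers often vectorize the simple version on their own. Measure both.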

Use multiple threads with GCD. No problem running 8 threads on a 15" MBP, or two threads on an iPhone. That cuts the run time by a factor of 2 to 8 without problems.

Make sure that you know about your caches. When you have lots of data, do all the work on a subset that fits into cache, then another subset that fits into cache, and so on. This can make a _huge_ difference.
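Here is a small sketch of that idea, assuming three simple passes over a big float array. All names are hypothetical, and the block size of 4096 floats (16 KB) is only a guess at something that fits in L1 cache; the right number depends on the machine.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed block size: 4096 floats = 16 KB. Tune for the actual cache. */
enum { BLOCK_FLOATS = 4096 };

static void scale_pass(float *p, size_t n, float k)
{
    for (size_t i = 0; i < n; i++)
        p[i] *= k;
}

static void offset_pass(float *p, size_t n, float d)
{
    for (size_t i = 0; i < n; i++)
        p[i] += d;
}

static double sum_pass(const float *p, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += p[i];
    return s;
}

/* Runs all three passes on one cache-sized block before moving to the
   next block, instead of sweeping the whole array three times. */
double process_blocked(float *data, size_t n, float k, float d)
{
    assert(data != NULL || n == 0);

    double total = 0.0;
    for (size_t start = 0; start < n; start += BLOCK_FLOATS) {
        size_t left = n - start;
        size_t len = left < BLOCK_FLOATS ? left : BLOCK_FLOATS;

        scale_pass(data + start, len, k);
        offset_pass(data + start, len, d);
        total += sum_pass(data + start, len);
    }
    return total;
}
```

The result is the same as running each pass over the full array in turn; the difference is that the second and third passes find their block still sitting in cache.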

All these things operate on a much higher level, and that's where the money is.
 