Integral to the mathematics of the RC5 algorithm are 32-bit rotate operations.
For whatever reason, the designers of the IA32 (32bit Intel x86) and the PowerPC architectures decided to implement the rotate function as a hardware instruction.
Many other CPUs do not have built-in hardware rotate instructions and must emulate the operation by (at the very least) two shifts and a logical OR. This handicap is why many non-32bit-Intel [1] and non-PowerPC computers run RC5 slower than one might expect based on real-world benchmarks.