I get the feeling that the Geekbench test stays in the same CPU/die area and never has to deal with outside interrupts, since none of the outside components are on. That may trip up some very precise edge cases.
Also, on a mobile chip, having the screen and LTE off normally means the chip would be going into low-power mode (because few mobile or desktop apps actually do any work in that state!). Nobody's doing raytracing or heavy encoding on their phone WHILE in airplane mode! That is basically what that stupid screen-off, LTE-off test is simulating.
OK. So there is indeed a trigger that sets the CPU to operate in different modes. In that case, I'd say Geekbench is meaningless when you're comparing two devices with different CPU models.
As far as I know, Geekbench does the battery test by keeping CPU usage at the same level continuously. If the CPUs operate in different modes, their computing performance must be different too, which means the amount of work each one finishes in the same period of time is also different. That's not how it goes in the real world: you only use your phone to get certain tasks done, so the faster the CPU runs, the less time it takes.
I'd say they should assign the same amount of work to each device, and compare both the battery life and the total number of tasks completed.
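To put rough numbers on the difference between a constant-load test and a fixed-work test, here is a quick Python sketch. Everything in it is made up for illustration: the two devices, their throughput, power draw and battery size are hypothetical, and it ignores idle power, the screen and the radio entirely. It is not how Geekbench actually measures anything.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    tasks_per_hour: float  # hypothetical throughput under sustained load
    load_watts: float      # hypothetical power draw under that load
    battery_wh: float      # hypothetical battery capacity


def constant_load_test(d: Device) -> tuple[float, float]:
    """Keep the CPU loaded until the battery is empty.

    Returns (hours of runtime, tasks completed in that time).
    """
    hours = d.battery_wh / d.load_watts
    return hours, hours * d.tasks_per_hour


def fixed_work_test(d: Device, tasks: float) -> tuple[float, float]:
    """Give the device a fixed number of tasks.

    Returns (hours needed, fraction of the battery used).
    """
    hours = tasks / d.tasks_per_hour
    return hours, hours * d.load_watts / d.battery_wh


# Two imaginary phones: B is twice as fast but draws a bit more power.
a = Device("A", tasks_per_hour=1000, load_watts=2.0, battery_wh=10.0)
b = Device("B", tasks_per_hour=2000, load_watts=2.5, battery_wh=10.0)

for d in (a, b):
    run_hours, done = constant_load_test(d)
    work_hours, used = fixed_work_test(d, tasks=4000)
    print(f"{d.name}: constant load lasts {run_hours:.1f} h ({done:.0f} tasks); "
          f"4000 tasks take {work_hours:.1f} h and {used:.0%} of the battery")
```

With these invented numbers, B dies an hour sooner in the constant-load test (4 h vs 5 h) yet finishes 8000 tasks to A's 5000, and it gets through the same 4000 tasks on 50% of its battery where A needs 80%. Looking at runtime alone would rank the two devices the wrong way round.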