Herein lies the problem. Both chips are supposed to perform within specification in terms of raw performance, power, and reliability. The difference should be within a few percent. So far many users have repeated the performance and battery tests, and the results show a disconcerting trend.
Geekbench shows the Samsung A9 is faster by a few points (less than 1-2%), a difference that is largely statistically insignificant if you run it 1000 times. Being 1-2% faster should not translate into 2 hours of additional battery drain.
AnTuTu shows the TSMC A9 consistently faster for some reason. The battery drain difference remains approximately 2 hours in a burn test.
A burn test has NOTHING TO DO WITH REAL LIFE, especially when it comes to battery.
Both chips seemingly don't throttle much under the loads from everything you normally do on a phone (unless you're running ray tracing on your phone...) and are so fast that very little will push them to where the benchmark takes them.
Your comment is the same as:
- Taking two cars, where one consumes 20% more if driven at 90 miles per hour for an hour... but only 2-3% more if driven at less than 70 miles per hour for the same time (and possibly 5% more at 80). Technically, one is better, but it won't matter to you unless you're doing Le Mans with your car.
- One plane consumes the same as plane X and performs similarly within the normal operating range, but if you get a test pilot and go up towards the edge of what they can do (which is often outside the normal range), you get all sorts of non-linear effects in airflow, and the engines may perform differently when pushed. That may make them consume differently and handle very differently from each other.
Same here: thermal effects and leakage are non-linear. Edge cases will reveal them and the limits of each system, but they don't tell you much about actual performance when used within the envelope they were designed for.
Engineering tries to get the normal range well within the whole performance envelope so nobody hits the edge in real life.
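To put rough numbers on the non-linear point above, here is a toy model in Python. Everything in it is an assumption for illustration, not a measurement of either A9: the idea that leakage doubles every 10 °C, the wattages, the 30% leakage difference between the two hypothetical chips, and the function names are all made up. It is only a sketch of why a gap that is small in normal use can look large in a burn test.

```python
# Toy model only: every number below is a made-up assumption, not a measurement.

def leakage_w(temp_c, base_w, ref_c=40.0, doubling_c=10.0):
    """Leakage power that doubles every `doubling_c` degrees above `ref_c`."""
    return base_w * 2 ** ((temp_c - ref_c) / doubling_c)

def total_w(temp_c, dynamic_w, base_leak_w, rest_of_phone_w=1.0):
    """Whole-phone power: screen/radio etc. + CPU switching + CPU leakage."""
    return rest_of_phone_w + dynamic_w + leakage_w(temp_c, base_leak_w)

def gap_percent(temp_c, dynamic_w, leak_a=0.10, leak_b=0.13):
    """How much more power hypothetical chip B draws than chip A, in percent."""
    a = total_w(temp_c, dynamic_w, leak_a)
    b = total_w(temp_c, dynamic_w, leak_b)
    return 100.0 * (b - a) / a

light = gap_percent(temp_c=40.0, dynamic_w=0.3)  # everyday use, cool chip
burn = gap_percent(temp_c=80.0, dynamic_w=2.0)   # sustained benchmark, hot chip
print(f"light use gap: {light:.1f}%")
print(f"burn test gap: {burn:.1f}%")
```

With these invented inputs the same two chips differ by about 2% in light use and about 10% in the burn case. The exact figures mean nothing; the shape of the curve is the point.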
Even an FPS video game won't stress the CPU enough to heat it up as much as 3 lines of code in a loop can.
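For anyone wondering what "3 lines of code in a loop" looks like, here is a minimal sketch in Python. The function name `spin` is hypothetical, and this only pegs a single core; an actual burn test would presumably do something like this on every core for hours, which no normal app does.

```python
import time

def spin(seconds):
    """Busy-loop for `seconds`, never sleeping or waiting on I/O."""
    deadline = time.perf_counter() + seconds
    iterations = 0
    while time.perf_counter() < deadline:
        iterations += 1  # pointless work: the core never gets to idle
    return iterations

# Keep one core at 100% for a fraction of a second and report the loop count.
print(spin(0.2))
```

A game, by contrast, spends much of each frame waiting on the GPU, on vsync, or on input, so the CPU gets regular chances to idle and cool down.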
A system is built to spec based on USE CASE; running a benchmark is not part of the use case. Simple as that.