Care to learn more about the difference between the two CPU dies from Samsung and TSMC? Here is a link. Sorry that it's in Traditional Chinese; it's from a Taiwanese site (not mainland China). You may not know that Chinese has two written forms, but that doesn't matter.
http://tu0925399900.pixnet.net/blog/post/204868429-為何台積電16奈米a9處理器會勝過samsung-14奈米
The title translates as "Why TSMC's 16nm A9 processor beats Samsung's 14nm". I'm not sure whether you want Google to translate it for you, since your avatar suggests you're anti-Google, but I'm sure the diagrams can help.
Most of your arguments say that the sample is not large enough. Well, as you know, Apple has a sample that you would consider large enough, but they won't announce the true difference between the CPUs. They want to muddy the waters so you won't get a real, logical answer.
You missed the point, and so did many of the forum members. You think the CPU difference under regular, everyday usage should come out well below the claimed 20-30%. Common sense may agree with you. But the statistically significant difference between the TWO CPUs is undeniable. And the explanation does not come from statistics; it comes from the difference in manufacturing process between the two foundries, as you can see from the diagrams on the site I linked.
Why did the first few blogs want to test the CPU difference as soon as they got the new iPhone 6s and 6s Plus? Because that is what they were interested in. It is also a test that can single out one factor rather than mixing in all the others: radio chip, antenna, battery, apps. If there were different radio chips and someone wanted to know the difference, they could test that too. It doesn't need as large a sample as you claimed (hundreds or thousands), because all the other factors can cancel out or be left out. Didn't you get math problems in middle school where you were given only 5 values and asked to find the average? AnandTech has done many first-hand tests comparing just two CPUs or GPUs, and who has ever told them, "That's not fair, you need more samples"?
Your knowledge should tell you that when you compare two phones, those other variables mostly cancel each other out. For example, take y = x1 + x2 + x3 + .... Here people only want to know x1 (the CPU, for example): they know there are two CPU makers, and they want to find how much the difference in x1 contributes to y, the battery life. That's all.
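To make the y = x1 + x2 + x3 + ... argument concrete, here is a minimal Python sketch. It assumes, as the post does, that battery life is a simple sum of separate factors; real battery drain is not strictly additive. The function name `battery_life` and every number in it are hypothetical, made up for illustration rather than measured. It shows that when two phones share every term except the CPU term, the shared terms cancel and the difference in y equals the difference in x1.

```python
# Hypothetical additive model: y = x1 + x2 + x3 + ...
# All values are illustrative hours, not real measurements.

def battery_life(cpu: float, radio: float, antenna: float,
                 battery: float, apps: float) -> float:
    """Battery life y as a plain sum of per-factor contributions (assumed model)."""
    return cpu + radio + antenna + battery + apps

# Everything except the CPU is identical on both phones (same model, same test app).
shared = {"radio": 1.0, "antenna": 0.5, "battery": 6.0, "apps": -1.0}

# Two hypothetical CPU contributions, x1, for the two foundries' chips.
x1_a, x1_b = 3.0, 1.0

y_a = battery_life(cpu=x1_a, **shared)
y_b = battery_life(cpu=x1_b, **shared)

# The shared terms appear in both sums and drop out of the difference,
# so y_a - y_b equals x1_a - x1_b.
print(y_a - y_b)
```

Changing any of the shared values leaves the printed difference unchanged, because those terms appear identically on both sides of the subtraction.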
So the battery life, y, can differ by as much as 2 hours because of the CPU alone when running the Geekbench app. Isn't that crystal clear?
If I'm a heavy gamer, that is the factor that affects my regular phone usage the most. By saying you're waiting for more data on normal daily usage, you are actually blurring the picture that people are already focusing on. So don't argue that we need more comparisons of regular, normal usage. That isn't a meaningful comparison here, and by insisting on it you are helping Apple cover up the big difference between the two CPUs.