An opinionated CPU benchmark of the 2016 15" MBP

Discussion in 'MacBook Pro' started by leman, Jan 8, 2017.

  1. leman, Jan 8, 2017
    Last edited: Jan 8, 2017

    leman macrumors 604

    Oct 14, 2008
    Much has been said here about the performance (or lack thereof) of the 2016 MBP models. Many tests (e.g. Geekbench, which has sadly become the gold standard of Mac performance benchmarks) suggest that the Skylake CPUs are barely faster than the older Haswells, and in some cases even slower. Now that I have finally received my new 2016 model to replace the mid 2015 one, I was curious whether there would be any difference in CPU performance for what I do (spoiler: oh boy, there is, and how). Since I spend most of my time doing statistical simulations in R, that seemed a good domain for testing. I therefore ran a computationally intensive randomised statistical test and evaluated the results. Read below for details.

    Machines: I am testing a mid 2015 MBP equipped with the 2.5GHz i7-4870HQ CPU and a late 2016 MBP equipped with the 2.9GHz i7-6920HQ CPU. Yes, this is an old mid-tier versus the newest top-tier, but that's what I have. Of course we expect the 6920HQ to be faster. However, if we look at Geekbench results, the difference between the two CPUs doesn't appear to be that large: in the ballpark of 15% for single-core scores and 7% for multi-core scores. Also, the max turbo boost of both CPUs is fairly comparable (a difference of only 100MHz).

    Test: I ran a simple randomised chi-squared test with 50000 Monte Carlo replicates in R. The implementation of this test is single-threaded, so it uses only one core and therefore evaluates single-core performance. To estimate multi-core performance, I ran 4 copies of this test at the same time, using 4 different cores. Doing this of course won't complete the task faster, but it should show the penalty the CPU imposes for heavy multi-threaded work: as we know, if all cores are loaded, the maximal clock will be lower than when only one core is loaded. Basically, when running the same task on 4 cores, we expect each of the cores to be slightly slower than when running it on a single core only. But of course, at the same time, the 4 cores will do a much larger amount of work in the same timespan.
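
    For readers who have not met this kind of test, here is a minimal sketch of a Monte Carlo chi-squared test. It is written in Python rather than R and is not the code used for the benchmark: it assumes the test is done by shuffling one label vector (which keeps both table margins fixed), and the function names, the toy data and the small replicate count are all made up for illustration.

```python
import random
from collections import Counter

def chi_squared(xs, ys):
    """Pearson chi-squared statistic for the contingency table of two label lists."""
    n = len(xs)
    row = Counter(xs)
    col = Counter(ys)
    cell = Counter(zip(xs, ys))
    stat = 0.0
    # Sorted keys keep the summation order fixed, so identical tables
    # always give bit-identical statistics.
    for r in sorted(row):
        for c in sorted(col):
            expected = row[r] * col[c] / n
            observed = cell.get((r, c), 0)
            stat += (observed - expected) ** 2 / expected
    return stat

def mc_chi_squared_test(xs, ys, replicates=2000, seed=1):
    """Return (statistic, Monte Carlo p-value) using label shuffling."""
    rng = random.Random(seed)          # constant seed: every call does the same work
    observed = chi_squared(xs, ys)
    shuffled = list(ys)
    hits = 0
    for _ in range(replicates):
        rng.shuffle(shuffled)          # breaks any association, keeps the margins
        if chi_squared(xs, shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (replicates + 1)

# Toy data (hypothetical): two perfectly associated labels.
xs = ["a"] * 10 + ["b"] * 10
ys = ["u"] * 10 + ["v"] * 10
stat, p = mc_chi_squared_test(xs, ys, replicates=2000)
print(f"statistic = {stat:.2f}, p = {p:.4f}")
```

    Each replicate rebuilds a contingency table from shuffled data, so the work is dominated by scattered memory access and branching rather than vector arithmetic.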

    Note: this test is fairly close to what will be used in real-world analysis. Despite its simplicity, it involves complex memory access patterns, cache misses, branch mispredictions etc. Also, it only exercises the CPU and the memory subsystem. Disk speed and GPU are not performance factors. This work is very different from image processing (which is what most other tests seem to focus on), because it does not involve linear memory access and highly optimised vector code.

    Methodology: both computers ran the same version of R and OS X, and they were in the same room on the same desk (to control for ambient temperature). Wi-Fi and Bluetooth were off. Other apps were closed. Each test (per CPU and single/multi-core) was run 100 times in succession. This was to make sure that we get reliable results and also to see whether there is any throttling going on. The random number generator was reset to a constant seed before each run to make sure that every run does the same work. I did not monitor the temperature or the CPU clocks.
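
    The repeat-and-reseed procedure described above can be sketched roughly as follows. This is an illustrative Python harness, not the R script that produced the numbers in this post; `workload` is a hypothetical stand-in for the real test, and the run count is kept small so the example finishes quickly.

```python
import random
import statistics
import time

def workload(seed, n=20000):
    """Stand-in CPU-bound task (hypothetical); swap in the real test here."""
    rng = random.Random(seed)
    return sum(rng.random() < 0.5 for _ in range(n))

def benchmark(task, runs=100, seed=42):
    """Run `task` repeatedly with the same seed and record wall-clock times."""
    times, results = [], []
    for _ in range(runs):
        start = time.perf_counter()
        results.append(task(seed))     # same seed, so every run does the same work
        times.append(time.perf_counter() - start)
    return times, results

times, results = benchmark(workload, runs=10)
print(f"mean {statistics.mean(times):.4f}s, slowest {max(times):.4f}s")
```

    Keeping the per-run times, rather than only the total, is what makes it possible to look for drift across successive runs, which is how throttling would show up.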


    Results: the graph with runtimes is attached. First of all, the 2016 system ran the entire benchmark in 1718 seconds, while the 2015 system ran it in 2158 seconds. In other words, the 2015 system took 1.26 times longer to perform the same task. The single-core time for the 2016 system is 8.08s on average and for the 2015 system 9.92s on average. The 2015 system therefore needs 1.23 times as long to run the single-threaded R randomised chi-squared test on average. Results for both CPUs show some minor fluctuations between the runs. The density distributions of the times appear to be normal mixtures, with a major component around the sample mean and a small number of significantly slower runs. Nothing alarming here.

    Multi-threaded results are very interesting. First, as expected, the times are generally slower if all 4 cores are utilised. That's because the CPU can't maintain high turbo if all cores are loaded. On average, Skylake needs 9.03 seconds per core per run in this scenario, while the Haswell needs 11.58 seconds. The full-load penalty is thus 1.12 times as long for Skylake and 1.17 times as long for Haswell on average, compared to the time needed in the single-core scenario. In addition, Haswell on average needed 1.28 times as much time as the Skylake in the multi-core scenario.
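
    The ratios quoted in the results can be re-derived from the reported averages with a few lines of Python. The timings below are the ones given in this post; the throughput line is an extra derived figure, assuming that each of the 4 simultaneous copies finishes in the reported per-core time.

```python
# Average times in seconds, as reported in this post.
single = {"skylake": 8.08, "haswell": 9.92}    # one copy of the test
multi = {"skylake": 9.03, "haswell": 11.58}    # per copy, 4 copies at once
total = {"skylake": 1718, "haswell": 2158}     # entire benchmark

def ratio(slower, faster):
    """How many times as long `slower` takes compared to `faster`."""
    return round(slower / faster, 2)

total_ratio = ratio(total["haswell"], total["skylake"])
single_ratio = ratio(single["haswell"], single["skylake"])
multi_ratio = ratio(multi["haswell"], multi["skylake"])

# Full-load penalty: per-copy time with 4 copies running vs. a single copy.
penalty = {cpu: ratio(multi[cpu], single[cpu]) for cpu in single}

# Derived: work done by 4 loaded cores relative to 1 core running alone.
throughput = {cpu: 4 * single[cpu] / multi[cpu] for cpu in single}

print("Haswell / Skylake, whole benchmark:", total_ratio)
print("Haswell / Skylake, single-core:   ", single_ratio)
print("Haswell / Skylake, 4 cores loaded:", multi_ratio)
print("Full-load penalty:", penalty)
print("4-core throughput vs 1 core:", throughput)
```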

    Discussion: for this particular task, the top-tier Skylake outperforms the mid-tier Haswell by a factor of 1.2-1.3. This is a very substantial increase in performance. While it might appear unfair to compare mid-tier to top-tier, I'd like to note that the measured performance differences are much more dramatic than what synthetic mixed tests such as Geekbench suggest. I would guess that the difference between same-tier CPUs is still most likely to be in the ballpark of 1.1-1.2. In addition, Skylake is much more efficient at running code that takes advantage of multiple cores: it can maintain relatively higher turbo boost frequencies than the Haswell. In fact, with these two CPUs, the Skylake per-core performance under full 4-core load (9.03s per run) is still substantially higher than what Haswell manages in single-threaded mode (9.92s per run). Also, neither laptop shows any sign of substantial thermal throttling under prolonged load (there is some heavy fluctuation over the first few runs; I guess that's where the CPU tries to find its "comfortable" spot). Subjectively, the 2016 MBP was cooler and quieter over the entire ordeal.

    Conclusions: for CPU-based work with complex memory access and branching patterns, which goes beyond image processing (what other tests seem to focus on), the Skylake CPU offers some very noticeable performance improvements over last year's model. It is also more efficient when all cores are under heavy load. In addition, I have to say that I am very impressed by the new cooling system. With the laptop being impossibly thin, I was worried that it would be prone to throttling under load. No chance. This thing runs relatively cool and quiet even if the CPU and GPU are heavily taxed, and does so for hours.
  2. MrGuder macrumors 68020

    Nov 30, 2012
    Wow I'm impressed with your expertise and commitment to testing. Great job!!
  3. leman thread starter macrumors 604

    Oct 14, 2008
    Thank you for your kind words! But this test is far from thorough; it's a quick and dirty thing that took about an hour and a half to do (including the benchmarking time). Ideally, I should be monitoring temperatures/clocks and system load at the same time and doing time series analysis, but I really didn't feel like it on a Sunday...

    I still hope that this quick test shows that the story is much more complex than often thought and that the new MBP can be a very decent upgrade for some people. For me at least, going from the mid-tier 2015 to the top-tier 2016 is a substantial difference.
  4. keviig macrumors 6502

    Jun 7, 2012
    Appreciate you taking the time to do some proper testing! Most reviews nowadays barely scratch the surface. Well done!
  5. Bonaqua macrumors member

    Jan 10, 2014
  6. Fzang macrumors 65816


    Jun 15, 2013
    So much better than "I think spotlight might be caching or something". I'm crying tears of joy.
  7. Charlesje macrumors member

    Nov 17, 2016
    Thanks Leman for your testing. Your results are in line with my findings in audio production (CPU-oriented) use. Very generally speaking, on similar tiers Skylake MBPs provide a 15% boost in CPU performance, which is the first significant boost in years. And indeed, the 2016 MBP does so in a much quieter way. This, to me, is even its most important advantage!
  8. aevan macrumors 68000


    Feb 5, 2015
    Really enjoyed reading this. And not because I like the results (although, the outcome was what I was hoping for, I admit) but because it was written so well. It almost reads like a scientific experiment.

  9. Turpentine222 macrumors newbie

    Dec 29, 2016
    Great test. Makes me think I should do something similar using Matlab, the scientific software I mostly work with.
  10. leman thread starter macrumors 604

    Oct 14, 2008
    Please do! I am sure that it will help out a bunch of people that are still undecided.
  11. ncrypt macrumors regular

    May 16, 2012
    Great work leman. Would be interested to see which machine had the greatest % battery discharge during the tests as well to see which CPU handles high-workload power efficiency better
