nMP & memory performance observations with various mem configs

Discussion in 'Mac Pro' started by analog guy, Feb 7, 2014.

  1. analog guy macrumors 6502

    Joined:
    Mar 6, 2009
    #1
    hey, folks:
    just took delivery of a 6c nMP and wanted to post a few observations.

    i ran geekbench3 with a number of memory configs and figured i would post the results.

    the 16GB chips are Crucial RDIMMs.

    single-core/multi-core:
    3183/4578 (12GB - 3x4 bank #s 1, 2 & 3)
    1984/2055 (16GB - 1x16 bank #1)
    3096/3916 (32GB - 2 banks filled #s 1 & 3)
    3389/5377 (48GB - 3 banks filled #s 1, 2 & 3)
    3319/5840 (64GB - all banks filled)

    i found it surprising that 3x4 performed far better (single core) than 1x16. i was also surprised that 4x16 took a hit (albeit slight) in single-core performance compared with 3x16.
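
    For a quick sense of scale, the scores listed above can be normalized to the 1x16 result. This is only a rough Python sketch: the numbers are copied from the list above, and the dictionary and function names are made up for illustration.

```python
# Geekbench 3 scores from the list above, as (single-core, multi-core).
SCORES = {
    "3x4 (12GB)": (3183, 4578),
    "1x16 (16GB)": (1984, 2055),
    "2x16 (32GB)": (3096, 3916),
    "3x16 (48GB)": (3389, 5377),
    "4x16 (64GB)": (3319, 5840),
}


def relative_to(baseline, scores=SCORES):
    """Return each config's scores as a multiple of the baseline config's scores."""
    base_single, base_multi = scores[baseline]
    return {
        name: (single / base_single, multi / base_multi)
        for name, (single, multi) in scores.items()
    }


if __name__ == "__main__":
    for name, (single, multi) in relative_to("1x16 (16GB)").items():
        print(f"{name:12s} single x{single:.2f}  multi x{multi:.2f}")
```

    By this measure 3x4 comes out at roughly 1.6x the single-core and 2.2x the multi-core score of 1x16.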

    would be curious if someone with a 4x4 configuration tests 4/8/12 & 16.
     
  2. VirtualRain macrumors 603

    Joined:
    Aug 1, 2008
    Location:
    Vancouver, BC
    #2
    Congrats on receiving your nMP!

    It's good to see that this is measurable. I might have guessed that Geekbench would largely fit in cache and thus not adequately stress the memory subsystem, but this clearly says otherwise.

    Keep in mind, the quad-channel memory controller interleaves data across channels like a RAID0 array stripes data across drives... so 3x4 should outperform 1x16 handily, since the three sticks offer triple the bandwidth of a single stick.

    Another factor may be that RDIMMs add an extra clock cycle of latency vs. UDIMMs.

    I'm guessing the anomaly between 3x16 and 4x16 is simply the margin of error in this test.
     
  3. MacUser2525 macrumors 68000

    Joined:
    Mar 17, 2007
    Location:
    Canada
    #3
    I find the multi-core scores you have listed surprising; my brother's Mac mini (i5 model) will beat that. Are you sure you're not missing a 1 in front of them?
     
  4. VirtualRain macrumors 603

    Joined:
    Aug 1, 2008
    Location:
    Vancouver, BC
    #4
    Those are the memory scores, not the total score.
     
  5. AidenShaw macrumors P6

    Joined:
    Feb 8, 2003
    Location:
    The Peninsula
    #5
    Could you please post the individual test results as well?

    In the other discussion on this topic most of the individual tests were pretty close in performance, but a few greatly benefitted from more channels. It would be interesting to see that data for the full complement of channel populations.
     
  6. MacUser2525 macrumors 68000

    Joined:
    Mar 17, 2007
    Location:
    Canada
    #6
    Well that explains it.
     
  7. sirio76 macrumors regular

    Joined:
    Mar 28, 2013
    #7
    Thanks for your test ;) I would also like to see how things perform in real applications, not only in synthetic benchmarks.
    I'm still waiting for my 8-core; in the meantime I've run a couple of tests on one of the nodes I use for distributed rendering in Vray (it's an i7-4930K machine; the CPU is nearly identical to the Xeon you find in the 6-core nMP). A 10,000,000-polygon scene needs about 10GB to render, and render times were exactly the same with both 16GB and 32GB of DDR3-1866 RAM (8x2 and 8x4 DIMMs). Of course I expect that if you fill up all of your RAM with more complex scenes the result will be quite different, but as long as your project fits in memory you should not see a significant decrease in performance, at least in Vray. Just my experience; there are probably many different workloads where running in quad-channel mode will give you some performance gains.
    As soon as I get my 8-core I'll test rendering performance with different memory configurations (16GB x1/x2/x3/x4, Crucial DIMMs).
     
  8. analog guy thread starter macrumors 6502

    Joined:
    Mar 6, 2009
    #8
    here is the full info from the memory system sub-tests of GB3 (64-bit) for 3x4, 1x16, 2x16, 3x16 and 4x16.

    i labeled the files (you should see that if you save them), but the order appears to be what i listed above.
     

    Attached Files:

  9. analog guy thread starter macrumors 6502

    Joined:
    Mar 6, 2009
    #9
    thanks for that. i was shocked by the magnitude of the difference between 3x4 and 1x16. presumably, 4x4 would provide an even greater difference.

    now it makes more sense why the 16GB stock config is not 1x16 or 2x8 (which would make later upgrades easier/more economical for users) -- it's not just a small hit on that subsystem.
     
  10. CH12671 macrumors 6502

    Joined:
    Dec 29, 2013
    Location:
    Southern US
    #10
    I know you probably don't have the chips to run this test, but I wonder how efficient a 2x8 / 2x4 configuration would be (total = 24 gig). That's how mine will start life, and eventually go to 4x8 for 32 gig....I just don't see a reason to leave 2 slots open and have 3 chips in the drawer while I wait to purchase the remaining 2x8 from crucial....
     
  11. analog guy thread starter macrumors 6502

    Joined:
    Mar 6, 2009
    #11
    i don't have 2x8 chips but would love to see your results when you receive your machine.

    perhaps i could test 2x16+2x4 = 40GB (or 3x4+1x16=28GB) to see what difference mixing sizes might make.
     
  12. Umbongo macrumors 601

    Joined:
    Sep 14, 2006
    Location:
    England
    #12
    Well it's just math. 1 DIMM can at most achieve bandwidth of 14.9GB/s, two can achieve twice that and so on. Bandwidth isn't always a key factor in real world performance though and it doesn't look like geekbench is really testing the capacity either so there is that to consider.
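
    The arithmetic can be sketched in a few lines of Python. It assumes DDR3-1866 (1866 million transfers per second), a 64-bit (8-byte) data path per channel, and one DIMM per channel; the function name is made up for illustration, and these are theoretical peaks rather than measured figures.

```python
def peak_bandwidth_gb_s(channels, mega_transfers_per_s=1866, bytes_per_transfer=8):
    """Theoretical peak bandwidth in GB/s (10^9 bytes/s) for the populated channels."""
    return channels * mega_transfers_per_s * 1e6 * bytes_per_transfer / 1e9


if __name__ == "__main__":
    for dimms in range(1, 5):
        print(f"{dimms} DIMM(s): {peak_bandwidth_gb_s(dimms):.1f} GB/s")
```

    Under those assumptions one DIMM works out to about 14.9 GB/s and four DIMMs to about 59.7 GB/s.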

    ----------

    You can't mix UDIMMs and RDIMMs, I'm afraid.

    Someone testing 8GB DIMMs could also test 2x8GB+2x4GB and the effects of adding an 8GB DIMM to the other three 4GB ones.
     
  13. analog guy thread starter macrumors 6502

    Joined:
    Mar 6, 2009
    #13
    yes--simple math but i hadn't done that calculation before so was surprised by the result.

    yes, you're right about the UDIMMs and RDIMMs.

    would be interesting to compare 4x8 vs 2x16 as well as the mixed 2x8+2x4 vs 1x8+1x4, 2x8, and 2x4 pairing that CH12671 proposed (hope he/she will test and report back).
     
  14. Lumpydog macrumors 6502

    Joined:
    Aug 3, 2007
    #14
    3096/3916 (32GB - 2 banks filled #s 1 & 3)

    Ok - I have 32GB as well but in banks 1 & 2.

    Should I be using banks 1 & 3??
     
  15. analog guy thread starter macrumors 6502

    Joined:
    Mar 6, 2009
    #15
    i tried both; performance was virtually identical in geekbench for all memory tests.
     
  16. CH12671 macrumors 6502

    Joined:
    Dec 29, 2013
    Location:
    Southern US
    #16
    He will definitely report back. My nMP will be "shipped in March." So I should have test results by the middle of April:D
     
  17. AidenShaw macrumors P6

    Joined:
    Feb 8, 2003
    Location:
    The Peninsula
    #17
    Thanks - but can you post the integer and float numbers as well?

    It should be expected that tests designed to bypass the cache and stress the memory system would show significant performance benefits from the added channels.

    How does it affect other tasks like JPEG and ZIP compression?
     
  18. analog guy thread starter macrumors 6502

    Joined:
    Mar 6, 2009
    #18
    i have all the data from the full test. i can post it later. any other requests so i can gather it in one post?
     
  19. analog guy thread starter macrumors 6502

    Joined:
    Mar 6, 2009
    #19
    full results:
    12GB (3x4)

    16GB (1x16)

    32GB (2x16)

    48GB (3x16)

    64GB (4x16)
     
  20. Lumpydog macrumors 6502

    Joined:
    Aug 3, 2007
    #20
    Awesome - thx!
     
  21. analog guy thread starter macrumors 6502

    Joined:
    Mar 6, 2009
    #21
    turns out that my 6-core @ 3,671 (the run i did with 48GB RAM) is the nMP with the highest single-core score for GB3/64-bit.

    here are all nMPs who ran the 64-bit test.

    the highest-scoring 2013 iMac i7 (run w/ 16GB) scored 13% higher (4,146 to 3,671) -- with 12% higher integer, 9% higher floating and 20% higher memory performance.
     
  22. AidenShaw, Feb 7, 2014
    Last edited: Feb 7, 2014

    AidenShaw macrumors P6

    Joined:
    Feb 8, 2003
    Location:
    The Peninsula
    #22
    Thank you. Thank you very much.

    This looks like the earlier report - the number of DIMMs is almost irrelevant for many of the tests. SHA1 multicore and SHA2 multicore were faster with 1 DIMM than with 4 (but probably within the sampling error - hey 'analog guy', wanna do 20 runs and give us the mean and standard deviation for every component score? ;) ).

    Looking at the group scores:
    Code:
                            1 DIMM  2 DIMM  3 DIMM  4 DIMM
                           ------- ------- ------- -------
    Floating Point Single    3825    3826    3828    3836
    Floating Point Multi    25531   25555   25529   25522
    Integer Single           3625    3641    3655    3646
    Integer Multi           20959   22686   23768   24282
    So,
    - virtual 4-way tie on Floating Single
    - virtual 4-way tie on Floating Multi-core
    - virtual 4-way tie on Integer Single
    - 1 DIMM is 86% of 4 DIMMs on Integer multi - but if you removed AES and Dijkstra you'd have a virtual 4-way tie, the rest of the integer multi tests were virtual ties
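
    The percentages can be checked with a short Python sketch. The scores are copied from the table above; the dictionary and function names are made up for illustration.

```python
# Group scores from the table above, ordered 1, 2, 3, 4 DIMMs.
GROUP_SCORES = {
    "Floating Point Single": (3825, 3826, 3828, 3836),
    "Floating Point Multi": (25531, 25555, 25529, 25522),
    "Integer Single": (3625, 3641, 3655, 3646),
    "Integer Multi": (20959, 22686, 23768, 24282),
}


def percent_of_four_dimms(scores):
    """Express each DIMM count's score as a percentage of the 4-DIMM score."""
    return [100.0 * s / scores[-1] for s in scores]


if __name__ == "__main__":
    for name, scores in GROUP_SCORES.items():
        row = "  ".join(f"{p:6.1f}%" for p in percent_of_four_dimms(scores))
        print(f"{name:22s} {row}")
```

    Only the Integer Multi row moves by more than 1%; the other three groups stay within 1% of the 4-DIMM score at every DIMM count.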

    Those L3 caches do seem to be effective on "non-bandwidth virus" programs.
     
  23. analog guy thread starter macrumors 6502

    Joined:
    Mar 6, 2009
    #23
    thanks for your analysis.

    what do you think of the relevance of the stream copy/scale/add #s where 3x4 outperforms 1x16?
     
  24. AidenShaw macrumors P6

    Joined:
    Feb 8, 2003
    Location:
    The Peninsula
    #24
    STREAM is a "bandwidth virus" benchmark designed to defeat all caches and measure the raw memory bandwidth of the system. It does nothing useful.

    IMO, it is interesting for people trying to get into the Top500 Supercomputer list, but mostly irrelevant for anyone considering an Apple, Windows or Linux desktop system.

    Most apps benefit from cache, and Intel is currently looking at 2 MiB to 2.5 MiB cache per physical core as the sweet spot. The GeekBench numbers show that is a good decision for almost all of the tests in GeekBench. There are probably some useful desktop apps that need extreme bandwidth, but not many.

    One thing that I was happy to learn from this discussion is that AES encryption is one of the bandwidth intensive apps. I'm buying systems for an application gateway prototype which will use 20-core systems to do SSL (AES) encryption. I've learned that populating each system with 8 DIMMs is the way to go. (Some systems only need 32 GiB, so they'll get 8x4GiB.)
     
  25. VirtualRain macrumors 603

    Joined:
    Aug 1, 2008
    Location:
    Vancouver, BC
    #25
    Thanks for taking the time to do this.


    Moral of this story? Cache is king! :cool:
     
