Troubleshooting 'new' 4,1 dual CPU tray & mixing RAM

Discussion in 'Mac Pro' started by Sharky II, Feb 22, 2017.

  1. Sharky II macrumors 6502a


    Joined:
    Jan 6, 2004
    Location:
    United Kingdom
    #1
    Hi,

    I recently found a good deal (eBay) on a 4,1 dual quad 2.26GHz CPU tray with 64GB of RAM. One side has 4 x 8GB Hynix 2Rx4 10600R (1333MHz) - the other CPU has HP/Hynix 4 x 8GB 2Rx4 12800R (1600MHz). Both sets are Registered and run at 1066MHz as I haven't upgraded to X5680/X5690 yet.

    Upon putting the 2.26GHz tray into my machine, I was saddened to be getting the occasional sudden restart or a GUI freeze (or both). After restarting, Spotlight would start indexing. I reset PRAM and SMC and it seemed to be 'fixed'. I did a few hours of stress testing last night and all seemed fine. I booted this morning, and the machine suddenly restarted again a couple of times.

    I'm doing a bunch of troubleshooting and debugging (memtest), but my question is: Is there any obvious problem with mixing the Registered 1333MHz and 1600MHz RAM on the dual CPU tray - even if all the 1333 is on one side, and all the 1600 is on the other?

    I'm wondering if this could cause the freezing/restart.

    Cheers,

    Ed
     
  2. h9826790 macrumors 604


    Joined:
    Apr 3, 2014
    Location:
    Hong Kong
    #2
    Mixing RAM may cause issues, but it is usually not the problem, especially when you are only mixing RAM with different maximum rated speeds.

    Anyway, how's the temperature? Especially the north bridge temperature. That's one of the known weak points.

    Also, a faulty HDD or GPU can cause this exact symptom.
     
  3. ActionableMango macrumors 604


    Joined:
    Sep 21, 2010
    #3
    I don't think so. I have mixed 1066 and 1333 RAM and the only problem is that it all runs at 1066, which is expected and normal.
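    As a rough illustration of why mixed speeds are normally harmless: the memory is expected to settle on the slowest common speed. Below is a minimal Python sketch, assuming every module downclocks to the slowest one and that the CPU's memory controller caps the result (1066MHz for the stock 2.26GHz CPUs, 1333MHz for an X5680/X5690). The function name is made up for illustration and is not part of any Apple tool.

```python
def effective_memory_speed(dimm_speeds_mhz, controller_max_mhz):
    """Return the speed (MHz) that a set of mixed DIMMs is expected to run at.

    Assumption: all modules downclock to the slowest module, and nothing
    runs faster than the CPU's memory controller allows.
    """
    if not dimm_speeds_mhz:
        raise ValueError("at least one DIMM is required")
    return min(min(dimm_speeds_mhz), controller_max_mhz)


# The tray from post #1: four 1333MHz and four 1600MHz modules.
dimms = [1333] * 4 + [1600] * 4

print(effective_memory_speed(dimms, 1066))  # stock 2.26GHz CPUs
print(effective_memory_speed(dimms, 1333))  # after an X5680/X5690 upgrade
```

    This only models the speed negotiation; it says nothing about whether a particular stick or slot is faulty.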

    But it would be easy to test. Remove all the RAM of one speed, distribute the remaining half appropriately across the slots, then see if the freezing/restarting goes away.

    I assume you weren't having these problems with your original tray (but you don't say, so I'm asking)?

    You might consider that the tray might be bad while you are still in your 30-day return window. I wonder about the history of these parts and the circumstances behind the seller getting thirteen used 2009 dual-CPU trays to sell on eBay.
     
  4. Sharky II, Feb 22, 2017
    Last edited: Feb 22, 2017

    Sharky II thread starter macrumors 6502a


    Joined:
    Jan 6, 2004
    Location:
    United Kingdom
    #4
    Hi!

    I currently have the fully working single CPU tray installed with the W3690 and am running memtest (Rember) on the 4 x 8GB 1333MHz. I was going to leave it running overnight, then test the 1600MHz RAM.

    I'm using the single CPU board so that I can pinpoint the problem to the RAM if there's a fault.

    I really hope the dual socket CPU tray is not damaged/faulty.

    I'll check on the Northbridge temp when the dual CPU tray is back in the machine and report back!

    For Northbridge temps, is it the I/O Hub Tdiode that I should look at in iStat? With the single CPU board running memtest, I/O Hub Tdiode is 68 degrees C.

    With regard to HD/GPU, this is the same machine that I've been using for a while with no issues, and there have never been any restarts or problems, so it's unlikely to be those, I think!

    --- Post Merged, Feb 22, 2017 ---
    Hi, I've been doing those kinds of tests all day and it's only ever had the problem with all 64GB of RAM in there (unless I'm going crazy and forgot). It's not an easy problem to recreate, so just because something runs for a bit, it doesn't mean it's OK.

    No problems with my current 6-core system! I have a W3690 and 24GB 1333 RAM, and the machine was flashed to 5,1. Running off an SM951 PCIe 256GB.

    I've also run my 3 x 8GB ECC Kingston RAM (un-Registered) in the dual CPU board without issues for a few hours.

    Sadly, I'm beginning to suspect the board, but as the problem is not easily repeatable, it's a bit tricky to troubleshoot.

    By running memtest, I'm hoping to find a faulty memory stick that is the root of all the issues. One time I was running memtest with all 64GB in there, and that's when it suddenly restarted.

    I'm sure I could send it back (I haven't mentioned it to the eBay seller yet), but I REALLY want a 12-core system now...

    Cheers,

    Ed
    --- Post Merged, Feb 22, 2017 ---
    I got bored of waiting, so I have thrown all 8 Registered sticks back on the dual CPU board and am running memtest again. I will leave it running overnight (unless it restarts!).

    After being on for 10 mins with the dual CPU board, IOH Diode is currently 74 degrees C with only memtest running.

    Machine is relatively loaded: 4 x HD, SM951 256GB, PCIe sound card, nVidia GT120, ATI 5770, and now a dual quad core 2.26GHz with 64GB RAM.

    Cheers!

    Ed
     
  5. h9826790 macrumors 604


    Joined:
    Apr 3, 2014
    Location:
    Hong Kong
    #5
    Correct, IOH is the north bridge.

    And I agree that you should focus on the CPU tray (including CPU and RAM) if the same hardware works fine with another CPU tray.

    If your cMP is very stable during stress tests, and the restarts / freezes occur almost only under light load, I would seriously look into the north bridge. That's because the north bridge runs cooler when the CPUs are under stress (due to higher fan speeds), which can actually make it more stable.
     
  6. Sharky II, Feb 22, 2017
    Last edited: Feb 22, 2017

    Sharky II thread starter macrumors 6502a


    Joined:
    Jan 6, 2004
    Location:
    United Kingdom
    #6
    Thanks! It's at 76C (169F) at the moment. The system is relatively idle, although memtest is using 100% of one CPU and fills the RAM. I think that temp is OK, from some googling? System Ambient is 30C (86F).
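    For anyone checking the Celsius/Fahrenheit figures, the conversion is simple arithmetic. A minimal Python sketch follows; the helper name and the sensor labels are only illustrative, and the two readings are the ones quoted in this post.

```python
def celsius_to_fahrenheit(celsius):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


# Readings quoted in this post.
for label, celsius in [("IOH diode", 76), ("System ambient", 30)]:
    print(f"{label}: {celsius}C = {celsius_to_fahrenheit(celsius):.1f}F")
```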

    I didn't even know there was a heatsink on the dual CPU tray (single CPU doesn't have one, just a much bigger CPU heatsink). If it's easy, I don't mind re-applying some thermal paste when (if?!) I change the CPUs to X5680/5690. I'm currently reading this: http://www.xlr8yourmac.com/archives/jul14/071114.html

    I'm running Macs Fan Control, Exhaust/Intake/Boosta are all set to auto (and show normal speeds), PCI and PS are set to 800/600 respectively due to the PC 5770 causing the fans to ramp up (unflashed graphics card 'bug').

    Thanks so much for your help,

    Ed
     
  7. orph macrumors 6502a

    Joined:
    Dec 12, 2005
    Location:
    UK
    #7
    Troubleshooting tips:
    Boot from a second drive (a USB stick will work, just format it and install OS X on it) to see if you still have problems.
    (It may be a bad drive or a software problem.)
    Pull all but two RAM sticks and see if you still have problems; if so, swap slots and/or RAM sticks.
    If you have a second Mac Pro, swap the GPU and see if you still have problems.
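    The RAM step can be written out as a checklist so that no stick gets skipped. A minimal Python sketch, assuming the eight sticks are labelled A1-A4 (1333MHz) and B1-B4 (1600MHz); the labels and the helper name are hypothetical:

```python
from itertools import islice


def stick_test_plan(sticks, group_size=2):
    """Yield successive groups of sticks to install and test on their own."""
    remaining = iter(sticks)
    while True:
        group = list(islice(remaining, group_size))
        if not group:
            return
        yield group


# Hypothetical labels: A = 1333MHz sticks, B = 1600MHz sticks.
sticks = ["A1", "A2", "A3", "A4", "B1", "B2", "B3", "B4"]

for run, group in enumerate(stick_test_plan(sticks), start=1):
    print(f"Run {run}: install only {' + '.join(group)}, then stress test")
```

    With an intermittent fault, each run needs to last long enough to be meaningful; a short clean run does not clear a pair.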
     
  8. Sharky II thread starter macrumors 6502a


    Joined:
    Jan 6, 2004
    Location:
    United Kingdom
    #8
    Hi all,

    Left it running 4 loops of memtest with all 64GB in the new dual cpu board - took over 24 hours but no issues reported, no restart.

    I am currently letting Logic Pro stress test the CPUs at around 1200% (of 1600%), and have set it to loop for a few hours.

    Assuming it all works OK, I think I need to test some 'cold' boots when the machine has actually cooled down. The times the machine had issues were shortly after booting.

    I'm also wondering if the machine had to 'get used' to the new board, not in a magical unicorn way, but in that I've done some restarts? SMC and PRAM resets etc...

    Essentially I'm trying to get it to exhibit the problem again, if it exists... before I order new CPUs.
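    For reference, the 1600% ceiling comes from counting each logical CPU as 100%. A minimal Python sketch of that arithmetic, assuming two quad-core CPUs with two hardware threads per core (Hyper-Threading); the helper name is only illustrative:

```python
def load_fraction(reported_percent, sockets, cores_per_socket, threads_per_core=2):
    """Return reported CPU load as a fraction of the machine's total capacity.

    Each logical CPU contributes 100% to the maximum reading.
    """
    logical_cpus = sockets * cores_per_socket * threads_per_core
    return reported_percent / (logical_cpus * 100)


# Dual quad-core tray, load reading of 1200%.
print(f"{load_fraction(1200, sockets=2, cores_per_socket=4):.0%} of total capacity")
```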

    Cheers

    Ed
     
  9. Filin, Feb 25, 2017
    Last edited: Feb 25, 2017

    Filin Contributor


    Joined:
    Mar 7, 2010
    Location:
    Ukraine
    #9
    Your backplane board is flashed to 5,1, but the new dual CPU tray has the old 4,1 firmware?

    Maybe you have an issue with an SMC version mismatch?

    Try downgrading to 4,1 with your old single CPU tray (you must have the old Nehalem CPU installed), then swap in the new dual tray and upgrade to 5,1 again.

    Or just try a firmware downgrade and upgrade (5,1 -> 4,1 -> 5,1) with the dual CPU tray installed.
     
  10. h9826790 macrumors 604


    Joined:
    Apr 3, 2014
    Location:
    Hong Kong
    #10
    It doesn't really matter; my cMP is in exactly this situation. I flashed it to 5,1 and then swapped in a new 4,1 CPU tray. It works flawlessly. The 5,1 firmware has nothing to do with the CPU tray, and it won't change the SMC version.
     
  11. Sharky II thread starter macrumors 6502a


    Joined:
    Jan 6, 2004
    Location:
    United Kingdom
    #11
    Thanks guys, I asked on here about that in another thread and learned that there was no issue with putting a 4,1 board in a 4,1->5,1 machine.

    Since the initial problems, I haven't been able to get the computer to suddenly restart or lock up, and I've done memory and CPU stress tests for a couple of days. I just tried a cold boot after it was off for hours and again, no issue.

    I'm happy to try any tests you guys can think of to make it fail/exhibit any potential issues!

    Cheers,

    Ed
     
