Random spontaneous reboots MacPro 1,1 with Yosemite

Discussion in 'Mac Pro' started by the bug, Nov 28, 2014.

  1. the bug macrumors member

    Joined:
    Feb 21, 2014
    #1
    Argghh, random reboots are driving me crazy.
    They started a couple of days ago on my MacPro 1,1 flashed to a 2,1 running Yosemite with Pike's boot.efi file.
    I didn't change anything or install any new software, and my temperatures are all well within specs according to Temperature Monitor.

    Console isn't a whole lot of help as there is no kernel panic, just a reboot out of the blue from time to time.
    Only crash reports are these discoveryd crashes :

    Nov 28 23:01:13.827512 localhost discoveryd_helper[166]: Basic RemoteControl com.apple.discoveryd_helper Starting XPC Server
    Nov 28 23:01:13.828290 localhost discoveryd_helper[166]: Detailed RemoteControl com.apple.discoveryd_helper XPC connection 0x7fc9829002c0: start (pid=51, <unknown> not root)

    Not sure if that is a symptom (because of a spontaneous reboot) or part of the cause.
    Sometimes it will run for hours and hours, then other times (as you can see below) it will reboot 3 or 4 times in 30 minutes.

    The only strange thing I can see from the Console is the previous shutdown codes.

    11/28/14 10:48:54.000 PM kernel[0]: Previous shutdown cause: 0
    11/28/14 10:52:39.000 PM kernel[0]: Previous shutdown cause: 5
    11/28/14 11:01:12.000 PM kernel[0]: Previous shutdown cause: 0
    11/28/14 11:27:02.000 PM kernel[0]: Previous shutdown cause: 0


    They seem to randomly change between a code of 5 (which seems to be normal shutdown from the Finder menu), and 0 (which seems to be loss of mains power).
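
    For anyone who would rather tally these codes than eyeball Console, here is a minimal sketch in Python. It assumes log lines in exactly the Console format quoted above. The name `parse_shutdown_causes` is made up for illustration, and the meanings of 0 and 5 are only the guesses described above, not an official table.

```python
import re
from collections import Counter
from datetime import datetime

# Matches lines like:
# 11/28/14 10:48:54.000 PM kernel[0]: Previous shutdown cause: 0
LINE_RE = re.compile(
    r"^\s*(\d{1,2}/\d{1,2}/\d{2} \d{1,2}:\d{2}:\d{2})\.\d+ (AM|PM) "
    r"kernel\[0\]: Previous shutdown cause: (-?\d+)"
)


def parse_shutdown_causes(log_text):
    """Return a list of (timestamp, cause_code) tuples found in log_text."""
    events = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line)
        if m:
            stamp = datetime.strptime(
                m.group(1) + " " + m.group(2), "%m/%d/%y %I:%M:%S %p"
            )
            events.append((stamp, int(m.group(3))))
    return events


# The four lines quoted in the post above.
SAMPLE = """\
11/28/14 10:48:54.000 PM kernel[0]: Previous shutdown cause: 0
11/28/14 10:52:39.000 PM kernel[0]: Previous shutdown cause: 5
11/28/14 11:01:12.000 PM kernel[0]: Previous shutdown cause: 0
11/28/14 11:27:02.000 PM kernel[0]: Previous shutdown cause: 0
"""

events = parse_shutdown_causes(SAMPLE)

# How often each cause code shows up.
counts = Counter(cause for _, cause in events)

# Minutes between consecutive boots, to see how tightly the reboots cluster.
gaps_minutes = [
    (later[0] - earlier[0]).total_seconds() / 60
    for earlier, later in zip(events, events[1:])
]

print(counts)
print(gaps_minutes)
```

    Run against a longer stretch of saved log text, the counts and gaps make it easier to see whether the cause 0 reboots come in bursts.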

    I have tried the following things without any success :

    - I have zapped the PRAM, and reset the SMC.
    - I reloaded 10.10, and upgraded to 10.10.1.
    - I tried running from another outlet, and bypassing my surge suppressor.
    - I reinstalled the original 5150 chips.
    - I put the original video card back in, unplugged everything except the keyboard and monitor, and removed all 3rd party RAM.

    Still got random spontaneous reboots.

    Then finally while running AHT from my original install CDs it passed the first time, rebooted the second time, and passed again the 3rd time.
    It was at that point I totally ruled out software, and I'm now thinking that my power supply may be getting flaky, given the previous shutdown cause of 0.

    Has anyone else here experienced something similar ?
    Is there anything else I should be checking before I start looking for a power supply ?

    Any help would be greatly appreciated, thanks in advance.

    - Jay
     
  2. the bug, Dec 1, 2014
    Last edited: Dec 1, 2014

    the bug thread starter macrumors member

    Joined:
    Feb 21, 2014
    #2
    Wow, I've just re-read my last post, and it seems a little rambling and disjointed.
    Sorry, it was typed out between unexpected reboots every 10 minutes, while trying to get out the door last Friday (it was that kind of a week).

    So now I'm back in front of my Mac, and have had a little more time to look at things.
    I can keep the machine running (it's been running a couple of hours now, including running memtest for an hour), if I pull all the memory from the bottom riser card.

    It does not seem to matter which riser card I put in which slot (I have tried swapping them) or which of my 8 DIMM modules I put in the upper riser.
    The only thing that seems consistent is that if I try to use more than any 4 of my memory modules, the machine spontaneously reboots every 10-15 minutes.

    Could this be because of the load on the power supply ?
    I know that these DIMMs use a lot of power, so could it be possible that any more than 4 DIMMs drawing power is enough to make the supply punt ?

    Or could there be something on the logic board that has to do with the bottom riser card slot ?
    I'm thinking it is between the PSU or the logic board, but I'm not really sure that I would bother fixing this machine if it is the logic board.

    Any advice would be greatly appreciated, as I really don't want to shell out the cash for a new power supply, if there is something else that may be causing this.
    Thanks.

    - Jay
     
  3. the bug thread starter macrumors member

    Joined:
    Feb 21, 2014
    #3
    Well, I figured I would give an update for the benefit of folks who are having similar problems and searching for any info.
    Sorry it's really longwinded, but hopefully it helps some future Google searcher… here goes :

    I tried another power supply (thanks to Hennesie2K for getting one out to me so quickly), and unfortunately that did not take care of the random reboot issues, so I did some more troubleshooting.

    I knew that spontaneous reboots without a panic can be caused by bad RAM, so I decided to start testing all my FB-DIMMs pair by pair, running memtest on them.
    I have 5 pairs of DIMMs: the original pair of 512 MB sticks, 3 pairs of 1 GB sticks, and a pair of 2 GB sticks.

    I stuck a pair of 1GB sticks in slots A1 and A2 and ran memtest, passed.
    I stuck a second pair of 1GB sticks in slots B1 and B2, ran memtest and it spontaneously rebooted after a few minutes.
    I then put the third pair of 1 GB sticks in slots B1 and B2, same thing spontaneous reboot.
    However, when I put the 2 GB sticks in slots B1/B2 it passed, and the same with the 512 MB sticks; both passed.

    So the bottom line was that whenever I installed 2 matching pairs of DIMMs in successive pairs of memory slots, I got the spontaneous reboot failure almost instantly.

    Then it hit me… DOH... memory interleaving !
    You have to install Mac Pro DIMMs in matched pairs for "dual channel" (64 bit) memory access, but the reason you are supposed to follow Apple's sequence of populating the memory slots in order is to get "quad channel" (128 bit) memory access.
    More info here : http://www.anandtech.com/show/2064/12
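
    To picture what interleaving does, here is a toy sketch in Python. It is only a simplified round-robin model under my own assumptions (a 64-byte interleave unit, channels simply numbered 0 to 3), not the chipset's actual address mapping, and `channel_for_address` is a made-up name.

```python
LINE_SIZE = 64  # bytes per interleave unit; an assumed value for this toy model


def channel_for_address(addr, n_channels, line_size=LINE_SIZE):
    """Return which channel serves `addr` under simple round-robin interleaving."""
    return (addr // line_size) % n_channels


# Eight consecutive 64-byte blocks of memory.
addresses = [i * LINE_SIZE for i in range(8)]

# "Dual channel": only two channels take part, so accesses alternate 0, 1.
dual = [channel_for_address(a, 2) for a in addresses]

# "Quad channel": consecutive blocks are spread over all four channels.
quad = [channel_for_address(a, 4) for a in addresses]

print("dual:", dual)
print("quad:", quad)
```

    The point of the toy model is that once four channels are interleaved, even a small sequential access touches every channel, so a fault affecting any one of them would show up quickly. With only two channels in play, the other two are never touched.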

    Back in the PPC/68K days I remember having to play games with moving memory SIMMs (usually from different manufacturers) around to intentionally "de-interleave" them to get rid of strange memory errors from RAM running at slightly different speeds.

    So I messed around and experimented some more, and came to the conclusion that whenever OS X tries to interleave 2 pairs of DIMMs for "quad channel" memory access, I will eventually get a spontaneous reboot.
    It happens almost instantly, if I put 2 pairs of DIMMs that are the same size, in successive pairs of memory slots.

    However if I mis-match the sizes it seems like I can run for an hour or so before I finally get a spontaneous reboot, but eventually it will reboot running memtest.

    The only solution I found is to force the memory to be in "dual channel" mode all the time by intentionally not following Apple's advice on RAM installation and leaving the bottom riser card totally empty.
    Right now I have 6GB in the upper riser card and nothing in the bottom card, and it's been rock solid for hours of torture testing while I am typing this with fans blazing. :eek:

    The Northbridge chip is what is responsible for all this memory management magic, so I even tried re-doing the original factory heat-sink compound with some Arctic Silver, and even though my temps dropped from the low 50's C to the mid 40's C, the problem is still happening.

    Armed with this new info I did yet another Google search for "MacPro spontaneous reboot" but this time I included "riser card", and imagine my surprise when I found this right here on the MR forum:
    http://forums.macrumors.com/showthread.php?t=1683131

    Ah, someone has been here before me, I knew I couldn't be the first !
    At this point I am happy running with 1 riser card empty and only 6GB of RAM, if I can milk another year or 2 out of this machine.
    I will not be replacing the logic board unless I come across one for $50 or less (and that's not going to happen).

    I'm thinking there must be a problem with either the Northbridge chip, or the supporting circuitry on the logic board that is causing this to happen.
    I did a visual inspection, and all the caps, components and traces around the Northbridge chip look good (I didn't pull the logic board to check the solder joints underneath).

    I am an electrical engineer by trade, and I downloaded this ( http://www.intel.com/content/www/us/en/chipsets/5000x-chipset-memory-controller-hub-datasheet.html ) hoping I could figure something out (yeah, right) :confused:.
    To be honest with you, trying to read this just made my brain hurt, this is well above my pay grade. :eek:

    Maybe this helps someone smarter than me in the future figure out a fix, but I think I'm just going to run my system on one riser card until it dies.
    As I said earlier, I hope this helps someone in the future, as my system seems to be rock solid with all my memory loaded in the upper riser card, but only time will tell if that lasts.

    Good luck.

    - Jay
     
  4. gmargolis macrumors newbie

    Joined:
    Mar 17, 2015
    #4
    I am having the same random failure on a Mid 2012 MacBook Pro Retina/16 GB RAM/500 GB Flash, now on its third logic board. The original and first replacement boards had different failures. The problem started happening after installation of 10.10 and continues with 10.10.2.

    In each case, full diagnostics were run prior to the logic board replacement, so if this is caused by a memory problem, there was no sign of it using Apple's diagnostics.
     
  5. chogue23 macrumors member

    Joined:
    Mar 16, 2015
    Location:
    Waco, TX, USA
    #5
    I am glad that you got the memory issue diagnosed. I have all 8 slots in mine filled with the same batch of ram that I pulled from a server, so they are all exactly the same. I wonder if that would make a difference in your case.

    With the MacBook Pro Retina, I would be more likely to point my finger at heat issues. I know my Mid 2012 non-Retina MacBook Pro gets very hot with the default fan settings, so I use smcFanControl to keep an eye on the temperature. With the default fan settings at 2000rpm I was getting temperatures of almost 75 degrees Celsius, and it was hot enough to burn my legs. I bumped the fan up to 3500rpm and it will keep it at around 45 degrees as long as it stays off of heavy fabrics, and the fan noise isn't very noticeable to my musically trained ear. When I start doing heavy work, I'll bump the fan even higher to keep it cool.
     
  6. gmargolis macrumors newbie

    Joined:
    Mar 17, 2015
    #6
    The RAM on a Mid 2012 Retina MacBook Pro is soldered on the logic board.

    Temperature has not been a problem on this unit — I monitor the CPU temperature and the fan rarely has to run faster than barely audible unless I'm doing some heavy-duty image processing or rendering.
     
  7. the bug thread starter macrumors member

    Joined:
    Feb 21, 2014
    #7
    Well, figured it was time for an update.
    I've been running my 1,1 (-> 2,1) happily on 1 riser card for some months now.

    When I last posted, MacPro 1,1 logic boards were running about $90-$120 USD, which was more than I wanted to invest.
    I happened to be on ebay last night, and saw that prices on 2006/2007 MacPro parts have really dropped.
    Lots of 2006 logic boards on there now for $50-$75.
    I scored one for under $50 shipped, and not being one to leave well enough alone, I decided to go for the 1,1 "brain transplant".

    I will post an update when the operation is complete, as to whether it worked or not.
    I'm about 80% certain that it is the logic board, and not 1 (or maybe both) riser cards, keeping my fingers crossed.

    So, I know I have to de-authorize this motherboard in iTunes, and I may lose my Time Machine backups, but is there anything else I have to look out for software-wise ?
    Curious also, if I will have to call Apple to re-authorize FaceTime and Messages, like I had to when I upgraded to Yosemite the first time.

    Any other words of advice would be appreciated !

    - Jay
     
  8. the bug thread starter macrumors member

    Joined:
    Feb 21, 2014
    #8
    Woo Hoo ! :)
    It looks like the motherboard has taken care of the random reboot problem.

    Loaded up both riser cards, and ran memtest for several hours today, no errors or reboots.

    Hopefully my longwinded ramblings can help some other "poor sap", with similar problems. ;)

    - Jay
     
