Random boot failure

Discussion in 'Mac Pro' started by LD99, Dec 13, 2010.

  1. LD99, Dec 13, 2010
    Last edited: Dec 13, 2010

    LD99

    Dec 13, 2010
    Hi there,

    I have an early 2008 Mac Pro to which I have since added a video card, a couple of hard disks and an SSD as the boot drive. It has been working like a charm for a long time.

    About a week ago, it started having a bit of a random fit while booting up. When starting from cold, it can do any of the following:

    (1) Gets stuck at the white screen with the Apple logo
    (2) Doesn't make the boot sound; all screens stay dark and all the fans are going
    (3) Like (2), except the power light is flashing
    (4) Boots up like normal, then has a kernel panic (or whatever you call it) within a few minutes (everything seems to work normally prior to the kernel panic)
    (5) Boots up normally and works normally.

    Lately it seems to do (4) mostly.

    The HDD and SSD are OK. I've removed all the USB HDDs too. At first I thought it was the HDD, then I thought maybe it was the motherboard. But now the computer works fine as long as I don't turn it off.

    Rebooting is a little risky, but most of the time it seems to reboot fine. The problem only occurs when the computer is booting from cold (after being off for a while).

    Anyone got any thoughts? Thanks!!!
  nanofrog

    May 6, 2008
    Check the LED indicators on the logic board, and report what you get.

    If you're not sure about what I'm talking about, locate your manual (it's in there). If you can't find it, download a copy from Apple, and print the pages you need to do this.
  LD99

    Dec 13, 2010
    Yeah, I found a service manual online (and realised my Mac Pro is actually the Early 2008 model... I forgot that I bought it about 3 weeks before the 2009 model came out), and located the diagnostic LEDs.

    And none of them are lit.

    Then again, it might be because it is now running fine. Like I said, it only encounters problems at boot up randomly. And it seems to be ok when the machine is hot.

    I shall keep the case open so I can check on the LEDs when booting fails next time.

    Still... it just seems weird. If it is a hardware failure (it should be... I was running Leopard and then reinstalled with Snow Leopard just so I could exclude software as a possible cause), wouldn't you expect the error to pop up consistently? It seems even weirder that I encounter the problem when the machine is cold and not so much when it is warm (so it is not an overheating issue either).

    Very bizarre.
  nanofrog

    May 6, 2008
    You could try a PRAM reset, but I don't expect this to solve anything.

    But keep an eye on the LEDs when it pulls one of these strange boots, and report back.
  LD99

    Dec 13, 2010
    Alright... for an update.

    The memory riser board thing (I don't know the technical name for it... the 2008 Mac Pro has two boards for its memory chips, 4 slots on each board) has a red light (the DIMM4 one; the other three are dimmed). When that is on, the computer runs fine.

    When it was not on, the computer froze and the logic board LEDs lit up. There are two red lights... ERRA and ERRB. This time the OS didn't even go into the kernel panic bit... it simply froze.

    My theory (purely speculative... I have no idea what those indicator lights mean) is that I have a faulty DIMM. So I have replaced the chip in the DIMM4 slot with my old (factory original) chip. So far no red light has come on and the computer seems to boot up fine.

    But I think I will leave the computer off overnight, and then boot it up when it is cold. (In the past the computer would boot up fine when it was warm... weird.) Then I'll know if I am finally out of the woods. Will report back.

    Just as an aside, for some strange reason, my System Profiler now reports 14GB of memory rather than 15GB (2GB x 7 + 1GB x 1). I wonder if it is because I am not supposed to mix and match... or maybe it is the slot/board that is defective.
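
The arithmetic behind that 14GB reading can be checked with a short sketch. It assumes the configuration described above (seven 2GB modules and one 1GB module) and, as a simplifying assumption not confirmed anywhere in the thread, that exactly one module is not being counted. The function name is hypothetical.

```python
def uncounted_candidates(modules_gb, reported_gb):
    """Return the module sizes whose absence alone would explain the reported total."""
    expected = sum(modules_gb)
    missing = expected - reported_gb
    # Assumption: exactly one module is not being counted.
    return sorted({size for size in modules_gb if size == missing})


modules = [2] * 7 + [1]  # 2GB x 7 + 1GB x 1, as described in the post
print("expected total (GB):", sum(modules))
print("single uncounted module size (GB):", uncounted_candidates(modules, 14))
```

Under that assumption, a 14GB reading matches the 1GB module being the one that is not counted. It does not say whether the module, the slot or the board is at fault.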
  philipma1957


    Apr 13, 2010
    Howell, New Jersey

    My best guess is an invisible crack in a solder joint or a PCB. When your machine is warm, all the metal parts expand, so the crack is forced to make contact; when the machine is cool, the metal parts shrink, so the crack opens and there is no boot. By invisible crack I mean one that is very thin, less than a hairline; you would need a jeweler's loupe to find it.
    I would guess it is in the slot board, not the RAM, but if you take the removed RAM, put it in a different slot and the machine does not boot, it would be the RAM.
  cutterman

    Apr 27, 2010
    You seem to have narrowed it down to the memory subsystem. At this point I would suggest a systematic approach. Start with 1 DIMM and run a memory test on it (I like memtest). If it passes, mark that DIMM and replace it with another. If none of the DIMMs pass then, as suggested above, it may be the tray PCB at fault.
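
The one-DIMM-at-a-time approach above can be written down as a small decision sketch. It does not run any memory test itself; it only interprets pass/fail results you have noted by hand after testing each DIMM alone. The function name and the DIMM labels are hypothetical, and the "all passed" branch is an assumption that goes beyond what the thread says.

```python
def diagnose(results):
    """results maps a DIMM label to True (passed) or False (failed) when tested alone."""
    if not results:
        return "no DIMMs tested yet"
    failed = sorted(label for label, passed in results.items() if not passed)
    if len(failed) == len(results):
        # Every DIMM failing on its own points away from the DIMMs themselves.
        return "all DIMMs failed: suspect the tray PCB"
    if failed:
        return "suspect DIMM(s): " + ", ".join(failed)
    # Assumption (not from the thread): if every DIMM passes alone, test other slots next.
    return "all DIMMs passed alone: try other slots"


# Hypothetical hand-recorded results for three DIMMs.
print(diagnose({"A": True, "B": False, "C": True}))
```

Marking each DIMM as it is tested, as suggested, is what makes the results table trustworthy when the failure only shows up on some cold boots.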

