Sudden kernel panic epidemic on 09 pro

Discussion in 'Mac Pro' started by IainH, Jul 1, 2011.

  1. IainH macrumors member

    Joined:
    Mar 11, 2009
    #1
    Recently I've started to get a rather worrying number of kernel panics. I figure one or two a year isn't bad... but one every few weeks (and now every few days) is just plain scary.

    I've included a few of the crash logs below (purged the bottom crap that I believe to be pointless). I'd love it if someone who understands these could take a look and tell me what's probably wrong.

    From my attempted understanding of them, something has gone bad in the RAM. For some reason I thought removing all 6 sticks and reseating them in different slots might help. Not sure why. Another panic just happened with an error log that looked the same. RAM is installed as per the manual's recommendation (if that helps?).

    During these I've been either playing WoW, web browsing with random crap open, or watching TV via an Elgato EyeTV.

    Onto the logs!

    Code:
    Interval Since Last Panic Report:  112110 sec
    Panics Since Last Report:          1
    
    Mon Jun 27 23:17:15 2011
    Machine-check capabilities (cpu 0) 0x0000000000001c09:
     family: 6 model: 26 stepping: 5 microcode: 17
     Intel(R) Xeon(R) CPU           E5520  @ 2.27GHz
     9 error-reporting banks
     threshold-based error status present
     extended corrected memory error handling present
    Machine-check status 0x0000000000000004:
     machine-check in progress
    MCA error-reporting registers:
     IA32_MC0_STATUS(0x401): 0x0000000000000800 invalid
     IA32_MC1_STATUS(0x405): 0x0000000000000800 invalid
     IA32_MC2_STATUS(0x409): 0x0000000000000000 invalid
     IA32_MC3_STATUS(0x40d): 0x0000000000000000 invalid
     IA32_MC4_STATUS(0x411): 0x0000000000000000 invalid
     IA32_MC5_STATUS(0x415): 0x0000000000000000 invalid
     IA32_MC6_STATUS(0x419): 0x0000000000000000 invalid
     IA32_MC7_STATUS(0x41d): 0x0000000000000000 invalid
     Package 0 logged:
     IA32_MC8_STATUS(0x421): 0x0000000000000000 invalid
     Package 1 logged:
     IA32_MC8_STATUS(0x421): 0xbe0000000001009f valid
      Channel number:         15 (unknown)
      Memory Operation:       read
      Machine-specific error: Read ECC
      COR_ERR_CNT:            0
      Status bits:
       Processor context corrupt
       ADDR register valid
       MISC register valid
       Error enabled
       Uncorrected error
     IA32_MC8_ADDR(0x422): 0x000000000a79eac0
     IA32_MC8_MISC(0x423): 0x5494762900081280
      DIMM:     0
      Channel:  2
      Syndrome: 0x54947629
    panic(cpu 13 caller 0xffffff80002d0238): Machine Check thread:0xffffff801cd31588 at 0xffffff7f811dd58a, registers:
    CR0: 0x000000008001003b, CR2: 0x00007fff5fbff298, CR3: 0x0000000000100000, CR4: 0x0000000000000660
    RAX: 0x0000000000000001, RBX: 0xffffff801cc9f000, RCX: 0x0000000000000001, RDX: 0xffffff7f811ef4c0
    RSP: 0xffffff80faf2bd60, RBP: 0xffffff80faf2bda0, RSI: 0xffffff801cc9f000, RDI: 0xffffff801cbbce00
    R8:  0x0000000000000000, R9:  0x0000000000480000, R10: 0x0000000000080000, R11: 0xffffff80002d648b
    R12: 0xffffff7f811efc68, R13: 0xffffff801cbbcac0, R14: 0x0000000000000001, R15: 0xffffff7f811edd18
    RFL: 0x0000000000000082, RIP: 0xffffff7f811dd58a, CS:  0x0000000000000008, SS:  0x0000000000000010
    Error code: 0x0000000000000000
    
    Backtrace (CPU 13), Frame : Return Address
    0xffffff80fb287eb0 : 0xffffff8000204d15 
    0xffffff80fb287fb0 : 0xffffff80002d0238 
    0xffffff80fb2880a0 : 0xffffff80002e48ff 
    0xffffff80faf2bda0 : 0xffffff7f811d5567 
    0xffffff80faf2be80 : 0xffffff7f811d68a7 
    0xffffff80faf2bf20 : 0xffffff800022e409 
    0xffffff80faf2bf40 : 0xffffff80002083a4 
    0xffffff80faf2bf80 : 0xffffff800027f532 
    0xffffff80faf2bfa0 : 0xffffff80002c8527 
          Kernel Extensions in backtrace (with dependencies):
             com.apple.driver.AppleIntelCPUPowerManagement(142.6.0)@0xffffff7f811d4000->0xffffff7f811f7fff
    
    BSD process name corresponding to current thread: kernel_task
    
    Mac OS version:
    10K540
    
    Kernel version:
    Darwin Kernel Version 10.8.0: Tue Jun  7 16:32:41 PDT 2011; root:xnu-1504.15.3~1/RELEASE_X86_64
    System model name: MacPro4,1 (Mac-F221BEC8)
    
    Code:
    Interval Since Last Panic Report:  6505 sec
    Panics Since Last Report:          1
    
    Tue Jun 28 20:03:57 2011
    Machine-check capabilities (cpu 8) 0x0000000000001c09:
     family: 6 model: 26 stepping: 5 microcode: 17
     Intel(R) Xeon(R) CPU           E5520  @ 2.27GHz
     9 error-reporting banks
     threshold-based error status present
     extended corrected memory error handling present
    Machine-check status 0x0000000000000004:
     machine-check in progress
    MCA error-reporting registers:
     IA32_MC0_STATUS(0x401): 0x0000000000000800 invalid
     IA32_MC1_STATUS(0x405): 0x0000000000000800 invalid
     IA32_MC2_STATUS(0x409): 0x0000000000000000 invalid
     IA32_MC3_STATUS(0x40d): 0x0000000000000000 invalid
     IA32_MC4_STATUS(0x411): 0x0000000000000000 invalid
     IA32_MC5_STATUS(0x415): 0x0000000000000000 invalid
     IA32_MC6_STATUS(0x419): 0x0000000000000000 invalid
     IA32_MC7_STATUS(0x41d): 0x0000000000000000 invalid
     Package 1 logged:
     IA32_MC8_STATUS(0x421): 0xbe0000000001009f valid
      Channel number:         15 (unknown)
      Memory Operation:       read
      Machine-specific error: Read ECC
      COR_ERR_CNT:            0
      Status bits:
       Processor context corrupt
       ADDR register valid
       MISC register valid
       Error enabled
       Uncorrected error
     IA32_MC8_ADDR(0x422): 0x000000000a79e7c0
     IA32_MC8_MISC(0x423): 0x72a3453200085840
      DIMM:     0
      Channel:  2
      Syndrome: 0x72a34532
     Package 0 logged:
     IA32_MC8_STATUS(0x421): 0x0000000000000000 invalid
    panic(cpu 8 caller 0xffffff80002d0238): Machine Check thread:0xffffff8021c54260 at 0xffffff80002c5530, registers:
    CR0: 0x0000000080010033, CR2: 0x0000000113da5000, CR3: 0x00000001da64a000, CR4: 0x0000000000000660
    RAX: 0xffffff8021c54260, RBX: 0xffffff80266fff80, RCX: 0x0000000002000000, RDX: 0xffffff80266fff88
    RSP: 0xffffff8102093e10, RBP: 0xffffff8102093e10, RSI: 0x000000000001dc03, RDI: 0xffffff80266fff88
    R8:  0x0000000000010000, R9:  0x0000000000000000, R10: 0x0000000000000000, R11: 0xffffff7f811d5bf7
    R12: 0x0000000000000000, R13: 0xffffff801ddb9f00, R14: 0xffffff8102093e68, R15: 0xffffff8000281be1
    RFL: 0x0000000000000246, RIP: 0xffffff80002c5530, CS:  0x0000000000000008, SS:  0x0000000000000010
    Error code: 0x0000000000000000
    
    Backtrace (CPU 8), Frame : Return Address
    0xffffff80fb0eceb0 : 0xffffff8000204d15 
    0xffffff80fb0ecfb0 : 0xffffff80002d0238 
    0xffffff80fb0ed0a0 : 0xffffff80002e48ff 
    0xffffff8102093e10 : 0xffffff800026fb8d 
    0xffffff8102093e50 : 0xffffff8000279d6f 
    0xffffff8102093e90 : 0xffffff8000281ba3 
    0xffffff8102093ed0 : 0xffffff80002c2daa 
    0xffffff8102093fa0 : 0xffffff80002e4536 
    
    BSD process name corresponding to current thread: EyeTV
    
    Mac OS version:
    10K540
    
    Kernel version:
    Darwin Kernel Version 10.8.0: Tue Jun  7 16:32:41 PDT 2011; root:xnu-1504.15.3~1/RELEASE_X86_64
    System model name: MacPro4,1 (Mac-F221BEC8)
    
    Code:
    Interval Since Last Panic Report:  15513 sec
    Panics Since Last Report:          1
    
    Wed Jun 29 13:06:11 2011
    Machine-check capabilities (cpu 9) 0x0000000000001c09:
     family: 6 model: 26 stepping: 5 microcode: 17
     Intel(R) Xeon(R) CPU           E5520  @ 2.27GHz
     9 error-reporting banks
     threshold-based error status present
     extended corrected memory error handling present
    Machine-check status 0x0000000000000004:
     machine-check in progress
    MCA error-reporting registers:
     IA32_MC0_STATUS(0x401): 0x0000000000000800 invalid
     IA32_MC1_STATUS(0x405): 0x0000000000000800 invalid
     IA32_MC2_STATUS(0x409): 0x0000000000000000 invalid
     IA32_MC3_STATUS(0x40d): 0x0000000000000000 invalid
     IA32_MC4_STATUS(0x411): 0x0000000000000000 invalid
     IA32_MC5_STATUS(0x415): 0x0000000000000000 invalid
     IA32_MC6_STATUS(0x419): 0x0000000000000000 invalid
     IA32_MC7_STATUS(0x41d): 0x0000000000000000 invalid
     Package 1 logged:
     IA32_MC8_STATUS(0x421): 0xbe0000000001009f valid
      Channel number:         15 (unknown)
      Memory Operation:       read
      Machine-specific error: Read ECC
      COR_ERR_CNT:            0
      Status bits:
       Processor context corrupt
       ADDR register valid
       MISC register valid
       Error enabled
       Uncorrected error
     IA32_MC8_ADDR(0x422): 0x000000000a79e7c0
     IA32_MC8_MISC(0x423): 0xe62e671200080280
      DIMM:     0
      Channel:  2
      Syndrome: 0xe62e6712
     Package 0 logged:
     IA32_MC8_STATUS(0x421): 0x0000000000000000 invalid
    panic(cpu 9 caller 0xffffff80002d0238): Machine Check thread:0xffffff801cd32ba8 at 0xffffff7f811dd58a, registers:
    CR0: 0x000000008001003b, CR2: 0x000000010046a000, CR3: 0x0000000000100000, CR4: 0x0000000000000660
    RAX: 0x0000000000000001, RBX: 0xffffff801cc9f000, RCX: 0x0000000000000001, RDX: 0xffffff7f811ef4c0
    RSP: 0xffffff8104c33d60, RBP: 0xffffff8104c33da0, RSI: 0xffffff801cc9f000, RDI: 0xffffff801cbbcd00
    R8:  0x0000000000000000, R9:  0x0000000000480000, R10: 0x0000000000000000, R11: 0xffffff80002d648b
    R12: 0xffffff7f811efc68, R13: 0xffffff801cbbc940, R14: 0x0000000000000001, R15: 0xffffff7f811edd18
    RFL: 0x0000000000000082, RIP: 0xffffff7f811dd58a, CS:  0x0000000000000008, SS:  0x0000000000000010
    Error code: 0x0000000000000000
    
    Backtrace (CPU 9), Frame : Return Address
    0xffffff80fb22beb0 : 0xffffff8000204d15 
    0xffffff80fb22bfb0 : 0xffffff80002d0238 
    0xffffff80fb22c0a0 : 0xffffff80002e48ff 
    0xffffff8104c33da0 : 0xffffff7f811d5567 
    0xffffff8104c33e80 : 0xffffff7f811d68a7 
    0xffffff8104c33f20 : 0xffffff800022e409 
    0xffffff8104c33f40 : 0xffffff80002083a4 
    0xffffff8104c33f80 : 0xffffff800027f532 
    0xffffff8104c33fa0 : 0xffffff80002c8527 
          Kernel Extensions in backtrace (with dependencies):
             com.apple.driver.AppleIntelCPUPowerManagement(142.6.0)@0xffffff7f811d4000->0xffffff7f811f7fff
    
    BSD process name corresponding to current thread: kernel_task
    
    Mac OS version:
    10K540
    
    Kernel version:
    Darwin Kernel Version 10.8.0: Tue Jun  7 16:32:41 PDT 2011; root:xnu-1504.15.3~1/RELEASE_X86_64
    System model name: MacPro4,1 (Mac-F221BEC8)
    
    Code:
    Interval Since Last Panic Report:  72161 sec
    Panics Since Last Report:          2
    
    Fri Jul  1 21:42:06 2011
    Machine-check capabilities (cpu 0) 0x0000000000001c09:
     family: 6 model: 26 stepping: 5 microcode: 17
     Intel(R) Xeon(R) CPU           E5520  @ 2.27GHz
     9 error-reporting banks
     threshold-based error status present
     extended corrected memory error handling present
    Machine-check status 0x0000000000000004:
     machine-check in progress
    MCA error-reporting registers:
     IA32_MC0_STATUS(0x401): 0x0000000000000800 invalid
     IA32_MC1_STATUS(0x405): 0x0000000000000800 invalid
     IA32_MC2_STATUS(0x409): 0x0000000000000000 invalid
     IA32_MC3_STATUS(0x40d): 0x0000000000000000 invalid
     IA32_MC4_STATUS(0x411): 0x0000000000000000 invalid
     IA32_MC5_STATUS(0x415): 0x0000000000000000 invalid
     IA32_MC6_STATUS(0x419): 0x0000000000000000 invalid
     IA32_MC7_STATUS(0x41d): 0x0000000000000000 invalid
     Package 0 logged:
     IA32_MC8_STATUS(0x421): 0xbe0000000001009f valid
      Channel number:         15 (unknown)
      Memory Operation:       read
      Machine-specific error: Read ECC
      COR_ERR_CNT:            0
      Status bits:
       Processor context corrupt
       ADDR register valid
       MISC register valid
       Error enabled
       Uncorrected error
     IA32_MC8_ADDR(0x422): 0x000000000a79eb80
     IA32_MC8_MISC(0x423): 0xff44967100085840
      DIMM:     0
      Channel:  2
      Syndrome: 0xff449671
     Package 1 logged:
     IA32_MC8_STATUS(0x421): 0x0000000000000000 invalid
    panic(cpu 0 caller 0xffffff80002d0238): Machine Check thread:0xffffff8020871588 at 0x000000010004e3ed, registers:
    CR0: 0x0000000080010033, CR2: 0x0000000103f7e868, CR3: 0x0000000114fca000, CR4: 0x0000000000000660
    RAX: 0x0000000101275f58, RBX: 0x0000000100086e20, RCX: 0x000000010028084e, RDX: 0x000000000002d410
    RSP: 0x0000000100280718, RBP: 0x00000001002809d0, RSI: 0x000000000000000c, RDI: 0x0000000000000002
    R8:  0x000000009ef5d718, R9:  0x000000000000fffc, R10: 0x00000000ffffffff, R11: 0x0000000000000002
    R12: 0x000000010007a100, R13: 0x0000000000000000, R14: 0x000000000000bb3d, R15: 0x00000000000005a0
    RFL: 0x0000000000000202, RIP: 0x000000010004e3ed, CS:  0x000000000000002b, SS:  0x0000000000000023
    Error code: 0x0000000000000000
    
    Backtrace (CPU 0), Frame : Return Address
    0xffffff800083cd10 : 0xffffff8000204d15 
    0xffffff800083ce10 : 0xffffff80002d0238 
    0xffffff800083cf00 : 0xffffff80002e48ff 
    
    BSD process name corresponding to current thread: Elgato MPEG-2 De
    
    Mac OS version:
    10K540
    
    Kernel version:
    Darwin Kernel Version 10.8.0: Tue Jun  7 16:32:41 PDT 2011; root:xnu-1504.15.3~1/RELEASE_X86_64
    System model name: MacPro4,1 (Mac-F221BEC8)
    
     
  2. FrankHahn macrumors 6502a

    Joined:
    May 17, 2011
    #2
    I looked through your KP reports. I can definitely say that it was the memory that caused all the KPs.
    You can see clearly that every report says "Machine-specific error: Read ECC", with the "Uncorrected error" status bit set.
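
    For anyone who wants to check the decoded lines against the raw register values, here is a minimal Python sketch. The status flag bits follow the architectural IA32_MCi_STATUS layout from Intel's manuals; the field positions used for the MISC register (DIMM, channel, syndrome) are an assumption for this CPU family, confirmed only by the fact that they reproduce what the panic logs already print. The function names are made up for illustration.

```python
# Architectural IA32_MCi_STATUS flag bits (bit number, label used in the logs).
STATUS_FLAGS = [
    (63, "Valid"),
    (62, "Overflow"),
    (61, "Uncorrected error"),
    (60, "Error enabled"),
    (59, "MISC register valid"),
    (58, "ADDR register valid"),
    (57, "Processor context corrupt"),
]

# Memory-controller error codes have the form 0000 0000 1MMM CCCC.
MEM_OPS = {0: "generic", 1: "read", 2: "write", 3: "address/command", 4: "scrub"}


def decode_status(status):
    """Split a 64-bit machine-check status value into its fields."""
    mca_code = status & 0xFFFF
    return {
        "flags": [name for bit, name in STATUS_FLAGS if (status >> bit) & 1],
        "cor_err_cnt": (status >> 38) & 0x7FFF,       # corrected error count
        "model_specific": (status >> 16) & 0xFFFF,    # logs label value 1 "Read ECC"
        "is_memory_controller": (mca_code & 0xFF80) == 0x0080,
        "mem_op": MEM_OPS.get((mca_code >> 4) & 0x7, "unknown"),
        "channel": mca_code & 0xF,                    # 15 means "not specified"
    }


def decode_misc(misc):
    """Assumed MISC layout: DIMM in bits 17:16, channel in 19:18, syndrome in 63:32."""
    return {
        "dimm": (misc >> 16) & 0x3,
        "channel": (misc >> 18) & 0x3,
        "syndrome": misc >> 32,
    }


if __name__ == "__main__":
    # Values copied from the Jun 27 report.
    print(decode_status(0xBE0000000001009F))
    print(decode_misc(0x5494762900081280))
```

    Fed the Jun 27 values, this gives the same flags, "read" operation, DIMM 0 and Channel 2 that the report itself lists, so nothing new, just a cross-check of how the log text is derived.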
     
  3. Cynicalone macrumors 68040

    Cynicalone

    Joined:
    Jul 9, 2008
    Location:
    Okie land
    #3
    It looks like the RAM.

    You can use the system disks that came with the Mac Pro to test the RAM. If you have AppleCare and call them, they will walk you through testing the RAM.

    Basically, boot the Mac Pro with two sticks installed and test them. Then shut down, install another pair, and keep testing until you find the bad stick(s). Using a process of elimination and the disks that came with the computer, you can troubleshoot it.
     
  4. IainH, Jul 1, 2011
    Last edited: Jul 1, 2011

    IainH thread starter macrumors member

    Joined:
    Mar 11, 2009
    #4
    Is there a way to narrow down which stick it is? I ran the TechTool software on the RAM and it reported no errors, but I suspect the RAM needs to be pushed harder to find out which stick it is.

    Is it possible that it's the actual memory controller that's messing up? The reports all mention "DIMM 0 Channel 2". I wonder if it's possible I just managed to put the same stick into another DIMM slot in channel 2. The front 3 were put in the back, and the back 3 in the front.
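
    To make that comparison concrete, here is a small Python sketch that lines up the fields from the four reports in the first post. It assumes IA32_MC8_ADDR holds a physical address and uses a 4 KiB page size purely as a yardstick; how an address maps to a physical slot depends on the memory interleaving, so this alone cannot separate a bad stick from a bad slot or controller.

```python
# (date, package that logged the valid error, IA32_MC8_ADDR, DIMM, channel),
# copied from the four panic reports in the first post.
PANICS = [
    ("Jun 27", 1, 0x0A79EAC0, 0, 2),
    ("Jun 28", 1, 0x0A79E7C0, 0, 2),
    ("Jun 29", 1, 0x0A79E7C0, 0, 2),
    ("Jul 1", 0, 0x0A79EB80, 0, 2),
]

PAGE_SIZE = 4096  # assumption: used only to judge how close the addresses are

addrs = [addr for _, _, addr, _, _ in PANICS]
span_bytes = max(addrs) - min(addrs)
pages = {addr // PAGE_SIZE for addr in addrs}
packages = {pkg for _, pkg, _, _, _ in PANICS}
slots = {(dimm, chan) for _, _, _, dimm, chan in PANICS}

if __name__ == "__main__":
    print(f"address span: {span_bytes} bytes across {len(pages)} page(s)")
    print(f"packages reporting: {sorted(packages)}")
    print(f"(DIMM, channel) pairs: {sorted(slots)}")
```

    All four addresses sit within 960 bytes of each other and every report names DIMM 0 / Channel 2, while the package that logged the error changes from 1 to 0 in the Jul 1 report.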

    Edit: Seems this was answered in the post above, cheers =D You posted while I was writing my reply lol

    Can I use a standard Snow Leopard disc, or does it have to be the system discs? From memory, booting into diagnostics means holding D on boot with the disc in, doesn't it?
     
  5. Cynicalone macrumors 68040

    Cynicalone

    Joined:
    Jul 9, 2008
    Location:
    Okie land
    #5
    It'll take some time but I can tell you what Apple would tell you to do.

    Find the grey system disks that came with the Mac Pro
    Insert the disk that has the hardware test on it
    Shut down the Mac Pro
    Remove all the RAM sticks
    Insert two sticks into slots 1 & 2
    Reboot holding down "D"
    Run the quick test

    You can keep testing in pairs until you get an error.
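
    The pair-by-pair routine above can be sketched in Python as a bookkeeping aid. Everything here is hypothetical scaffolding: `passes` stands in for a manual run of the hardware test on whichever pair is installed, and the stick labels are whatever you write on your stickers.

```python
from itertools import islice


def pair_schedule(sticks):
    """Yield the sticks two at a time, in the order they should be tested."""
    it = iter(sticks)
    while True:
        pair = tuple(islice(it, 2))
        if not pair:
            return
        yield pair


def first_failing_pair(sticks, passes):
    """Return the first pair for which passes(pair) is False, or None."""
    for pair in pair_schedule(sticks):
        if not passes(pair):
            return pair
    return None


if __name__ == "__main__":
    # Pretend stick "E" is the faulty one (illustrative only).
    faulty = {"E"}
    result = first_failing_pair("ABCDEF", lambda pair: not (set(pair) & faulty))
    print(result)
```

    If every pair passes, the function returns None, which is the cue to move on to a longer test or to testing the slots themselves.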
     
  6. IainH thread starter macrumors member

    Joined:
    Mar 11, 2009
    #6
    What happens if none of them generate an error? DIMM0 suggests to me that the first slot is the bad one, but the stick that WAS in slot 1 on the 28th was moved into slot 7 on the 29th. Though I suppose it's possible I have multiple faulty sticks.

    Just worried nothing is going to show up =\

    I'll start testing in the morning.

    What's the chance it's the daughterboard itself that has the fault? I'd love to think it's an Apple hardware fault (the RAM is 3rd party) and that they don't have the parts to fix it, so I can get a replacement with a 2011... but I know that won't happen.
     
  7. Cynicalone macrumors 68040

    Cynicalone

    Joined:
    Jul 9, 2008
    Location:
    Okie land
    #7
    Well if you test all the pairs and they pass you'll have to move on to testing the slots.

    The quick test takes about 3 minutes to run and it was able to catch the bad stick in my system.

    One of my 8 sticks died and kept causing kernel panics. Through a process of elimination I was able to figure out which stick it was and remove it.
     
  8. DanielCoffey macrumors 65816

    DanielCoffey

    Joined:
    Nov 15, 2010
    Location:
    Edinburgh, UK
    #8
    If you do manage to narrow it down to one pair, you can then test each of that pair with one from a pair that passed.
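
    As a rough Python sketch of that second step: pair each stick from the failing pair with a stick that already passed and see which combination still fails. `passes` is again a hypothetical stand-in for a manual hardware test run, not a real API.

```python
def isolate_bad_stick(failing_pair, known_good, passes):
    """Return the sticks from failing_pair that still fail next to a good stick."""
    return [stick for stick in failing_pair if not passes((stick, known_good))]


if __name__ == "__main__":
    # Illustrative only: pretend "E" is faulty and "A" already passed.
    suspects = isolate_bad_stick(("E", "F"), "A", lambda pair: "E" not in pair)
    print(suspects)
```

    One suspect points at that stick; two suspects suggest both are bad; an empty list suggests the fault may follow the slot rather than the stick.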
     
  9. Cynicalone macrumors 68040

    Cynicalone

    Joined:
    Jul 9, 2008
    Location:
    Okie land
    #9
    That was what I was able to do on my system. And of course it turned out to be the eighth and final stick I tested. :)
     
  10. IainH thread starter macrumors member

    Joined:
    Mar 11, 2009
    #10
    Heh, at least I'm only running 6 instead of 8, so slight advantage there =P
     
  11. IainH thread starter macrumors member

    Joined:
    Mar 11, 2009
    #11
    So out of interest I decided to run the hardware test (the quick test) with all 6 sticks in the machine, but no errors came up.

    Does this mean that when I get to doing it 2 at a time and rotating them through the slots, it's possible no errors will show up?
     
  12. Cynicalone macrumors 68040

    Cynicalone

    Joined:
    Jul 9, 2008
    Location:
    Okie land
    #12
    Try running the extended testing option. It will take a long time, so it's best to do it overnight.
     
  13. IainH thread starter macrumors member

    Joined:
    Mar 11, 2009
    #13
    I've temporarily thrown in the factory 6x1GB sticks just in case it's an issue with the daughterboard and not the RAM itself.

    Out of interest, is there a better memory testing tool? From what I've read, the AHT tool is pretty poor and something like memtest86 would be far better. However, I've so far been unsuccessful in finding an Apple version of it and getting it running.
     
  14. philipma1957 macrumors 603

    philipma1957

    Joined:
    Apr 13, 2010
    Location:
    Howell, New Jersey
  15. Cynicalone macrumors 68040

    Cynicalone

    Joined:
    Jul 9, 2008
    Location:
    Okie land
    #15
  16. IainH thread starter macrumors member

    Joined:
    Mar 11, 2009
    #16
    I've heard of it, except apparently it won't test the RAM already in use by the OS since you run it while fully booted. I was hoping to get more of a memtest-style thing that can test all system RAM, not just what's available.

    I'll check it out anyway, thanks.
     
  17. fr4c macrumors 65816

    fr4c

    Joined:
    Jul 27, 2007
    Location:
    Hamster wheel
    #17
    Getting memtest and running it in single user mode is probably the best thing to do.
     
  18. IainH thread starter macrumors member

    Joined:
    Mar 11, 2009
    #18
    I've so far failed in trying to find a Mac version of memtest and get it working. I read it was meant to just be incorporated into it now, but the boot discs aren't loading it as they should.
     
  19. chrfr macrumors 603

    Joined:
    Jul 11, 2009
    #19
    Here's a long shot: look in System Profiler at the memory configuration. I've had ECC memory actually show as faulty there before. You'd see something other than "OK" under the status header.
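
    If you'd rather check from Terminal, the same information should be available as text via `system_profiler SPMemoryDataType`. Here is a minimal Python sketch that scans such a report for any status other than "OK". The sample text and its "ECC Errors" value are made up for illustration, and the parser assumes a simple "slot header, then key: value lines" layout, so treat it as a starting point rather than a tested tool.

```python
def non_ok_slots(report):
    """Return (slot, status) pairs for every Status line that is not 'OK'."""
    bad = []
    slot = None
    for raw in report.splitlines():
        line = raw.strip()
        if not line:
            continue
        key, sep, value = line.partition(":")
        value = value.strip()
        if sep and not value:
            slot = key  # a header line such as "DIMM 1:"
        elif key == "Status" and value != "OK":
            bad.append((slot, value))
    return bad


# Hypothetical excerpt, not copied from a real machine.
SAMPLE = """
Memory Slots:
  ECC: Enabled
    DIMM 1:
      Size: 2 GB
      Status: OK
    DIMM 2:
      Size: 2 GB
      Status: ECC Errors
"""

if __name__ == "__main__":
    print(non_ok_slots(SAMPLE))
```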
     
  20. IainH thread starter macrumors member

    Joined:
    Mar 11, 2009
    #20
    Checked that. All showed OK.
     
  21. Washac macrumors 68020

    Washac

    Joined:
    Jul 2, 2006
    #21
    Hi

    I don't know if this will be any help at all, but I was getting regular panics on my 09 Mac Pro. It did not matter what I did, they just continued.

    I ran all checks on the sticks, which were three 1GB sticks, with Rember and got no errors.

    I then decided to take a chance and buy two 2GB sticks. I noted from the KP which channel and DIMM the error was pointing to and put a sticker on that stick.

    I then put the two 2GB sticks into DIMM slots 1 & 2 and placed two of the 1GB sticks, including the one the error pointed to, into DIMM slots 3 & 4.

    From that day, no more panics :)

    No idea what the cause was; memory of some sort, I believe.
     
