Trouble initializing RAID 5 on RocketRaid 4322 / Crashing in RAID 0

Discussion in 'Mac Pro' started by brookifyd, Aug 8, 2009.

  1. brookifyd macrumors newbie

    Joined:
    Aug 8, 2009
    #1
    Hi,

    I recently purchased a HighPoint Technologies RocketRaid 4322 (from Provantage) to use in my Mac Pro (2009 Nehalem 8-core, 16GB RAM), loaded into the 16x slot. I want to run an 8-bay SATA RAID (Icy Dock / rack enclosure, old-fashioned “L”-shaped SATA ports) off of it, using HPT cables (mini-SAS to SATA adapter).

    Got it hooked up with an old 4-drive RAID (as Legacy) -- WD drives, seemed to work ok, but a little slow for what I want--

    Took those drives out to load 8 new (OEM) Seagate 7200.12 500GB drives (ST3500418AS) revision CC34

    Here are the issues / actions I’ve taken

    1) Can't initialize RAID 5 - I start a new array, it fails 5 min in-- I can then restart (but only in Background)-- highest I've gotten is 39% initialized. I feel like I shouldn't have to restart this every five minutes. That seems weird.

    2) Can initialize RAID 0, BUT, it's freezing up periodically (like every 15-20 min in FCP) -- I get that ominous spinning color wheel for 1-5 minutes. Sometimes it goes away-- sometimes I get impatient after 5-10 minutes and just shut everything down.

    I don’t have this problem when editing the same footage from my 1TB OWC firewire RAID 0 drive

    3) Tried AJA System test
    I pulled the array apart, loaded the drives as individual volumes, and tested each drive individually. On their own they seem to work fine (read/write speeds between 110 and 140 MB/s on average), with one drive slightly slower.

    BUT then things got weird. After repeated testing, I started to get read speeds of ZERO and absurdly huge write speeds (into the thousands). I tried several of the drives and got the same on all, i.e. it wasn’t just happening on one of the drives.

    Then the program crashed and the computer froze until I turned off the RAID enclosure power. This also happened when I ran an AJA test (or seven) on the built RAID 0 with all 8 drives.

    So then I went back to the 4-drive RAID I had in it before as a legacy RAID. It seemed to work better, but was still more stilted in FCP than my FireWire drive (slow to play clips). I also tried setting up a 4-disk RAID with the Seagates: RAID 5 initialization failed within three minutes. I made a RAID 0, which was more stable than the 8-disk config but still froze up periodically.

    4) Downloaded “Seatools” but it’s DOS and I don’t really know what to do with it.

    5) Sent HPT an email last week – no response.

    6) Tried repairing the disk with DiskWarrior. No significant improvement on rebuild-- or specific errors found.

    7) Tried varying block size (read somewhere that 64 is more stable than 128), as well as the write back vs. write through setting.

    8) turned off NCQ


    From what I’ve read this could mean I have:

    a) a bad RocketRaid 4322 card (how can I be sure?)
    b) a bad drive in the bunch (if so, how can I tell which one?)
    c) a problem of compatibility btw the Seagate drives and the 4322 card (if so then why does everybody on BareFeats seem cool w/ this setup)
    d) strange OEM firmware on the drives (how could I check this?)

    Anyone have any advice?

    Thanks,
    Brooke
     
  2. nanofrog macrumors G4

    Joined:
    May 6, 2008
    #2
    The RR4322 is a SAS card (all of them are), and SAS cards are very picky with SATA drives. Enterprise drives are almost always needed, as they have different recovery timings than their consumer counterparts. Unfortunately, the drives you're using are consumer models, and are dropping out (that's what happens with consumer SATA on SAS). It's just unstable.

    Unfortunately, there's no tool available to adjust the timings, and you're going to have to get different disks to use on that card.

    Always consult the Compatibility List for the card used before buying drives, as it can save this aggravation. ;) It even gets a tad more specific, as there's a firmware revision involved as well. But theoretically, if you have drives on the list that won't work, you can get the correct firmware. Not always the case though.
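
    To make the timing argument concrete, here is a minimal Python sketch. It assumes the controller simply drops any drive that takes longer than some timeout to answer. The 8-second timeout and the 30-second consumer recovery time are illustrative guesses, not values from HighPoint or Seagate documentation; 7 seconds is the enterprise-style recovery limit that comes up later in this thread.

```python
def drive_stays_in_array(recovery_seconds: float,
                         controller_timeout_seconds: float) -> bool:
    """Return True if the drive answers before the controller gives up on it."""
    return recovery_seconds <= controller_timeout_seconds


# Assumed, illustrative numbers (not taken from any datasheet).
CONTROLLER_TIMEOUT = 8.0    # hypothetical controller patience, in seconds
ENTERPRISE_RECOVERY = 7.0   # time-limited error recovery, in seconds
CONSUMER_RECOVERY = 30.0    # assumed long recovery attempt on a desktop drive

for label, recovery in [("enterprise", ENTERPRISE_RECOVERY),
                        ("consumer", CONSUMER_RECOVERY)]:
    verdict = "stays" if drive_stays_in_array(recovery, CONTROLLER_TIMEOUT) else "drops out"
    print(f"{label} drive {verdict}")
```

    The point is only the comparison: a drive that keeps retrying a bad sector longer than the card is willing to wait looks dead to the card, even though the drive itself is fine.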
     
  3. AZREOSpecialist macrumors 68000

    Joined:
    Mar 15, 2009
    #3
    I would have to concur with Nanofrog on this one. I have a 4320 with three internal drives in RAID 5 configuration. Zero problems initializing, zero problems formatting under Disk Utility. However, I am using Western Digital RE3 drives which are the enterprise version of the Caviar Blacks. If you use consumer drives in a RAID array, you will have dropout issues. The enterprise drive controllers are specifically designed to play well in RAID configurations.

     
  4. noushy macrumors regular

    Joined:
    Aug 27, 2008
    Location:
    Detroit, MI
    #4
    Cables, Cables, and cables

    I had that exact card sitting on the floor next to my Mac Pro for a month. It worked great with a SAS chassis, using quality external cables. No hokey HighPoint mini-SAS SFF8088 to eSATA cables, they just are not reliable. Spend the money, buy a quality enclosure (Enhance, or ProAvio, same unit, 8 bay runs around $500-$600). Use 2 x 1m SFF8088-SFF8088 male cables and it will work. I moved the HighPoint card to my server now, connected to the Enhance rackmount SAS 8 bay chassis. 4 x 1.5TB Seagate 7200.11, 4 x 450GB Seagate 15K Cheetahs, with the 1.5TB drives in RAID 5, and the SAS drives in RAID 0. PM me with questions. I have been playing with SAS cards, drives, and chassis for the last 4 months (with help from Nano ;) ), and can give you a lot of insight. BTW running a 2009 Mac Pro 2.93 8 core.

    Peace,
    Noushy
     
  5. nanofrog macrumors G4

    Joined:
    May 6, 2008
    #5
    I've never used the SFF-8088 to eSATA cables, but they wouldn't be the cause in the OP's case. The RR4xxx is a SAS card, and it won't work with consumer drives. WD drives could be made to work, after using the TLER utility to adjust the settings. Unfortunately, I've never located such a tool from Seagate, or other drive vendors.

    Might I ask what happened with the cable(s) you had issues with?
    They always seemed an interim solution to those with eSATA enclosures from a previous setup, who graduated to a full hardware RAID card from a 4 port eSATA card. Saves money on needing to get new enclosures to go with the card. Rather handy I've thought.
     
  6. noushy macrumors regular

    Joined:
    Aug 27, 2008
    Location:
    Detroit, MI
    #6
    Nano and cables

    Nano, I had nothing but bad luck when using those adapters on the Areca card. The only solution that allowed me to bring the internal drives to the external chassis was the custom sff8087 to sff8088 cables. But that is a different topic we can talk about whenever.

    As for the external cables, I had nothing but errors when using those highpoint cables. Multiple times I would boot and see no drives. When I moved the drives to a quality SAS chassis, and used the sff8088 male to male cables, they worked. Strange, but also something to do with the whole crossover cable thing maybe Nano?

    As for WD drives, I am having a bear of a time getting all 8 WD20EADS drives to work properly with the Areca card. First time, they all showed up, I created a RAID 5 with 7 drives and a hot spare, and after initialization completed, the array went down. It also took down my internal WD1001FALS x 4 array at first, but when rebooted, it came back up. Then I tried a RAID 6 using all 8 drives, which worked for about a week, and now two drives have failed. When I rebooted, the drives showed up, but the array needs to be rebuilt. I also changed the TLER to 7 seconds, but obviously it didn't help. I'm thinking it needs a firmware change to work properly.

    I have to agree with Nano, SAS controllers are very very picky with the drives, but also the chassis, the cables, the length of the cables, and any devices that get between the raid card and the drives (ie expanders, adapters).

    Peace,
    Noushy
     
  7. nanofrog macrumors G4

    Joined:
    May 6, 2008
    #7
    NP. :)

    I'm thinking leakage/contact resistance (too many connections). This is assuming the total length was within specs.

    I would have suggested the TLER utility, but you've already tried it. You've also mentioned cable lengths, so I'll presume they're at a total length of 2.0m or less.

    The most recent firmware is v.1.46 for all the internal cards, and has been since Jan 23, 2009. So though it's a possibility, I'll presume you've got the newest.

    What about the detailed settings (and remind me of the exact model)?
    I'm thinking NCQ is enabled currently, and needs to be DISABLED, but I need more info. (It causes stability issues on SAS cards, as SAS's native queuing is TCQ, and SATA drives can't handle that.) If it's not one thing, it's another. :p

    I've not had issues with the chassis, but it might be due to what I tend to select as well (good gear, not brand X). Though these days, it's getting rather hard to tell, with all the rebranding of the same stuff, with maybe a faceplate made differently (not just the brand label).

    Cables can be a real problem as well, and more than just the length (2.0m max for active signals, 1.0m for passive). Usually the length is the issue, but if going from internal to external ports, the SAS port adapters won't work, as there's too much leakage/contact resistance, causing instability. So the SFF-8087 to SFF-8088 cables you're using are the only viable solution with SATA drives.
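
    The length rule above is easy to check on paper, but here is a small Python sketch of it anyway. The 2.0m (active) and 1.0m (passive) limits are the ones quoted in this post; the function name and the idea of summing cable segments end to end are just illustration.

```python
# Maximum total cable run, in metres, as quoted in this thread.
MAX_LENGTH_M = {"active": 2.0, "passive": 1.0}


def cable_run_ok(segment_lengths_m, signal_type):
    """Return True if the summed cable segments fit within the quoted limit."""
    total = sum(segment_lengths_m)
    return total <= MAX_LENGTH_M[signal_type]


# One 1.0 m external cable on a passive link: within the limit.
print(cable_run_ok([1.0], "passive"))
# An internal 1.0 m cable plus a 1.5 m external cable on an active link: too long.
print(cable_run_ok([1.0, 1.5], "active"))
```

    Note this only covers length. It says nothing about the extra contact resistance each adapter or connector adds, which is the other half of the problem.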
     
  8. noushy macrumors regular

    Joined:
    Aug 27, 2008
    Location:
    Detroit, MI
    #8
    Nano

    You are right about the NCQ, I disabled it and rebooted. The array is verifying itself right now (checking). I have 8 WD20EADS drives, TLER set at 7s/7s, all the other raid card system settings are stock except NCQ now disabled. So far it is stable, but before it was stable for 2 days and then crashed bad. It is a raid6 on the Areca 1680ix card, with 4 internal drives on maxupgrades sleds (4x1TB WD1001FALS raid0). I have no other cards in the system except for the video card (nVidia Geforce 285 GTX). Let me know what else might help.

    As for the cables, one internal SFF8087 to SFF8088 from the 4th port (drives 13-16), and one external SFF8088 to SFF8088 (drives 17-20). An Enhance M8S ProAvio Chassis for the 8 2tb drives.

    As to the thread starter, try a different chassis, using minisas connectors, or one that uses the infiniband connectors, with a SFF8088 on the other end. My cables are 1.0m long, and I strongly recommend you keep the cables short.

    Peace,
    Noushy
     
  9. nanofrog macrumors G4

    Joined:
    May 6, 2008
    #9
    You might want to change the TLER to 7,0, as that's the factory settings for the enterprise models. It usually works quite nicely. :)

    You'll have to test the stability between each change, and this does take some time. So be patient. Makes for ample snack breaks. :p

    I've always wondered if the external ports (SFF-8088) are switched off one of the other sets (either 1-4 or 13-16), given the fact that most SAS controllers I'm aware of are either 4 or 8 port devices. If that's the case, you may need to switch the internal cable to another set of ports.

    I've not actually asked on that (Areca's support section). :eek: :confused: Have you by chance?

    Personally, I prefer the MiniSAS to the InfiniBand/MultiLane cables. A little more bandwidth, and the locking mechanism is nice. No screws to deal with. ;)
     
  10. noushy macrumors regular

    Joined:
    Aug 27, 2008
    Location:
    Detroit, MI
    #10
    Thanks Nano

    Unfortunately I didn't pay attention to the TLER_ON function until afterwards that you can set it to 7s/0s. I will have to setup the pc again and reprogram all 8 drives. Out of curiosity, I tried a raid 50 with two sets of 4 drives, and now getting time out errors on drive 17. I do have them in a row. The external connector is drives 17-20. I can move the internal down one to drives 9-12 from 13-16, to see if that helps.
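
    For anyone trying to follow the drive numbers, here is a tiny Python sketch of the mapping as it appears in this thread: four drives per connector, numbered from 1, so the 4th connector is drives 13-16 and the external one is drives 17-20. That layout is inferred from the numbers posted here, not from Areca documentation, and the function names are made up.

```python
DRIVES_PER_CONNECTOR = 4  # assumption based on the ranges quoted in this thread


def connector_for_drive(drive_number):
    """Map a 1-based drive number to its 1-based connector index."""
    return (drive_number - 1) // DRIVES_PER_CONNECTOR + 1


def drives_on_connector(connector):
    """Return the list of drive numbers served by a 1-based connector index."""
    first = (connector - 1) * DRIVES_PER_CONNECTOR + 1
    return list(range(first, first + DRIVES_PER_CONNECTOR))


print(connector_for_drive(17))   # the drive reporting time-outs
print(drives_on_connector(3))    # the set the internal cable moves to
```

    Under that assumption, an error on drive 17 or 19 points at the external connector and its cable, while errors on 9-12 or 13-16 point at whichever internal connector is in use.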

    Noushy
     
  11. nanofrog macrumors G4

    Joined:
    May 6, 2008
    #11
    I was sure the timings were still adjustable. :D

    Try swapping ports on the internal cable connector, and let me know how it goes. Good luck with it. :)
     
  12. noushy macrumors regular

    Joined:
    Aug 27, 2008
    Location:
    Detroit, MI
    #12
    Areca ports

    Nano,
    I moved it down one, and immediately on restart multiple time out errors. I had also adjusted the TLER settings to 7s/0s read/write. I played around a little with the internal connector, making sure the bend was not so sharp and the connector was seated firmly, and that stopped all the errors. Not sure if it has to do with the custom cable or not. However, previously the errors were with the external cable and the external port (17), and not the 13-16 ports. Now the first set is 9-12, and the second set is 17-20. I am letting time machine run, and will check the log later.

    Noushy
     
  13. nanofrog macrumors G4

    Joined:
    May 6, 2008
    #13
    If the cables don't lock into the SFF-8087 ports properly, you will have problems. It's happened to me on many occasions, as they're really stiff when new. I'm so used to it, I guess, that I figured you'd checked this and forgot to mention it at all. Sorry about that. Always the details. :eek: :p
     
  14. noushy macrumors regular

    Joined:
    Aug 27, 2008
    Location:
    Detroit, MI
    #14
    Nano

    Still getting time out errors, but now drive 19, which is the external SFF8088 to SFF8088 cable. Should I swap that cable out too? Could it be that it is getting bent too sharply from the back of the machine (below the desk) to the chassis? I will let time machine run and see how many errors. So far no errors on drives 9-12.

    Peace,
    Noushy
     
  15. nanofrog macrumors G4

    Joined:
    May 6, 2008
    #15
    Try these first, one at a time, checking for errors in between (assuming cable connections are good).
    1. Pull the drive (#19) and reinsert (weak backplane connection in the enclosure)
    2. Swap it with another drive (checks to see if it's the drive, or the connections)
    3. Try rerouting the external cable (and do watch the bends in it). Usually, you don't want a bend any tighter than a 1" radius. Sometimes the reconnection helps, as it may not be seated in the connector as well as you might think. (They should be rather stiff. If not, the cable may not be getting proper contact. In this case, as I assume you've only the one, you can try to reverse the ends (swap card side for enclosure side), as gravity can affect the weak connector.)
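
    The reasoning behind step 2 (swap the drive and see where the errors go) can be written down as a short Python sketch. The function name and the wording of the results are illustrative only; the logic is just "did the errors follow the drive or stay with the slot".

```python
from typing import Optional


def diagnose_after_swap(suspect_slot: int, swapped_with_slot: int,
                        error_slot_after: Optional[int]) -> str:
    """Interpret where errors show up after swapping two drives between slots.

    suspect_slot: slot that reported errors before the swap (e.g. 19).
    swapped_with_slot: slot whose known-good drive traded places with it.
    error_slot_after: slot reporting errors after the swap, or None if clean.
    """
    if error_slot_after is None:
        return "no errors: reseating may have fixed a weak connection"
    if error_slot_after == swapped_with_slot:
        return "errors followed the drive: suspect the drive"
    if error_slot_after == suspect_slot:
        return "errors stayed on the slot: suspect backplane or cable"
    return "inconclusive: errors moved to an unrelated slot"


# Drive from slot 19 swapped with the drive from slot 18; errors still on 19.
print(diagnose_after_swap(19, 18, 19))
```

    Change only one thing between tests, otherwise you can't tell which change moved the errors.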
     
  16. noushy macrumors regular

    Joined:
    Aug 27, 2008
    Location:
    Detroit, MI
    #16
    Time Out Errors

    Nano,
    Ok, after swapping the external cable and making sure that there is no tight bend on either cable, no errors. Time Machine backed up 200GB, with no errors at all. I am wondering if all of my problems were based on cable issues: the first being an internal cable that was not seated and bent too tightly, the second maybe a bad external mini-SAS cable. The external cable had a tight bend due to the machine pushing against the desk; after rerouting, the brand new cable now has a gentle curve. I sure hope the other cable was not bad, just kinked. All I can say is mini-SAS is extremely fussy. Thanks for all of your help and troubleshooting. It is sort of a voodoo art, this SAS/SATA RAID thing.

    Peace,
    Noushy
     
  17. nanofrog macrumors G4

    Joined:
    May 6, 2008
    #17
    Tight bends can cause the contact surfaces to not connect properly. Not uncommon, if you don't know to look for it. ;) Glad you got it sorted. :)

    You've got the physical installation up and running now, but keep an eye on it. The TLER adjustments made to the WD's should keep them stable, but with any new array, watch it (check the logs daily at least).
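
    If the daily log check gets tedious, something like the following Python sketch can count time-out errors per drive. The log line format here is entirely made up for illustration (the thread never shows the card's actual log output), so the pattern would need adjusting to whatever the controller really writes.

```python
import re
from collections import Counter

# Hypothetical log format: "... drive <number> ... time out ..."
TIMEOUT_PATTERN = re.compile(r"drive\s+(\d+).*time\s?out", re.IGNORECASE)


def count_timeouts(log_lines):
    """Count time-out lines per drive number in an iterable of log lines."""
    counts = Counter()
    for line in log_lines:
        match = TIMEOUT_PATTERN.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


# Invented sample lines, only to show the shape of the input.
sample_log = [
    "2009-08-20 10:01 drive 19 time out error",
    "2009-08-20 10:05 drive 17 Time Out Error",
    "2009-08-20 10:07 drive 9 rebuild complete",
    "2009-08-20 10:09 drive 19 timeout error",
]

for drive, count in sorted(count_timeouts(sample_log).items()):
    print(f"drive {drive}: {count} time-out(s)")
```

    A drive number that keeps climbing day over day is the one to reseat or swap first.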
     
  18. MacUserPeggy macrumors newbie

    Joined:
    Mar 17, 2009
    #18
    Hello,

    Here are my suggestions:

    (A) I don't think your RR4322 is broken.

    (B) From Q3, it seems there is a problem when you connect the Seagate drives directly to the motherboard, but with WD there is no such problem. I encountered a similar problem with an RR3522; in my experience, replacing the Seagate drives with WD solved it.

    (C) The Intel IOP white paper suggests not using Seagate HDDs. Since the RR4322 is equipped with the IOP348, there might be an incompatibility problem with Seagate HDDs.

    (D) I suggest you change the firmware.
     
