Recommendation for Thunderbolt RAID?

Discussion in 'Mac Pro' started by yadmonkey, Dec 3, 2015.

  1. yadmonkey macrumors 65816

    yadmonkey

    Joined:
    Aug 13, 2002
    Location:
    Western Spiral
    #1
    I need a reliable RAID that performs well. I have a Promise E-Class RAID in service and it's been super reliable and a strong performer, but that's out of the question now that you can't shove a Fibre Channel card into a Mac Pro anymore and the Xserve is extinct.

    I also have a Promise Pegasus2 RAID and have found it very disappointing. It came with non-enterprise-class hard drives, has a very limited list of compatible hard drives, and while it performs decently in terms of speed, it has had a huge number of issues and Promise support has been abysmal. Basically it doesn't function if I don't shut it down and restart it daily. I don't trust it.

    So what are people using these days? This is for video work on a single Mac Pro. Should I consider a non-Apple server to open up more options or is there a Thunderbolt RAID worth investing in?
     
  2. AppleNewton macrumors 68000

    AppleNewton

    Joined:
    Apr 3, 2007
    Location:
    1 Finite Place
    #2
    Wow, surprised you had such a bad experience with the Pegasus. I've had one unit running for at least 2.5 years, almost 3, hooked up to an iMac. Replaced the drives with enterprise drives and no issues whatsoever; never had to reboot, shut down, or anything with this unit.


    There's nothing else I could recommend. Drobo is a no-go (terrible) and the WD RAID drives aren't any better.

    I know there isn't an answer precisely to your question, but figured I'd chime in that the issues with the Pegasus are surprising.
     
  3. zesta macrumors member

    Joined:
    Jan 31, 2008
    #3
    I love my Areca ARC-8050T2 units. Pretty fast with the right drives, and rock solid.

    Steer clear of Drobo. Horrible designs, terrible reliability, and questionable support.

    You could also get a Thunderbolt to Fibre Channel adapter if you want to use your existing FC array.
     
  4. yadmonkey thread starter macrumors 65816

    yadmonkey

    Joined:
    Aug 13, 2002
    Location:
    Western Spiral
    #4
    I was surprised about the Pegasus as well. Even the best company can put out a lemon here and there, but really it's their support that has me wary of investing more in their products. Had they done the right thing and replaced the faulty RAID, it would be a different story.

    Thanks for your feedback.
     
  5. yadmonkey thread starter macrumors 65816

    yadmonkey

    Joined:
    Aug 13, 2002
    Location:
    Western Spiral
    #5
    Hadn't heard of Areca before but I'll check them out. Also, I had no idea such a thing as a TB->Fiber adapter existed, but that's kind of exciting. Could open a lot of possibilities. Thanks!
     
  6. joema2 macrumors 65816

    joema2

    Joined:
    Sep 3, 2013
    #6
    Sorry to hear this; I have had great results from my Pegasus R4. You might consider the OWC Thunderbay 4: http://eshop.macsales.com/shop/Thunderbolt/External-Drive/OWC/ThunderBay-4

    There's an extensive review of it here: http://macperformanceguide.com/Reviews-OWC-Thunderbay.html

    It does not require HDDs with specific firmware versions.
     
  7. chrfr macrumors 603

    Joined:
    Jul 11, 2009
    #7
    I have a Pegasus2 at work that worked perfectly for a while but now has a problem where it will randomly hang, which can only be resolved by pulling the power to the RAID. When it happens the computer also stops responding. It's very frustrating.
     
  8. sarge macrumors 6502a

    sarge

    Joined:
    Jul 20, 2003
    Location:
    Brooklyn
    #8
    I have used CalDigit for a number of years and am very happy with them - happy enough to have purchased three 3-bay arrays.

    http://www.caldigit.com/T3/

    But they also make a 4-bay model if that's not enough for you (I know the Pegasus does like 6-8 bays or something like that).

    http://www.caldigit.com/T4/index_TBT2.asp

    I have 2 for my home setup and one for the office. I had them in a RAID 1 configuration and had no problem breaking the RAIDs and shuttling the drives between home and the office when needed. Out of concern for the possible problems presented by constantly breaking the RAID I just decided to configure most of the bays as JBODs. I've been really happy with their construction, performance, and customer support.
     
  9. shaunp macrumors 65816

    Joined:
    Nov 5, 2010
    #9
    I was just about to say Pegasus until I read your post properly. I'm surprised.

    I would try G-Technology in that case. I haven't used them personally, but they are made by HGST (Hitachi), who also make high-end enterprise-class storage. Otherwise there's OWC or LaCie, but I wouldn't count either of those as enterprise.

    The other route you could take is to get a TB to FC adaptor from Promise, and try one of the FC arrays. You'd only have a single path to the data, but if you need more than that I'd ditch the Mac altogether and get a PC-based workstation with two FC HBAs and then choose whatever storage you want.
     
  10. mcnallym macrumors 6502a

    Joined:
    Oct 28, 2008
    #10
    http://www.netstor.com.tw/_03/03_02.php?MTIw#

    for 2.5" drives

    http://www.netstor.com.tw/_03/03_02.php?MTA4

    for 3.5" drives

    Uses TB2 for connection to the Mac Pro; you'd need to install an Areca RAID card, which are pretty reliable and have a long history of OS X support. Just use drives of your choice, whether SSD, SAS, or SATA enterprise disks. Has room for 16 disks.

    Also provides additional PCIe slots if you need to add any in. Cards need to be TB-aware, but depending on whether drivers are available for your FC HBA, you could potentially add that in as well. Options to install video editing cards too.
     
  11. VirtualRain, Dec 5, 2015
    Last edited: Dec 6, 2015

    VirtualRain macrumors 603

    VirtualRain

    Joined:
    Aug 1, 2008
    Location:
    Vancouver, BC
    #11

    I use an OWC Thunderbay connected to my Mac Mini. It's been running a RAID0 array of consumer Seagate 5TB drives (20TB total) 24/7 for about 18 months. I only restart the system when required to apply an update.

    I keep important data on the array backed up to other external drives connected via USB 3, and a duplicate of really important data offsite.

    Now, you didn't mention what RAID level you're using, but RAID5 effectively died with 2TB drives... so I think the concept of paying more for special RAID drives is silly. If a drive fails in a 20TB RAID5 array, the chances of it rebuilding are slim, and even if it does, it's going to take so long that the degraded performance during that period will probably make you want to throw it out the window. In a cheap consumer RAID0 array, if a drive fails, you can be back up and running much sooner by restoring from a USB3 backup.
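    To put a rough number on that claim: taking the spec-sheet worst case for consumer drives at face value (1 unrecoverable read error per 10^14 bits; real drives often do better, as later posts argue), a sketch of the odds that a 20TB rebuild hits at least one unreadable sector:

    ```python
    import math

    # Literal reading of a consumer-drive spec: 1 unrecoverable read error
    # (URE) per 1e14 bits read. This is a worst-case figure, not typical.
    URE_PER_BIT = 1e-14
    ARRAY_TB = 20                      # data that must be re-read to rebuild
    bits_read = ARRAY_TB * 1e12 * 8    # 1.6e14 bits

    # Poisson approximation: chance of at least one URE during the rebuild
    p_fail = 1 - math.exp(-bits_read * URE_PER_BIT)
    print(f"P(rebuild hits a URE): {p_fail:.0%}")   # about 80%
    ```

    Whether the spec should be read this literally is exactly what the rest of the thread debates.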
     
  12. chrfr macrumors 603

    Joined:
    Jul 11, 2009
    #12
    The purposes of RAID 0 and 5 arrays are so different that one cannot be replaced by the other if they are being used appropriately. RAID 5 certainly has significant issues but taking down a server to restore an array from a backup is seldom a viable option.
     
  13. AidenShaw macrumors P6

    AidenShaw

    Joined:
    Feb 8, 2003
    Location:
    The Peninsula
    #13
    Which is why I always use RAID-60 for larger arrays.
     
  14. VirtualRain macrumors 603

    VirtualRain

    Joined:
    Aug 1, 2008
    Location:
    Vancouver, BC
    #14
    True, but it could be argued that a large RAID5 rebuild is likely going to result in server downtime or crippled performance for an extended period of time. If service continuity really is the top priority, there are much better RAID solutions than RAID5. RAID5 really is dead.
     
  15. joema2 macrumors 65816

    joema2

    Joined:
    Sep 3, 2013
    #15
    Why do you say this?
     
  16. AidenShaw macrumors P6

    AidenShaw

    Joined:
    Feb 8, 2003
    Location:
    The Peninsula
    #16
    Certainly true for software RAID.

    Higher end hardware RAID controllers (like mine with 8GiB of write-back cache RAM) not only do a much better job, but usually have a knob or slider to choose from "rebuild ASAP and screw the apps" to "preserve app performance, and rebuild during idle times".
     
  17. SDAVE macrumors 68040

    SDAVE

    Joined:
    Jun 16, 2007
    Location:
    Nowhere
    #17
    If you can do Gigabit (about 110MB/sec max), then you can do a shared SMB network on something like a Dell server.
     
  18. AidenShaw macrumors P6

    AidenShaw

    Joined:
    Feb 8, 2003
    Location:
    The Peninsula
    #18
    The specs for a typical 4TB NAS drive are:

    Non-recoverable read errors per bits read: max 1 per 10^15.

    4 TB is 3.2 × 10^13 bits.
    Build a 20TB (usable) RAID-5 array out of six of them.
    To rebuild, you have to read the five survivors: 1.6 × 10^14 bits.
    That's "scary close" to the expected, normal error rate.
     
  19. joema2 macrumors 65816

    joema2

    Joined:
    Sep 3, 2013
    #19
    I believe that bit error rate is a *minimum* spec, not a typical number. E.g., many HDD manufacturers list a spec of 1 failure per 10^14 bits read. 10^14 bits = 12.5 terabytes, so by that spec you might expect a failure on average every 12.5TB read. We know from actual experience this does not happen.

    This was investigated some years ago and the conclusions are still valid: the HDD bit error rate spec is not a typical failure rate, it is a "worst case" number. HDDs typically do much better than that.
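    For concreteness, the unit conversion behind that 12.5TB figure:

    ```python
    # One URE per 10^14 bits, read literally, would mean an error roughly
    # every 12.5 TB read -- which real-world experience contradicts.
    SPEC_BITS = 1e14
    tb_per_error = SPEC_BITS / 8 / 1e12   # bits -> bytes -> terabytes
    print(f"one error per {tb_per_error} TB read")  # 12.5
    ```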

    Empirical Measurements of Disk Failure Rates and Error Rates (2005, Jim Gray, et al):
    http://research.microsoft.com/apps/pubs/default.aspx?id=64599

    Unfortunately writer Robin Harris at ZDNet wrote several "gloom and doom" articles about how unreliable RAID 5 is.

    "Why RAID 5 Stops Working in 2009": http://www.zdnet.com/article/why-raid-5-stops-working-in-2009/
    "Has RAID 5 Stopped Working?" (2013 article): http://www.zdnet.com/article/has-raid5-stopped-working/

    He apparently did not understand what bit error rate means nor did he examine the prior research done in this area that repudiates his claim.

    Other technically astute writers have noted the fallacious reasoning:
    "Why RAID-5 Stops Working in 2009 – Not Necessarily":
    https://www.high-rely.com/blog/why-raid-5-stops-working-in-2009-not/

    "Using RAID-5 Means the Sky is Falling":
    https://www.high-rely.com/blog/using-raid5-means-the-sky-is-falling/

    I have forced a failure on my 8TB Pegasus R4 RAID5 array and it took a few hours to rebuild, during which I/O performance was slower. Actual *system* performance may not be noticeably degraded, since it usually doesn't require peak I/O performance.

    By contrast if one HDD of a four-drive 8TB RAID 0 array fails, your system performance is *zero* for roughly 10 hours while you reload from backups.
     
  20. AidenShaw, Dec 6, 2015
    Last edited: Dec 6, 2015

    AidenShaw macrumors P6

    AidenShaw

    Joined:
    Feb 8, 2003
    Location:
    The Peninsula
    #20
    In other words, "typically you'd be able to rebuild your array, but even with drives operating within specs you could lose all of your data".

    And yes, I've had 16TB (five 4TB in RAID-5) collapse during rebuilds because of an ECC error reading sectors from one of the "survivors" during the rebuild. It's not an "insignificant statistical risk" - it's real and it happens.

    I use RAID-60 because it gives me additional protection in two dimensions over RAID-5, as well as much better typical and worst case performance.

    I like the "belt and suspenders" approach to data protection.

    Another thing that people citing "typical" failure rates might not realize is that drive failures are not independent. One of the best predictors of drive failure is whether drives with nearby serial numbers (where "nearby" is measured in weeks of production) have failed.

    The reason should be obvious - there's a bad batch of bearings, or impurities in the rust coating, or contaminants from manufacturing, or.... These things often sort themselves out before any failures are noticed - but not before thousands of suspect drives go into the supply chain.

    And what happens when you buy a new 25-drive array (or a 4-drive array) - you get drives with serial numbers in a small range.

    If one drive in your RAID fails, the odds shoot up that other drives will fail too: if a manufacturing fault contributed to the first failure, drives built around the same time may suffer from the same flaw.

    After one failure - the word "typical" no longer applies if serial numbers are close.

    Some enterprise storage systems (I'm talking about arrays with hundreds or thousands of drives costing from large fractions of a million dollars to many millions) assign a "trustworthiness" property to drives. If they notice drive failures in a small range of serial numbers, they'll reduce the trustworthiness of drives with similar serial numbers, and move sensitive data off non-trustworthy drives. (Use trustworthy drives in RAID-0 or RAID-5, demote non-trustworthy to RAID-6 or hot spares or schedule for replacement before failure.)
     
  21. joema2 macrumors 65816

    joema2

    Joined:
    Sep 3, 2013
    #21
    The statement VirtualRain made which I questioned was "If a drive fails in a 20TB RAID5 array, the chances of it rebuilding are slim"

    Not that a risk exists, but that the risk is so great the chance of a successful rebuild is *slim*. What is the basis for that?

    If the chance were slim, that would mean you could replace an HDD in a group of 20TB RAID-5 arrays, and many of them would fail to rebuild. You would see failure after failure. It would mean RAID-5 basically doesn't work at that size.

    Yes, in a prior IT job we had 100% failures on a large batch of drives. Every single one failed, and we had to fly out to the manufacturer and investigate.

    However, with my 8TB Pegasus R4, I have synced it *many* times without a single error. This of course reads every bit from every drive in the array. See the data in this graph I produced from some of these tests:

    https://joema.smugmug.com/Computers/Pegasus-R4-RAID5-Sync-Rate/n-p2Nn4p/i-pdz6p2j/A

    It's important to understand the ZDNet articles by Robin Harris are based on an incorrect understanding of bit error rates. RAID-5 obviously did not stop working in 2009 as he predicted, and unfortunately by his second article in 2013, he still had not grasped his mistake. I know of no basis to confidently state that RAID-5 "has stopped working" or that the chances of rebuilding a 20TB array "are slim".
     
  22. VirtualRain macrumors 603

    VirtualRain

    Joined:
    Aug 1, 2008
    Location:
    Vancouver, BC
    #22
    I'm just going on research I've come across. If it's all wrong then I'm wrong and you can ignore what I said. However, even if I'm completely off the mark, I'd still suggest people think carefully before using RAID5.

    People with any storage project should do their own research, understand the risks and other factors affecting data loss and productivity, and adopt a solution that meets or exceeds their acceptable risk level, system availability needs, performance requirements, and budget.
     
  23. AidenShaw, Dec 7, 2015
    Last edited: Dec 7, 2015

    AidenShaw macrumors P6

    AidenShaw

    Joined:
    Feb 8, 2003
    Location:
    The Peninsula
    #23
    Think is the important word here. We're not "over the edge" yet with bit-error-rate issues and RAID-5 - but it is something that can't be ignored and can easily be an issue with poor storage design.

    Don't make a 12 drive RAID-5 volume - you'll regret it. Don't make a 6 drive RAID-5 volume, too high of a chance that you'll regret it. I configure 12 drives as a two-way RAID-60 volume, and add a hot spare or two. Most of my storage shelves have 25 drives, so it's four-way RAID-60 with a hot spare. For high capacity "near-line" storage - the standard building block is a 72TB shelf (twelve 6TB drives) in two-way RAID-60 for 48TB usable (with global hot spares).

    For those of us for whom the line "you'll probably be able to rebuild your array" is just not good enough - "RAID-5 is dead".

    ps: I've just configured a Spark cluster, and the local scratch space on each node is RAID-5 with hot spares - RAID-5 is good enough for temp scratch space.
     
  24. FireWire2 macrumors 6502

    FireWire2

    Joined:
    Oct 12, 2008
    #24
    One factor that most would miss: ERC - Error Recovery Control - which is built into each and every HDD. I have built hundreds of RAID5 volumes, some up to 24TB / 32TB. It's just a pain when rebuilding. None have rolled over and died on me yet.
    But yes, agree with joema2: if you do not know how to maintain your RAID... even RAID6 will fail on you.

    A lot of users are misinformed about RAID5/6 volumes; they NEVER bother to perform a volume check, and that's where things start.

    So whether RAID5 or 6, check the volume :) it would be a lot less headache.

    A 12-drive RAID60 is overkill and not efficient, and with a SAS system there is no way you can expand the existing RAID.
     
  25. AidenShaw, Dec 8, 2015
    Last edited: Dec 8, 2015

    AidenShaw macrumors P6

    AidenShaw

    Joined:
    Feb 8, 2003
    Location:
    The Peninsula
    #25
    Your opinion.

    In my case, the cost of downtime or loss of data is very great. Buying 72TB raw to get a usable, reliable 48TB is not inefficient - it's smart. It might be overkill if your array has 128 GB SSDs, but with 6 TB spinners the bit error rate issues are not theoretical.

    A twelve drive RAID-5 set would be ludicrous under any circumstances - although you'd get 66TB usable.

    Twelve drive RAID-50 would give 60TB, but a second error during rebuild leaves you with 0TB.

    Twelve drive RAID-60 gives you 48TB, and the ability to withstand the failure of two drives (perhaps even four drives failing).
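    The capacity figures above follow directly from the parity overhead of each level. A quick sketch, assuming twelve 6 TB drives and two parity groups for the nested levels (function name and layout simplified; no hot spares counted):

    ```python
    def usable_tb(drives, size_tb, level, groups=1):
        """Usable capacity for common RAID levels (simplified: no spares)."""
        per_group = drives // groups
        parity = {"raid5": 1, "raid6": 2}[level]   # parity drives per group
        return groups * (per_group - parity) * size_tb

    print(usable_tb(12, 6, "raid5"))             # 66 -- single RAID-5
    print(usable_tb(12, 6, "raid5", groups=2))   # 60 -- RAID-50, two 6-drive groups
    print(usable_tb(12, 6, "raid6", groups=2))   # 48 -- RAID-60, two 6-drive groups
    ```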


    That's funny, because yesterday I added two drives to a RAID-60 set - it not only expanded the array, but it did it online while the system was in use. You should look into better SAS controllers.

    This afternoon (often multi-TB online reconfiguration operations take a while) my logged in users noticed that there was a lot more free space on the work drive. Completely online - no reboots, no dismounts, no logouts.

    One minute, the drive shows 100GB free. A few minutes later, the same command shows 2.5TB free. It just works. ;)


    We're definitely on the same page here. A huge percentage of "RAID failure" events are "the humans didn't notice that one drive had failed, and lost the array when the second drive failed".

    My controllers perform a background full volume scan (read and verify every data and parity chunk) every 72 to 168 hours. Not only that, but they also continuously monitor S.M.A.R.T. data and if a drive enters "failure predicted" mode the controller will:
    • copy the "soon to fail" drive to a hot spare
      • Note that this is not a "rebuild" -- it's a simple sequential copy of the suspect drive to the hot spare. If there's an error on a chunk, then that one chunk will be rebuilt (if necessary) from the other drives, and the sequential copy proceeds.
    • put the hot spare into the array as a full member, removing the suspect drive
    • put the suspect drive on the "unusable" list
    • send an email requesting service, with serial numbers, failure codes, slot numbers, etc
    I forward the email to HP, and a new drive arrives in a day or two. I have the tech remove the drive with the yellow light and insert the new drive. Back to fully redundant and spared service (actually, at no point in the process was redundancy lost).

    Depending on the array, either the event is over - or the controller will copy the data from the "hot spare" to the newly inserted drive.
    • If the array is homogeneous, then it's over.
    • A global hot spare can be used for any failing drive - so a 1.2 TB spinner might be used to replace a failing 200GB SSD. You'd want that array to revert to the new 200GB SSD and put the 1.2 TB spinner back as hot spare.
    • Some setups have different bandwidth subscription models, in which case you might want to restore the original topology. (I have a number of controllers where the first 24 drives have direct SAS channels (6/12 Gbps per disk, 144/288 Gbps aggregate), and the next 175 drives share a 4 lane daisy chain with 24/48Gbps. If you've put an array in the first group for performance, you probably don't want it using a disk on the daisy chain.)
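    The predictive-failure flow described above (spare copy, member swap, retirement) can be sketched as a toy in-memory model. All names here are illustrative, not any real controller's API; actual firmware is far more involved:

    ```python
    # Toy model of the SMART-triggered spare-copy flow described above.
    class Drive:
        def __init__(self, serial, data, bad=()):
            self.serial, self.data, self.bad = serial, list(data), set(bad)
            self.unusable = False
        def read(self, i):
            if i in self.bad:
                raise IOError(f"URE at chunk {i}")
            return self.data[i]

    def handle_predicted_failure(members, suspect, spare, rebuild_chunk):
        # 1. Sequential copy of the suspect drive onto the hot spare;
        #    an unreadable chunk is rebuilt from the surviving members.
        for i in range(len(suspect.data)):
            try:
                spare.data[i] = suspect.read(i)
            except IOError:
                spare.data[i] = rebuild_chunk(i)
        # 2. Swap the spare in as a full member (redundancy never lapses).
        members[members.index(suspect)] = spare
        # 3. Retire the suspect drive; a service request would go out here.
        suspect.unusable = True

    # Demo: two-drive mirror; one chunk on the suspect is unreadable.
    good = Drive("A1", [1, 2, 3])
    suspect = Drive("A2", [1, 2, 3], bad={1})
    spare = Drive("A3", [0, 0, 0])
    members = [good, suspect]
    handle_predicted_failure(members, suspect, spare, rebuild_chunk=good.read)
    print(spare.data, suspect.unusable)   # [1, 2, 3] True
    ```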
     
