A 12x drive raid60 is overkill, not efficient...
Your opinion.
In my case, the cost of downtime or data loss is very high. Buying 72TB raw to get a usable, reliable 48TB is not inefficient - it's smart. It might be overkill if your array has 128 GB SSDs, but with 6 TB spinners the bit error rate issues are not theoretical.
A twelve drive RAID-5 set would be ludicrous under any circumstances - although you'd get 66TB usable.
Twelve drive RAID-50 would give 60TB, but a second failure in the same group during rebuild leaves you with 0TB.
Twelve drive RAID-60 gives you 48TB, and the ability to withstand any two drive failures (up to four, if they fall two per parity group).
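The capacity arithmetic above is easy to check with a short sketch (`usable_tb` is a hypothetical helper; `groups` is the number of striped parity sub-arrays, so RAID-50/60 on twelve drives is two groups of six):

```python
def usable_tb(drives, drive_tb, level, groups=1):
    """Usable capacity for common parity RAID levels.

    level: 5 or 6 (one or two parity drives per group).
    groups: number of striped sub-arrays (RAID-50/60 stripe
    across multiple groups; groups=1 is plain RAID-5/6).
    """
    per_group = drives // groups
    parity = 1 if level == 5 else 2
    return (per_group - parity) * drive_tb * groups

# 12 x 6 TB drives:
print(usable_tb(12, 6, 5))            # RAID-5  -> 66
print(usable_tb(12, 6, 5, groups=2))  # RAID-50 -> 60
print(usable_tb(12, 6, 6, groups=2))  # RAID-60 -> 48
```

So the 48TB figure is simply 72TB raw minus four parity drives' worth of capacity.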
...and with a SAS system, there is no way you can expand the existing RAID.
That's funny, because yesterday I added two drives to a RAID-60 set - it not only expanded the array, but it did it online while the system was in use. You should look into better SAS controllers.
This afternoon (multi-TB online reconfiguration operations often take a while) my logged-in users noticed that there was a lot more free space on the work drive. Completely online - no reboots, no dismounts, no logouts.
One minute, the drive shows 100GB free. A few minutes later, the same command shows 2.5TB free. It just works.
So whether RAID5 or 6, check the volume - it would be a lot less headache if you do.
We're definitely on the same page here.
A huge percentage of "RAID failure" events are "the humans didn't notice that one drive had failed, and lost the array when the second drive failed".
My controllers perform a background full volume scan (read and verify every data and parity chunk) every 72 to 168 hours. Not only that, but they also continuously monitor S.M.A.R.T. data and if a drive enters "failure predicted" mode the controller will:
- copy the "soon to fail" drive to a hot spare
- Note that this is not a "rebuild" - it's a simple sequential copy of the suspect drive to the hot spare. If a chunk can't be read, that one chunk is rebuilt from the other drives, and the sequential copy proceeds.
- put the hot spare into the array as a full member, removing the suspect drive
- put the suspect drive on the "unusable" list
- send an email requesting service, with serial numbers, failure codes, slot numbers, etc.
I forward the email to HP, and a new drive arrives in a day or two. I have the tech remove the drive with the yellow light and insert the new drive. Back to fully redundant and spared service (actually, at no point in the process was redundancy lost).
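The "copy, don't rebuild" step can be sketched as follows (purely illustrative: the suspect drive is modeled as a list of chunks with `None` marking a read error, and `rebuild_chunk` is a hypothetical stand-in for parity reconstruction from the other members):

```python
def copy_with_repair(suspect, rebuild_chunk):
    """Sequential copy of a suspect drive to a hot spare.

    Any chunk that fails to read (None) is rebuilt from the
    remaining array members via rebuild_chunk(i); every other
    chunk is copied straight across, so redundancy is never lost.
    """
    spare = []
    for i, chunk in enumerate(suspect):
        spare.append(chunk if chunk is not None else rebuild_chunk(i))
    return spare

# A drive with one unreadable chunk (index 2):
print(copy_with_repair(["a", "b", None, "d"], lambda i: f"rebuilt{i}"))
# -> ['a', 'b', 'rebuilt2', 'd']
```

The point of the sequential copy is speed: only the chunks that actually fail pay the cost of a parity rebuild.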
Depending on the array, either the event is over - or the controller will copy the data from the "hot spare" to the newly inserted drive.
- If the array is homogeneous, then it's over.
- A global hot spare can be used for any failing drive - so a 1.2 TB spinner might be used to replace a failing 200GB SSD. You'd want that array to revert to the new 200GB SSD and put the 1.2 TB spinner back as hot spare.
- Some setups have different bandwidth subscription models, in which case you might want to restore the original topology. (I have a number of controllers where the first 24 drives have direct SAS channels (6/12 Gbps per disk, 144/288 Gbps aggregate), and the next 175 drives share a 4-lane daisy chain with 24/48 Gbps. If you've put an array in the first group for performance, you probably don't want it using a disk on the daisy chain.)
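The copy-back decision in the three cases above can be summarized like this (all names are hypothetical; drives are modeled as `(media_type, capacity_tb, fast_zone)` tuples, where `fast_zone` means a direct SAS channel rather than the daisy chain):

```python
def copy_back_needed(spare, failed, fast_zone_array=False):
    """Decide whether data should migrate from the global hot spare
    to the freshly inserted replacement drive (illustrative model).

    spare / failed: (media_type, capacity_tb, fast_zone) tuples.
    """
    if spare == failed:
        return False          # homogeneous array: the spare simply stays in
    if spare[:2] != failed[:2]:
        return True           # wrong media type or size: restore topology
    if fast_zone_array and not spare[2]:
        return True           # array needs direct channels, spare is on the chain
    return False

# A 1.2 TB spinner standing in for a failed 200 GB SSD: copy back.
print(copy_back_needed(("hdd", 1.2, True), ("ssd", 0.2, True)))  # -> True
```

Either way the event requires no urgency from the humans - the array is fully redundant throughout.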