Opinions wanted re backup options for large RAID array

Discussion in 'Mac Accessories' started by Lastmboy, Aug 30, 2014.

  1. Lastmboy macrumors regular

    Joined:
    Jan 16, 2012
    #1
    There may be no "right" answer to this, but I am interested in any opinions.

    I have an 8-bay hardware RAID box with eight 3TB NAS drives in it. I also have a second 8-bay box that isn't being used at the moment. I'm trying to decide the best way to set up a large storage array that is both fast and secure. Currently it is set up as an 18TB RAID 50 array. However, the "catalog" has become corrupted and OSX will now only read from it (can't write or delete). I have tried Disk Utility as well as several of the other top utilities, and none are able to rebuild it. Thus, I guess I have to copy everything off of it, format it, and copy everything back on.

    Since I have to go through all this hassle anyway, I'm pondering other options. Even RAID 1 would not have helped me in this case, as there is nothing wrong with the drives themselves; just corrupted data. This is making me wonder if I should just set it up as RAID 0, then use Time Machine or other software to mirror it to the other 8-bay box. If anything goes wrong with the first one, including data corruption, I've got it all on the other one. I could go with RAID 5 on both boxes, to make it easier to deal with single drive failures. Would the backup software just end up creating the same corruption on the 2nd array?

    What are your thoughts/ideas/suggestions about this? Thanks.
     
  2. flynz4, Aug 30, 2014
    Last edited: Aug 30, 2014

    flynz4 macrumors 68040

    Joined:
    Aug 9, 2009
    Location:
    Portland, OR
    #2
    How critical is your data? Is this for consumer use, or is it critical to run a business? Is the data easily segregated into different pools, that can be managed independently?

    I have used RAID a lot... and I have come to the conclusion that it is a lousy technology for most consumer applications. This is especially true as single spindle drives have become so large. Enterprise solutions are a different story... but properly maintaining a large enterprise-class RAID array is very expensive (tens or hundreds of thousands of dollars, or more). Not expensive if it is critical to run an enterprise... but generally way out of reach of even the most ambitious consumers.

    Assuming that you are using this for consumer purposes... and expensive solutions are out of reach:

    First thing to consider is to erase any preconceived notions that RAID increases reliability. I believe just the opposite is true... RAID systems introduce new failure mechanisms that generally do not exist in single spindle drives. I believe them to be less reliable. Your current situation is a perfect example... except that you can still read your data... so you are lucky -- many others are not.

    For consumers, there are generally only two real arguments to use RAID:

    1) To get increased performance (as measured in IOPS).
    2) To get a single volume that is larger than is available on single spindle drives.

    Item #1 above is pretty much dead now. Any IOPS-intensive application or usage is almost certainly better off using an SSD, and an SSD is generally a much less expensive solution... especially since the physical capacity requirement of high IOPS is generally small (ex: <1TB) and affordable in an SSD.

    Item #2 is much more real. If you really need a single volume that is larger than is available in single spindle drives... then RAID of some flavor is about the only mechanism to get what you need. With 6TB HDDs available now... very few applications need single volumes with more capacity.

    This is why I asked if you can split your work across multiple volumes. By doing so you can buy 2 or 3 physical drives per usage pool... and use traditional backup and cloning methods to guarantee that your data is replicated. You can also easily back up the most critical data via cloud services, so that you have disaster recovery.

    Personally... I have had a lot of storage devices for my data over the years. Many have been NAS units, and they are all decommissioned now. About the only really large usage was for digital storage of my video library. I have come to the conclusion that even keeping it around is an old fashioned and clunky way of thinking. I might just delete it all. Virtually everything is available real time on the web... so why do I even want a library? It would free me so that I could finally be 100% SSD... and I could finally turn off (for good) my last remaining array... a Thunderbolt 8TB Pegasus R4.

    /Jim

    P.S. I will never buy another computer with a HDD. I will continue to use my Pegasus R4 as an overflow data drive... but it is only a matter of time before it ends up in the trash and I will be 100% SSD

    P.P.S. My Pegasus R4 is no longer in my office... it is in a secured area and connected via a 30m optical Thunderbolt cable. All of my backup Time Capsules are also secured away. It is bliss to have essentially no background noise in my office anymore.
     
  3. westrock2000 macrumors 6502

    Joined:
    Oct 18, 2013
    #3
    Since your NAS is going to be limited by the Gigabit network connection, you should avoid RAID0 and just go with "spanning" / "concatenated". You will only access a single drive at any given time, but it should be able to saturate the Gigabit by itself. It will also put less wear on the drives.
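
    To put numbers on that (a rough Python sketch; the 10% protocol overhead and the drive speed are assumptions, not measurements):

```python
# Back-of-envelope: why a single modern HDD can saturate gigabit Ethernet.
GIGABIT_BPS = 1_000_000_000   # raw line rate, bits per second
PROTOCOL_OVERHEAD = 0.10      # rough allowance for TCP/IP + SMB/AFP framing

# Usable payload ceiling in MB/s (1 MB = 10**6 bytes here)
ceiling_mb_s = GIGABIT_BPS / 8 / 1_000_000 * (1 - PROTOCOL_OVERHEAD)

hdd_sequential_mb_s = 150     # assumed sequential rate for a 3TB NAS drive

print(f"usable gigabit ceiling ~ {ceiling_mb_s:.0f} MB/s")
print(f"single HDD sequential  ~ {hdd_sequential_mb_s} MB/s")
print("HDD alone saturates the link:", hdd_sequential_mb_s >= ceiling_mb_s)
```

    So even one drive has more sequential throughput than the wire can carry; striping buys you nothing over the network.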

    RAID5 is still useful, because you can still have uptime if one of the drives fails. If you are using RAID0 and one drive fails, even if you have a backup it will still take a long time to transfer everything back from the backup.

    I've had a RAID5 go into a degraded state due to a bad drive, and it was still usable while the array rebuilt itself after replacing the bad drive.

    The safest overall method is individual drives and volumes with a 100% backup. That way if a drive fails, it does not keep you from accessing the other drives and you only have to copy back over that one failed drive. But the downside is managing separate volumes.
     
  4. Lastmboy thread starter macrumors regular

    Joined:
    Jan 16, 2012
    #4
    Thanks for the replies, guys. I should provide a bit more information. The 8-bay box is connected to my 2009 Mac Pro, which serves as my file server. In my office, I have a Promise Pegasus R6 connected to my iMac. My OS and apps are on SSD, and all data is on the Pegasus. I use Time Machine to back up the Pegasus to a Drobo. Both the Pegasus and Drobo are Thunderbolt and RAID 5. My office scenario has worked spectacularly. The Pegasus is more than double the speed of my SSD. I've had one drive go bad in it, but Promise sent me a new one, tray and all. I didn't even have to turn it off. Just popped out the old tray and stuck in the new one. It promptly rebuilt itself. I also back up my most critical data to the cloud.

    The RAID 5 backed up to RAID 5 scenario seems to work pretty well. However, I was curious as to what others are doing. This problem with a "catalog" going corrupt seems to be related to OSX and is very annoying, as it has happened quite a few times. I have been able to rebuild it on a few occasions.

    Of course, the SSD-only option is nice, but I have too much data for that to be feasible. The reason I use RAID is based on both reasons mentioned above. I like both the speed and the single volume. However, the speed is my primary reason. My R6 will read and write around 350 MB/s. I hear the TB2 Pegasus gets up to 800 MB/s. A single 7200 rpm drive is around 100 MB/s. The SSD in my iMac is around 150 MB/s. I have a recording studio and I do a lot of audio editing, which accounts for the required space and speed. I also have some massive sound libraries, and am a software developer, so I have a large library of graphics and code.

    That said, you bring up a great point. The RAID on my file server is likely being throttled by my gigabit network. Not sure what top network speed is... 60-70 MB/s?? With that in mind, maybe I'm better off just keeping the volumes separate. Definitely something to ponder. There are other annoyances with the many-volume approach, though. One issue is the sheer number of volumes that have to be mounted every time you boot up. Not a big deal, but a bit of a pain. It also makes backups a bit of a pain, and guarantees more of a hassle if one of the drives bites the dust.

    Just as a point of interest... I picked up a 2013 Mac Pro a few weeks ago. The SSD in that little round guy is incredible. I can get consistent read and write speeds of over 950 MB/s. However, I don't think Thunderbolt is dead quite yet, as most people are going to want to connect some external storage to the nMP.
     
  5. MRrainer macrumors 6502a

    Joined:
    Aug 8, 2008
    Location:
    Zurich, Switzerland
    #5
    RAID5 is not recommended

    Especially with 3TB drives.
    It's OK-ish if you have a couple of 144GB SAS drives sitting around.
    But the time it takes to rebuild a RAID5 with >1TB drives is just too long.
    Nowadays, RAID6 is recommended.
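
    The arithmetic behind that (a sketch; the 100 MB/s sustained rate is an optimistic assumption -- real rebuilds are slower while the array is in use):

```python
# Estimate how long a RAID5 rebuild takes if the replacement drive can be
# streamed at a sustained sequential rate (optimistic: ignores load and seeks).
def rebuild_hours(drive_tb, sustained_mb_s):
    bytes_total = drive_tb * 1_000_000_000_000
    seconds = bytes_total / (sustained_mb_s * 1_000_000)
    return seconds / 3600

print(f"144 GB SAS drive @ 100 MB/s: {rebuild_hours(0.144, 100):.1f} h")
print(f"3 TB NAS drive   @ 100 MB/s: {rebuild_hours(3, 100):.1f} h")
```

    A small SAS drive rebuilds in minutes; a 3TB drive leaves you a whole working day with no redundancy, which is why a second parity drive (RAID6) matters.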

    It seems you are suffering from "bit rot" - ZFS is the only 100% solution for that.

    If the data is important, I'd always trust ZFS.
    You can build your own NAS with FreeNAS (http://www.freenas.org) or buy a device like the FreeNAS Mini (http://www.ixsystems.com/storage/freenas/).

    As for direct attached storage - well, there is also a version of ZFS for OSX.
    https://openzfsonosx.org

    The best thing about ZFS is that you can send incremental snapshots at arbitrary intervals to a 2nd ZFS device, with very little impact on your production-system.

    I'm syncing a 6TB fileserver to a 2nd unit using hourly snapshots.
    The initial sync transferred 2.5 TB and took 10h over gigabit Ethernet.
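
    The flow, sketched in Python for anyone curious (the pool/dataset/host names here are hypothetical; `zfs send -i` and `zfs receive -F` are the real commands doing the work):

```python
# Sketch of an hourly ZFS snapshot-and-replicate cycle: build the snapshot
# name and the shell pipeline each run would execute.
import datetime

def snapshot_name(dataset, when):
    # e.g. tank/media@auto-20140831-1400
    return f"{dataset}@auto-{when:%Y%m%d-%H%M}"

def replication_pipeline(prev_snap, new_snap, remote_host, remote_dataset):
    # `zfs send -i` streams only the blocks changed since prev_snap;
    # `zfs receive -F` rolls the target back to the matching snapshot first.
    return (f"zfs send -i {prev_snap} {new_snap} | "
            f"ssh {remote_host} zfs receive -F {remote_dataset}")

snap = snapshot_name("tank/media", datetime.datetime(2014, 8, 31, 14, 0))
print(snap)
print(replication_pipeline("tank/media@auto-20140831-1300", snap,
                           "backup-host", "backup/media"))
```

    Because only changed blocks cross the wire after the initial sync, the hourly runs are cheap even on gigabit.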

    The worst problem with cheap proprietary NAS devices that use hardware RAID is recovering the data in case of a hardware failure (e.g. a failed RAID controller).
    You might get a replacement-unit from the manufacturer (if the company is still in business) - but there's no guarantee that the new controller can actually read and understand the data written by your old controller.

    ----------

    And to make a RAID-array of SSD drives, you need a RAID-controller that can actually cope with the data-rate.
    AFAIK, you need a "12G" SAS controller for that - it's not necessarily a question of raw speed, but the number of IOPS, which overwhelms "normal" RAID controllers.
     
  6. Lastmboy thread starter macrumors regular

    Joined:
    Jan 16, 2012
    #6
    Thanks! I'll do some reading about ZFS. Looks interesting.
     
  7. flynz4, Aug 31, 2014
    Last edited: Aug 31, 2014

    flynz4 macrumors 68040

    Joined:
    Aug 9, 2009
    Location:
    Portland, OR
    #7
    You are measuring bandwidth, which in most cases, is not a very meaningful metric... except in the amount of time to transfer a large sequential file. Of much more importance is IOPS. A HDD can only deliver about 200 IOPS. A lousy SSD might be 20,000 IOPS (100X faster) and a great SSD might be 200,000 IOPS (1000X faster). It is IOPS which makes a computer scream... not sequential bandwidth.
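
    To illustrate (the burst size is an arbitrary assumption; the IOPS figures are the ones above):

```python
# Time to service a burst of small random reads at the IOPS rates above --
# this latency, not sequential MB/s, is what makes a machine feel fast.
def seconds_for_burst(n_ops, iops):
    return n_ops / iops

burst = 50_000  # assumed burst of small random reads (e.g. an app launch)
for name, iops in [("HDD", 200), ("lousy SSD", 20_000), ("great SSD", 200_000)]:
    print(f"{name:10s} {seconds_for_burst(burst, iops):8.2f} s")
```

    The same burst that stalls an HDD for minutes finishes on an SSD in a fraction of a second, regardless of how the sequential numbers compare.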

    That said... a Thunderbolt Pegasus is a great product. For my usage... it is the best of the consumer/prosumer class external storage solutions available right now.

    Correct. However, current RAID controller chips (even some of the best enterprise ones) do not have the processing power to support more than a small handful of SSDs. This is because they have been primarily designed for the very low IOPS rate of HDDs, and just cannot handle the higher IOPS rate of SSDs. For RAID of SSDs... software RAID seems to actually scale better than HW RAID. This is in stark contrast to HDDs, where hardware RAID often has a distinct advantage.

    RAID of SSDs has another issue. If you look at scatter maps of an SSD's access times... many consumer grade SSDs have a very wide band. That means that random access for any striped RAID mode must wait for the slowest SSD. That can bring down the performance, particularly for low queue depth IOPS (the most important of all IOPS measurements). By contrast, enterprise SSDs have a very tight access time scatter map... which continues to allow high IOPS even in RAID. However, few consumers pay the extra for enterprise SSDs... and even if they were willing to... they are generally not offered through consumer channels.

    I do not think I would go through the trouble of RAID for SSDs. The return is just not there. Right now I use a moderate sized SSD (768GB) in conjunction with my Pegasus R4. I keep media files (music and video) on the Pegasus, since they are sequential files and do not benefit much from high IOPS. Everything else is on the SSD. I suspect by the time I get my next iMac (probably 2 years)... then 2TB SSDs will be commonly available. Once affordable (~ $1K) SSDs get to 3-4TB... then I would probably retire the Pegasus... or just use it as a backup destination.

    /Jim
     
  8. Lastmboy thread starter macrumors regular

    Joined:
    Jan 16, 2012
    #8
    This is quite interesting. I have a question... way back in my Windows PC days, when SSDs first came out, I got all excited about them and started using them as system drives... I even striped two of them as a system drive in one PC. It was OK, but wasn't enough faster on bootup than a single HDD to be particularly impressive. A friend of mine found the same thing. We were both quite disappointed. That said, my nMP boots up in a few seconds. Are SSDs just that much more impressive now than when they first arrived on the scene?
     
  9. flynz4 macrumors 68040

    Joined:
    Aug 9, 2009
    Location:
    Portland, OR
    #9
    Yes, they continually get faster for a number of reasons. NAND has improved, and the number of NAND channels has increased offering greater internal parallelism. The internal NAND interface has moved to ONFI 2, then ONFI 3. The external interface has moved from PATA, to SATA-2, to SATA-3, to PCIe. The protocol has evolved from AHCI to NVM Express. All of these things have driven SSD performance higher.

    Your old problem might also have been related to striping them. While striping might increase sequential bandwidth... it could have a negative effect on the much more important IOPS... especially at low queue depths (which is all that really matters).

    /Jim
     
  10. MRrainer macrumors 6502a

    Joined:
    Aug 8, 2008
    Location:
    Zurich, Switzerland
    #10
    Also, cache RAM on SSDs has increased performance even more.
    Unfortunately, most SSDs with cache RAM are not battery- or capacitor-backed...
     
  11. westrock2000 macrumors 6502

    Joined:
    Oct 18, 2013
    #11
    With jumbo frames enabled on my Mac Pro I see a max of about 95-105 MB/s. With "normal" frames, it's closer to about 85 MB/s.



    Yes, my first SSD had random read/write speeds of about 3-4 MB/s; that was still better than the 1-2 my HDD could do, but today's SSDs are easily in the 50-60 MB/s range for random reads/writes, while hard drives have not improved a noticeable amount. That's what makes SSDs snappy.
     