RAID5/6 Rebuild times - OUCH!

Discussion in 'Mac Pro' started by pprior, Jan 2, 2011.

  1. pprior macrumors 65816

    Aug 1, 2007
    I recently bought the Areca 1880ix-16 card and the backplane cable to hook to the mac pro internal drives - hooked up 4 x seagate ST32000644NS 2TB enterprise drives for an internal RAID5 with 6TB of space. Used the DX4 from transinternational to hook up my 2 SSD drives.

    The initial raid build time was just about 2 days (!) for the raid 5. Shortly after that I left town for a few days and came back to an alarm in my computer - one of the disks had dropped out. The rebuild time was again close to 2 days!

    At this point I realized that if I was sitting there with that much data at risk for a 2 day rebuild, it would be agonizing and so I needed to go raid6 (which had been my goal from the start, but only 4 internal disks is very limiting).

    So I gave up on managing my storage internally and bought an istorage pro it8SAS external box connected by miniSAS cable.

    The raid5 had to complete the rebuild before it would let me expand to raid6.

    I added another seagate ST32000644NS 2TB enterprise drive (interestingly the drive that was originally listed as failed has caused no further problems at all) and expanded the raid to RAID6 with the same capacity but one extra drive for parity.
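    The capacity arithmetic behind "same capacity but one extra drive for parity" can be sketched as follows. This is only the textbook parity math (one disk's worth of parity for RAID 5, two for RAID 6); the function name `usable_tb` is made up for illustration, and the card may round or truncate capacities slightly differently.

```python
def usable_tb(disks: int, disk_tb: float, level: int) -> float:
    """Usable capacity of a RAID 5 or RAID 6 set built from equal-sized disks."""
    parity_disks = {5: 1, 6: 2}[level]      # disks' worth of space lost to parity
    if disks < parity_disks + 2:            # RAID 5 needs >= 3 disks, RAID 6 >= 4
        raise ValueError("not enough disks for this RAID level")
    return (disks - parity_disks) * disk_tb


print(usable_tb(4, 2.0, 5))   # the original 4 x 2TB RAID 5
print(usable_tb(5, 2.0, 6))   # the expanded 5 x 2TB RAID 6
```

    Both calls give 6.0, which is why the fifth drive adds redundancy but no space.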

    The rebuild now has been going on for almost 48 hours and it's at 53%

    I had NO idea the rebuild times would be this long. I do have 2.3TB of data on the array, so I'm not sure if that prolongs it vs. an empty array.

    But just for others' info, please take into consideration these times when deciding what your own needs are - I bought a high end (AFAIK) raid card, running in a high end computer (mac pro 3ghz octocore with 16GB RAM) and quality enterprise drives, and it's still maddeningly slow to rebuild.

    Hopefully once it's up and running, this will all be forgotten, and if I ever have a drive fail, I know I can still lose another one without any data loss. RAID6 seems the only way to go given these times for rebuild.

    FWIW, I also put in another 3 drives - old 500gb drives I had sitting around and built a secondary RAID5 array since I had the space, that completed in around 36 hours or so, and was done during the original raid5 rebuild.
  2. zhenya macrumors 603


    Jan 6, 2005
    Yes, RAID rebuild times can be extremely long. That said, it shouldn't matter, because the data remains available during the rebuild. It sounds like you are expecting RAID to function in place of a backup, which it is NOT designed to do. You still need to back up the data on the RAID array! Therefore the data is never really at risk during the rebuild, because you can always restore from backup.
  3. nanofrog macrumors G4

    May 6, 2008
    If I read this right, it took 48hrs to initialize an 8TB pool (unconfigured capacity) into a RAID5?

    That's too slow for an empty initialization using modern disks (I'm accustomed to ~2hrs per 1TB on Areca cards on average).
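    To put numbers on that, here is a rough sketch. It assumes the ~2hrs per 1TB figure applies to the unconfigured (raw) capacity, and that a build pass touches each 2TB member once from end to end; both are simplifications, and the function names are only for illustration.

```python
def init_hours(raw_tb: float, hours_per_tb: float = 2.0) -> float:
    """Expected initialization time from the ~2 hours per 1TB rule of thumb."""
    return raw_tb * hours_per_tb


def per_disk_mb_per_s(disk_tb: float, hours: float) -> float:
    """Average per-disk throughput if the whole disk is covered once in `hours`."""
    bytes_total = disk_tb * 1e12            # decimal TB, as drive vendors count
    return bytes_total / (hours * 3600) / 1e6


print(init_hours(8))                         # expected hours for an 8TB pool
print(round(per_disk_mb_per_s(2.0, 48), 1))  # implied MB/s per disk at 48 hours
```

    By the rule of thumb an 8TB pool should take about 16 hours. A 48 hour pass works out to roughly 11.6 MB/s per disk, far below what a 7200 rpm disk would normally sustain on sequential access.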

    I'm wondering if something wasn't set right (card's settings, jumper on the disks, ....), or if there was a cable connection issue (caused instability, which could explain why the disk dropped out and caused the card to think that it was bad). :confused:

    BTW, like SATA cables, you can get bad connections. You'd be amazed how often a simple little wiggle can sort it out (easier to start with the SFF-8087 end). You'll even get a bad one from time to time, which is more of a problem than with standard SATA cables, as they're more expensive to replace/keep spares on hand. In your case, the card should have had 4x fan out cables in the box, which would have provided 3x spares had wiggling or disconnecting/reconnecting not worked.

    This is an issue with any large capacity redundant array though, and why you do have to take rebuild times into consideration (i.e. plan that another disk can go during a rebuild, so pushing the redundancy can mitigate this to some extent).

    Making more arrays @ smaller capacity and splitting the load is another option (expensive, but effective and necessary under certain conditions - think data centers).

    I wish you would have tried to test out the cable connections first, to save yourself some funds.

    Online expansion cannot take place on a degraded array. So you had no choice but to wait for the rebuild process to complete before the expansion can proceed.

    Unfortunately, this will eat a significant chunk of time.

    Data definitely increases the time it takes to rebuild. And the more you have, the longer it takes (more parity data to read + calculations to perform, then write to the disks).

    It took that long due to a combination of the other rebuild going on, and that your 500GB disks aren't that fast.
  4. goodcow macrumors 6502a

    Aug 4, 2007
    My DroboPro rebuilds quicker than that.

    I had 4x1TB and 4x2TB in their proprietary equivalent of a RAID 6 and it was 92% full. Popped one drive out to expand with another 2TB and it took 18 hours. After that was done, popped out another 1TB and expanded with yet another 2TB, took another 18 hours. I have 6x2TB and 2x1TB now. Drobo may be pricey, but it's just so simple and rock solid.
  5. pprior thread starter macrumors 65816

    Aug 1, 2007
    Nanofrog - it was in large part reading your advice that I decided to move forward with my plan, so I appreciate your response.

    I thought it was slow as well, but I'm seeing the same time now using the miniSAS connection to a dedicated enclosure as I did using the internal connector to the mac pro backplane.

    It's now almost 4 hours after my last post here and the RAID page shows I'm at 60% migration (so that's 7% in 4 hours).
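    A quick extrapolation from those two readings (53% earlier, 60% about 4 hours later), assuming the card keeps going at the same pace, which it may well not:

```python
def remaining_hours(pct_then: float, pct_now: float, hours_between: float) -> float:
    """Linear estimate of time left, based on two progress readings."""
    rate = (pct_now - pct_then) / hours_between   # percent per hour
    if rate <= 0:
        raise ValueError("no progress between the two readings")
    return (100.0 - pct_now) / rate


print(round(remaining_hours(53, 60, 4), 1))   # hours left at 1.75% per hour
```

    That is about 23 more hours at 1.75% per hour. The first 53% took almost 48 hours (roughly 1.1% per hour), so the pace is not constant and this is only a ballpark.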

    As to the cost and cable issue, after seeing how long it would take for a rebuild, and as much as I put into the card and drives, the extra $1200 or so for an external array that is expandable I think was well worth the money.

    I've been through the web based areca software and nowhere can I find any diagnostic information that there is a problem, so I wonder why I'm seeing so much slower speed than you report should be normal.

    Goodcow - I had an older version of Drobo and while I liked very much the simplicity, the performance was slow, to be kind, and I found my unit to be somewhat unstable. AFAIK I should have significantly improved speed over a drobo once I'm up and running. I'm glad they've improved their product, but I don't think they are more expensive than my total cost.
  6. nanofrog macrumors G4

    May 6, 2008
    Speed depends on the model, as most are rather slow. What concerns me about the speed you've got is lazy writes and the write hole issue. You could end up with data corruption that way if their software solution doesn't fully solve that issue. In a worst case scenario (they didn't solve it fully), fast or not won't matter as the data may be shot anyway. :eek:

    My other issue is Data Robotics' slow software support (they take forever to release fixes), which is dangerous for a device you trust your data to.

    My take on Drobo's anyway... ;)

    :cool: NP. :)

    Then this definitely isn't a cable issue (as to the rebuild time). It may still be the cause of the initial drop-out however (hard to know for sure given you've changed things to an external solution - can't test).

    Is it just the time to rebuild the RAID 5?
    Or is it including the Online Expansion as well?

    I'm trying to gauge what the card is doing, and where it is in that process.

    I don't disagree, but it should have been a result of (or combination):
    1. Need to expand the array's member count past what's possible inside the MP (speed, capacity, change the level, or some combination).
    2. Must have Hot Plug capability (trays in the external enclosure, as you cannot Hot Plug a disk inside the MP - no inrush current limiter circuit to protect the PSU).
    Did you place the Seagates in another system (PC based) and perform surface scans prior to initializing the set?

    I ask, as there could have been bad sectors from the moment you opened them. :eek: It's always a good thing to do, and worth the time (can save you time in the end for diagnostics, as it can let you know of a bad disk before you waste massive amounts of time trying to figure out what's going on).

    Unfortunately, the MP's firmware doesn't pass low level information properly to use a DOS or Windows based application for surface scans, hence the need to use another computer (learned from multiple MR members trying it on multiple machines, including 2010's). :(

    I'd scan your disks at this point (borrow, beg, bribe, whatever it takes to gain access to a PC :p). If that doesn't reveal bad sectors or other issues (i.e. bad controller), I'd contact Areca about your situation (I've not had access to the 1880 series yet, but as it's the fastest unit they make right now, it seems odd).

    Also keep in mind, the time/TB I gave was for initialization only, not Online Expansion or Rebuilds (both of these processes definitely require more time, as it's trying to retain the data vs. write over any existing data as happens during the initialization process). I'm not sure if that was clear enough in the last post, and I can't stress enough the drastic difference here. And the more data on the set, the longer it takes (many more parity blocks to read, reconstruct the data, and write to the new disk). Then it has to do it again with the additional disk/s for Online Expansion. This is why it's so darn slow.
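    The "read the parity blocks, reconstruct the data, write it to the new disk" step can be illustrated with plain XOR parity as used by RAID 5 in general. This is a toy sketch on three tiny data blocks, not Areca's actual firmware logic; RAID 6 adds a second, differently computed parity block that is not shown here.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe: three data blocks plus the parity block computed from them.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# The disk holding data[1] drops out. Read every surviving block in the
# stripe (data and parity) and XOR them to get the missing block back.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data[1])
```

    Every stripe on the set needs the same treatment: a read from all surviving members, the XOR, then a write to the replacement disk, which is why a rebuild keeps every disk busy for the whole run.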

    More information (questions above), could help answer it though (if you've a real problem in terms of speed or not). :)

    Also, please give a detailed account of the card's settings, and make sure you don't have any jumpers on the disks (sets them to 1.5Gb/s).
  7. cutterman macrumors regular

    Apr 27, 2010
    There is a background task priority setting in System Functions that may be set too low.
  8. dknightd macrumors 6502

    Mar 7, 2004
    First, let me be clear, I have no experience with redundant raid arrays on a mac. But do have experience using netapp hardware and linux boxes.

    48+ hours to initialize (or rebuild) a 4 disk 8tb raw array seems too long.
    I would have guestimated 18-24 hours. Maybe the Areca card is very careful. Not a bad thing. Or my guestimate could be wrong (build/rebuild time depends on size, speed, and number of spindles, and the speed of the processor).

    Does the card allow you to partition resources between rebuilding and serving data? If so, perhaps you have it set to favor serving data during a rebuild? But that does not explain the long build time.

    It bugs me that macOS (or apple hardware) no longer has the ability to surface scan a disk. Perhaps I'm old fashioned, and out of touch, but this seems like a good capability to have. . .

    Does the Areca card support hot spares? You've spent a lot of money, you might want to have a hot spare disk, that way rebuilds can automatically start while you are out of the office.
  9. dknightd macrumors 6502

    Mar 7, 2004
    That is a good (must have?) feature. Some are priority based, others are absolute partitions of resources.

    OP, I suggest you contact the vendor. You are a new user of their product, hopefully they will be happy to help, let us know. In any case, if you are not serving data, then crank the background priority as high as possible . . . You can always lower it again when serving data has higher priority than (re)building.
  10. nanofrog macrumors G4

    May 6, 2008

    This is why I asked for a detailed list of the card's settings (doubt the Manual gives a clear understanding what this is for if it's written by the same person/s that wrote the previous manuals - cut and paste comes to mind -, as there's some expectation that the user knows what "Background Tasks" are).

    In the case of a DAS, I prefer to leave the system to complete rebuilds and expansion without concurrent data access, so I set the Background Priority to High (less stress on the disks, and it's faster to get the array back to 100% functionality and/or expanded).

    If the OS/applications disk is separate from the array (i.e. single disk on the ICH), then the user can still browse the net, send off emails, ... , so long as the application/s used do not need to access the array. ;) Quite handy for getting a broken array back up and running again, such as web research (especially in situations where the array is toast <data is gone>, not in a degraded state).
  11. pprior thread starter macrumors 65816

    Aug 1, 2007
    Background services are set to high (which was the default).

    My OS is on another disk (SSD) and I'm not writing anything to the array, it's just sitting there though it is mounted with some data, so I'm sure the OS checks in on it from time to time, but I'm not actively accessing it nor serving to any other users.

    I had sent an email to user support, but got a very vague response that it "could" take that much time...

    I'm starting to remember why I ditched the last ARECA raid card I had on a PC about 5 years ago.

    It does support hot spares, but I'm kinda looking at raid6 as raid5 with a hot spare already online and rebuilt :) I don't want to put yet another $200+ to just sit there and eat up another drive bay.
  12. cutterman macrumors regular

    Apr 27, 2010
    OP if you are still looking for info or help, check out the storage forum on There is a user there (Jus) who sells Areca equipment and is very knowledgeable in troubleshooting their array problems.
  13. nanofrog macrumors G4

    May 6, 2008
    You're making the assumption there's a problem when there may not be. I'm not saying there isn't, but the detailed information requested would be needed to tell one way or the other. ;)
  14. pprior thread starter macrumors 65816

    Aug 1, 2007
    I don't have access to a PC and I don't see myself removing the disks and then subjecting myself to a 4+ day rebuild for each one as they are put back in. I see your point that surface scanning before putting them into place would have been a good idea. There is no way to do that now?

    system beeper: enabled
    Background task priority: High (80%) [note: that's the highest setting]
    SATA NCQ support: enabled
    HDD Read ahead cache: enabled
    Volume Data Read ahead: Normal
    HDD Queue depth: 32
    Empty HDD slot LED: on
    SES2 support: enabled
    Auto activate incomplete raid: disabled
    Disk write cache mode: auto
    Disk capacity truncation mode: multiples of 10G

    Stagger Power On Control: 0.7
    Time To Hdd Low Power Idle: Disabled
    Time To Hdd Low RPM Mode: Disabled
    Time To Spin Down Idle HDD: Disabled

    DHCP function: Enabled
    ...ip adressess...blah blah blah....

    Raid Set Hierarchy:

    RAID Set        Devices      Volume Set(Ch/Id/Lun)    Volume State      Capacity
    Raid Set # 000  E#2SLOT 14   SSDSA2M160G2GC (0/0/0)   Normal            160.0GB
    Raid Set # 001  E#3SLOT 04   ARC-1880-VOL#001(0/0/1)  Migrating(99.5%)  6000.0GB
                    E#3SLOT 01
                    E#3SLOT 02
                    E#3SLOT 03
                    E#3SLOT 05←
    Raid Set # 002  E#2SLOT 04   WD3000HLFS-01G6U(0/0/2)  Normal            300.1GB
    Raid Set # 003  E#2SLOT 01   HDS722020ALA330 (0/0/3)  Normal            2000.4GB
    Raid Set # 004  E#2SLOT 03   WD10EACS-00ZJB0 (0/0/4)  Normal            1000.2GB
    Raid Set # 005  E#3SLOT 06   ARC-1880-VOL#005(0/0/5)  Normal            1000.0GB
                    E#3SLOT 07
                    E#3SLOT 08

    Enclosure#1 : ARECA SAS RAID AdapterV1.0
    Device Usage Capacity Model
    Slot#1 N.A. N.A. N.A.
    Slot#2 N.A. N.A. N.A.
    Slot#3 N.A. N.A. N.A.
    Slot#4 N.A. N.A. N.A.
    Slot#5 N.A. N.A. N.A.
    Slot#6 N.A. N.A. N.A.
    Slot#7 N.A. N.A. N.A.
    Slot#8 N.A. N.A. N.A.
    Enclosure#2 : Areca ARC-8018-.01.06.0106(F)[5001B4D706F4103F]
    Device Usage Capacity Model
    SLOT 01(C) Pass Through 2000.4GB Hitachi HDS722020ALA330
    SLOT 02 N.A. N.A. N.A.
    SLOT 03(D) Pass Through 1000.2GB WDC WD10EACS-00ZJB0
    SLOT 04(E) Pass Through 300.1GB WDC WD3000HLFS-01G6U0
    SLOT 05 N.A. N.A. N.A.
    SLOT 06 N.A. N.A. N.A.
    SLOT 07 N.A. N.A. N.A.
    SLOT 08 N.A. N.A. N.A.
    SLOT 09 N.A. N.A. N.A.
    SLOT 10 N.A. N.A. N.A.
    SLOT 11 N.A. N.A. N.A.
    SLOT 12 N.A. N.A. N.A.
    SLOT 13 N.A. N.A. N.A.
    SLOT 14(A) Pass Through 160.0GB INTEL SSDSA2M160G2GC
    SLOT 15 N.A. N.A. N.A.
    SLOT 16 N.A. N.A. N.A.
    EXTP 01 N.A. N.A. N.A.
    EXTP 02 N.A. N.A. N.A.
    EXTP 03 N.A. N.A. N.A.
    EXTP 04 N.A. N.A. N.A.
    Enclosure#3 : Areca x36-05.8A.1.40 000 (18)[5001B4D50705F03F]
    Device Usage Capacity Model
    SLOT 01(17) Raid Set # 001 2000.4GB ST32000644NS
    SLOT 02(16) Raid Set # 001 2000.4GB ST32000644NS
    SLOT 03(15) Raid Set # 001 2000.4GB ST32000644NS
    SLOT 04(14) Raid Set # 001 2000.4GB ST32000644NS
    SLOT 05(13) Raid Set # 001 2000.4GB ST32000644NS
    SLOT 06(12) Raid Set # 005 500.1GB HDS725050KLA360
    SLOT 07(11) Raid Set # 005 500.1GB HDS725050KLA360
    SLOT 08(10) Raid Set # 005 500.1GB SAMSUNG HD501LJ
    SLOT 09 N.A. N.A. N.A.
    SLOT 10 N.A. N.A. N.A.
    SLOT 11 N.A. N.A. N.A.
    SLOT 12 N.A. N.A. N.A.

    Raid Subsystem Information
    Controller Name ARC-1880
    Firmware Version V1.48 2010-09-16
    BOOT ROM Version V1.48 2010-09-09
    PL Firmware Version
    Unit Serial #
    Main Processor 800MHz PPC440
    CPU ICache Size 32KBytes
    CPU DCache Size 32KBytes/Write Back
    System Memory 4096MB/800MHz/ECC
    Current IP Address

    Hardware monitor:

    Controller H/W Monitor
    CPU Temperature 70 ºC
    Controller Temp. 48 ºC
    12V 12.099 V
    5V 5.053 V
    3.3V 3.328 V
    DDR-II +1.8V 1.824 V
    CPU +1.8V 1.840 V
    CPU +1.2V 1.248 V
    CPU +1.0V 1.040 V
    DDR-II +0.9V 0.912 V
    Battery Status Charged(100%)
    Enclosure#1 : ARECA SAS RAID AdapterV1.0
    Enclosure#2 : Areca ARC-8018-.01.06.0106(F)
    Enclosure#3 : Areca x36-05.8A.1.40 000 (18)
    1.2V 1.160 V
    5V 4.940 V
  15. nanofrog macrumors G4

    May 6, 2008
    Not in the MP. That's why you have to pull them and attach them in/to a PC (external is easier, such as via an eSATA dock).

    At this point, I understand your reluctance to do it, but it leaves a question of whether or not they're good disks.

    I was hoping the data you've offered gave the Seagate's firmware revision (click on the individual members, and it will give the firmware and serial numbers of the drive). They passed using version SN11 on both the SATA and SAS controllers, according to the HDD Compatibility List (published 21 July 2010). IF it's different (older), this could be the problem.

    I also checked if there's newer firmware for the card, and to date, there isn't (still v.1.48).

    This is all fine.

    A couple of points though:
    • High can't be allowed to go to 100%, or there would be no way to access the array during a rebuild or expansion. As this isn't considered desirable, they limit it to 80%.
    • You don't need Truncation = 10G on identical disks. You don't even need it at all, though I personally go ahead and set it for 1G rather than none (i.e. variances due to remapping bad sectors).
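    For reference, a sketch of what the truncation setting does to the numbers posted above, assuming it simply rounds each member down to the nearest multiple (that matches the listed volume sizes, but the card's exact behavior is an assumption here, and `truncate_gb` is a made-up name):

```python
import math


def truncate_gb(capacity_gb: float, multiple_gb: float) -> float:
    """Round a disk's capacity down to a multiple of `multiple_gb` (0 = no truncation)."""
    if multiple_gb <= 0:
        return capacity_gb
    return math.floor(capacity_gb / multiple_gb) * multiple_gb


seagate = truncate_gb(2000.4, 10)   # each 2TB member of Raid Set # 001
old_500 = truncate_gb(500.1, 10)    # each 500GB member of Raid Set # 005

print(3 * seagate)   # RAID 6 on 5 disks leaves 3 disks' worth of data space
print(2 * old_500)   # RAID 5 on 3 disks leaves 2 disks' worth of data space
```

    These come out to 6000 and 1000, in line with the 6000.0GB and 1000.0GB volumes in the Raid Set Hierarchy. With a 1G multiple the 2000.4GB disks would still truncate to 2000, so nothing is lost either way on identical disks.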

    BTW, what are the Queue Depth setting options available on that card (not sure if 32 is the max value)?
    Increasing that value could help speed things up.
    This is fine.

    Migration = move from 5 to 6, so it included all the processes you were trying to do (Rebuild the level 5, then expand and migrate simultaneously). All of that was a lot of work for the card to do.

    It's nearly done so it will be back up and running soon, but I'd recommend you go ahead and check out what I linked. At this point I don't think there's anything wrong (presuming the disk firmware will check out, and they're new enough that you shouldn't need to Disable NCQ), given the entire load you had it perform.

    But the Queue depth increase could help speed things up for you (n = 5 member count, and each disk should be good for ~190 IOPS as they're 7200 rpm SATA). Worth testing it out IMO (test the completed array with the existing setting for a base line, then increase the Queue depth, and compare).
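    As a back-of-envelope for that comparison, a small sketch. The ceiling simply multiplies the member count by the per-disk figure and ignores parity overhead and caching, so treat it as an upper bound at best; `percent_change` is a hypothetical helper for comparing the baseline run against the run with the higher Queue depth, using whatever benchmark figure you measure.

```python
def aggregate_iops(members: int, iops_per_disk: float) -> float:
    """Naive ceiling: every member contributes its full random IOPS."""
    return members * iops_per_disk


def percent_change(baseline: float, tuned: float) -> float:
    """Relative change of a benchmark result against the baseline, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (tuned - baseline) / baseline * 100.0


print(aggregate_iops(5, 190))   # n = 5 members at ~190 IOPS each
```

    With n = 5 that ceiling is 950 IOPS. Run the same benchmark before and after changing the Queue depth and feed both results to `percent_change` to see whether the change was worth it.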
