Optimizing Synology disk storage with 8TB drives

Synology has really made it easy to get yourself into trouble. We have a 12 drive Synology at home and an 8 bay, plus an older Drobo 8 bay and a Drobo 4 bay. Somehow we’ve ended up with terabytes and terabytes of storage, but I’m always running out on one volume or another, so here are some recommendations. While expensive, a 12 bay array is super flexible. It is about $1K to buy one of these, but the Synology operating system by itself is worth it, with frequent updates and lots of features.
The main problem is that even with enterprise class drives, you are likely to get errors if you have to rebuild a RAID array. The math for 10TB drives is that you get an error every 10^15 bits you read, which is about 120TB per error. That doesn’t sound so bad until you realize that if a RAID drive fails, all the other drives have to be read. Let’s do an example:

You have 6 x 10TB drives in a RAID 6 array.
You lose one drive, so now the other 5x10TB drives, or 50TB, have to be read. But you still have one drive of redundancy.
The odds are roughly 50/50 (50TB to be read against about 120TB per error) that the rebuild hits a read error. That is the second failure.
Now you have no more redundancy and you still have to read 4x10TB to rebuild the first parity drive. The odds are 1 in 3 that this rebuild will fail and you will lose the whole array.
If you do all the math, then there is a 50% * 33% = 16% chance that upon losing a drive, you will lose the entire array.
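The arithmetic in this example can be sketched in a few lines of Python. This is only the same rule of thumb (data read divided by data per error), not a proper reliability model, and the function names are made up for illustration. Note that 10^15 bits works out to 125TB rather than the rounded 120TB, so the script gives 40% and 32% for the two rebuilds where the text rounds up to 50% and 33%.

```python
TB_BITS = 8e12  # bits in one decimal terabyte, the way drive vendors count


def tb_per_error(bits_per_error: float) -> float:
    """Terabytes read, on average, for each unrecoverable read error."""
    return bits_per_error / TB_BITS


def rule_of_thumb(read_tb: float, bits_per_error: float) -> float:
    """Data to read divided by data per error, capped at 100%."""
    return min(1.0, read_tb / tb_per_error(bits_per_error))


RATING = 1e15  # enterprise drive: one error per 10^15 bits read

first = rule_of_thumb(50, RATING)   # five surviving 10TB drives are read
second = rule_of_thumb(40, RATING)  # four drives left after a second failure
both = first * second               # both rebuilds go wrong

print(f"{tb_per_error(RATING):.0f} TB per error")
print(f"first rebuild {first:.0%}, second rebuild {second:.0%}, both {both:.1%}")
```

Swapping RATING for 1e14 shows how quickly the same array gets into trouble with consumer drives.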

That’s not designed to make anyone feel good, particularly if you use cheaper consumer drives that are rated at one error per 10^14 bits read (or about 12TB per error). That means reading a full 8TB drive has roughly a 2 in 3 chance of an error?!!! Even a 6TB drive has a 50% chance of a problem.
How did this happen? Basically, the capacity of drives has gone up by 100x but reliability has not. The old 100GB drives have the same error rates as the new 10TB ones.
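To see how capacity alone drives the risk, here is a small sketch that estimates the chance of at least one read error when reading a whole drive end to end. It assumes bit errors are independent and uses 1 - exp(-bits read / bits per error), which is my assumption and not the simple division used in the text, so the percentages come out somewhat lower than the rule of thumb. The function name is hypothetical.

```python
import math


def p_read_error(capacity_tb: float, bits_per_error: float) -> float:
    """Chance of at least one unrecoverable error in a full read.

    Assumes independent bit errors at the rated frequency.
    """
    bits_read = capacity_tb * 8e12  # decimal terabytes to bits
    return -math.expm1(-bits_read / bits_per_error)


# Same error ratings, very different capacities: an old 100GB drive
# next to 6TB, 8TB and 10TB drives.
for capacity_tb in (0.1, 6, 8, 10):
    consumer = p_read_error(capacity_tb, 1e14)
    enterprise = p_read_error(capacity_tb, 1e15)
    print(f"{capacity_tb:>5} TB  consumer {consumer:6.1%}  enterprise {enterprise:6.1%}")
```

Either way you estimate it, the 100GB drive barely registers while the big drives are in double digits at the consumer rating, which is the whole problem.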
Net, net, what’s the solution?

SSD caches. One big one is to use SSDs, because their error rates are much lower. Consumer SSDs are rated at 10^16 and enterprise SSDs at 10^17, and they are much less dense at typically 1TB, but also more expensive. With a cache you don’t read the hard disks much. In looking at our internal hard disk use, a RAID 1 pair of 1TB SSDs (about $800 worth, or the same as a 10TB drive) has a low chance of failing, and you are not using the disks super much.
More RAID arrays. The other solution is to double and triple up on backup arrays and offline storage. You are basically adding more backups, so more have to fail. At home, that’s why we have a backup for each RAID array and then an offsite backup. This theoretically reduces the likelihood of failure by 9x (since all three systems have to fail to lose data).
Error correction above the file system with btrfs. For really vital files, you need to put error correction above the level of the file system. The biggest hole here is that there is no easy system for recovering errors in JPEGs (which are dead with a one bit error). Most other files do recover, like movies (since you get a new key frame) or Word documents, where again there is redundancy. Btrfs, while new, does checksumming for data as well as metadata. This doesn’t protect you against errors during disk rebuilds, but it does help when you get NBERs just reading disks (which can happen). In essence it puts a CRC on files.

Net, net, in planning for using 10TB drives, here is what seems to be a good layout, which balances the bit error problems with density, for a DS243+ 12 drive array that can handle 10TB drives:

Use two bays for 1TB SSDs. Use enterprise grade drives rated at 10^17 NBER, so for a 1TB drive the odds that a rebuild will have a problem are very small (10^17 bits / (8 * 10^12 bits) is about 10^4, or roughly a 1/10,000 chance of an error). For most of your data, you will only be using the SSDs, which is good from a disk error rate point of view. For most systems, you can look at the SSD cache advisor, but if the use is reasonably local, this is going to have a 90% hit rate (in other words, it reduces the number of accesses to the hard drives by 10x). Using btrfs on top of this also adds a checksum, so the chance of a bit error in normal running is certainly very low.
For the remaining 10 drives (100TB of raw storage), you should format them as two RAID 6 arrays (that is, 5 drives or 50TB each, with two parity drives). The main thing to worry about is failure during rebuilds. On a failure, you will have to read 40TB of data, so you will have a one in three chance of a read failure as noted above, and then with the remaining 30TB you have a ¼ chance, so the total is about an 8% chance of a total array failure. This does mean you are committing in essence 40% of your drives to RAID redundancy, but it reduces the chance of a problem on rebuild.
To further handle the issue, have a backup RAID array, so now the chance of total failure drops to 8% of 8% or 0.64%. Finally, go with offsite storage (AWS advertises 0.999999 durability, that is a one in a million chance of data loss, so you are now at about 0.00000064%).
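The layering argument is just multiplying chances together, which only holds if the copies fail independently (a big assumption: a fire or a bad firmware update can take out two arrays at once). Here is a minimal sketch using the one in three and one in four figures from the layout above and the 0.999999 figure quoted for offsite storage. The helper name is made up.

```python
from math import prod


def chance_all_fail(chances):
    """Chance that every copy is lost, assuming independent failures."""
    return prod(chances)


# One RAID 6 group: a 1 in 3 chance on the first rebuild read,
# then a 1 in 4 chance on the second.
array_loss = (1 / 3) * (1 / 4)
# Offsite copy, using the 0.999999 durability figure quoted in the text.
offsite_loss = 1 - 0.999999

one_array = chance_all_fail([array_loss])
with_backup = chance_all_fail([array_loss, array_loss])
with_offsite = chance_all_fail([array_loss, array_loss, offsite_loss])

for label, chance in (
    ("single array", one_array),
    ("plus backup array", with_backup),
    ("plus offsite copy", with_offsite),
):
    print(f"{label}: {chance:.2e}")
```

The unrounded single array figure is 8.3%, so the two array result lands a little above the 0.64% you get from squaring a flat 8%.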

The math is a little different for our older DS1211+ which is 8 drives with a maximum of 8TB per drive. In this design:

Use two bays with 1TB SSDs as a cache. This reduces the number of bits read from the hard drives dramatically in the normal case, as above, and essentially prevents read issues.
With the remaining six drives you have 6×8=48TB of raw storage. At this level, if you have a single SHR2 or RAID 6 partition, then on a single failure you will need to read 40TB of data in the worst case, so the error probabilities are the same as above, or about an 8% chance that an array will fail. The ratio of data is actually better in this case, as you are using only ⅓ of the drives for redundancy rather than 40%.
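To compare the two layouts side by side, here is a small sketch that works out usable space, the share of drives given to parity, and how much data has to be read after one drive fails. It treats SHR2 as plain RAID 6 with identical drives, which is a simplification, and the class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Raid6Group:
    drives: int    # drives in the group, including parity
    drive_tb: int  # capacity of each drive in TB

    PARITY = 2     # RAID 6 spends two drives' worth of space on parity

    def usable_tb(self) -> int:
        return (self.drives - self.PARITY) * self.drive_tb

    def parity_fraction(self) -> float:
        return self.PARITY / self.drives

    def rebuild_read_tb(self) -> int:
        """Data read from the surviving drives after one drive fails."""
        return (self.drives - 1) * self.drive_tb


twelve_bay_group = Raid6Group(drives=5, drive_tb=10)  # one of two groups
eight_bay_group = Raid6Group(drives=6, drive_tb=8)    # the single group

for name, group in (("5 x 10TB", twelve_bay_group), ("6 x 8TB", eight_bay_group)):
    print(
        f"{name}: usable {group.usable_tb()}TB, "
        f"parity {group.parity_fraction():.0%}, "
        f"rebuild reads {group.rebuild_read_tb()}TB"
    )
```

Both groups read 40TB on a rebuild, which is why the failure odds match, while the six drive group gives up a smaller share of its drives to parity.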

So here is how to implement all of the above. Synology has a pretty confusing array of applications for dealing with files. There are basically three levels. First there is the Storage Manager, which is a top level application available from the upper right hand button. You use this to first create a Disk Group, which allocates the disks properly and provides the basic RAID support. You will likely want to use SHR2, which is RAID 6 but more flexible in that all the disks do not have to be identical. You then create volumes on top of the disk group. Finally, in the Control Panel there is the File Sharing applet that lets you create shares on top of the volumes and assign permissions.
When you get the thing, you don’t want to take the default, which will create a volume for you on the bare disks. This isn’t super flexible. Instead, you want to create a disk group first. Why? Because then you can move your volumes around or have them share space. As an example, suppose you have a volume for user data and another for, say, movies and music. Without a disk group, if you make a mistake you could end up with lots of space for user data but nothing for movies. When you put both volumes on a single disk group, they share the empty space.
The other reason is that the larger the array, the less you spend on redundant disks. For instance, with two 4 drive arrays, you would normally use SHR (a flexible version of RAID 5, which survives a single drive failure). So you would have two RAID 5 partitions, and a double disk failure on any one array is catastrophic. If you use a single RAID 6 array instead, you still use two drives for redundancy, but now you can tolerate any two failures and still have your data.
After you make your disk groups, you allocate volumes. Unlike with disk groups, don’t choose max; what you want to do is leave some unallocated free space. The best thing to do is to have different volumes (say for data or for music), and then they can share free space. If you have just one massive volume, then it is hard to allocate space and move things around. I find you don’t want a hundred volumes, but having 2-3 really big ones is good. For us, data, movies and TV shows are the big chunks at home.
The above means that in addition to a single array, you want a backup array and probably an offsite backup as well. That is what we use at home: we back up onto another array and then into the cloud. So even if one system has a 5% chance of failure, the odds are you will still have data.
SSD caching. If you have a workload which is pretty concentrated, then you should use the SSD Cache advisor to tell you what you should do. If you just want read caching, then you can use a single SSD. If you want read and write caching, then you want two SSDs in RAID 1. You need RAID 1 because if an SSD fails during a write, then you have corrupted the array. For our workloads, it shows that a 1TB read SSD is good for our media files, whereas for our personal files 256GB suffices. Synology has direct support for SSD caching and you set it up in the Storage Manager.
Here is the full configuration step-by-step:

Insert a pair of SSDs into the system along with your 10TB drives. You can use lower capacity drives; the math above assures you that the failure rates will only get better with lower density, because the drive error rates seem to be independent of drive size. Use the Storage Manager SSD section to configure the SSDs for read/write caching.
Install all the hard drives, then in Storage Manager/Disk Group create two disk groups, each using five drives, for the 12-bay system, or a single group of six drives for the 8-bay system. Create these as SHR2.
Now create volumes on top in Storage Manager/Volumes, and make sure to select btrfs as the file system type. Don’t completely allocate the volumes; leave some spare space so you can manage it. You probably want one or two big volumes so it is easy to manage them.
Now create public shares on top of volumes with Control Panel/Shared Folders. This is where you add permissions.

If you have existing volumes, you have to create enough space elsewhere first, as disk groups and volumes once created can’t change their configuration. Fortunately, File Station has a Copy To command, available by right click, that lets you move things to temporary locations.
Once that is done, you can destroy the volumes and disk groups you don’t need and then add the extra disks to a Disk Group.