I am mainly hosting Jellyfin, Nextcloud, and Audiobookshelf. The files for these services are currently stored on a 2TB HDD and I don’t want to lose them in case of a drive failure. I bought two 12TB HDDs because 2TB got tight and I thought I could add redundancy to my system, to prevent data loss due to a drive failure. I thought I would go with a RAID 2 (or another form of RAID?), but everyone on the internet says that RAID is not a backup. I am not sure if I need a backup. I just want to avoid losing my files when the disk fails.
How should I proceed? Should I use RAID2, or rsync the files every, let’s say, week? I don’t want to have another machine, so I would hook up the rsync target drive to the same machine as the rsync host drive! Rsyncing the files seems to be very cumbersome (also when using a cron job).
Everyone repeat with me: RAID IS NOT BACKUP.
Never was, never will be.
RAID is redundancy, not backup. The main purpose is to keep your system available while you deal with certain, specific types of failures. Also, for all intents and purposes, RAID2 isn’t a thing. I suspect you were reading about RAIDZ, RAID using ZFS. While it has proponents and advantages, it won’t secure your data any more than the common RAID5/6.
Backup is to make sure you don’t lose data, regardless of what happened. This includes hardware failures, user error, bad/malicious software, and more.
If your data is important to you, set up a backup. If you need 100% uptime, set up a backup, then set up RAID.
everyone on the internet says that RAID is not a backup
Because it is not.
I just want to avoid losing my files when the disk fails.
Backups. Preferably multiple. At least one of them off site.
I would hook up the rsync target drive to the same machine as the rsync host drive
And lose both in case there’s a power supply failure, voltage spike on the grid, water spill or something else. Plenty of options which will fry the whole system.
RAID is for maintaining availability and reducing downtime in the event of a drive failure.
Take a look at restic for backup.
RAID can protect you from a single drive failure in case you need an “always on” setup. Even then, if the drives are identical, they can fail within days of each other. If you don’t have monitoring, you’ll lose everything before you can react. I feel that’s not your use case.
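On the monitoring point: a minimal sketch of a degraded-array check, assuming Linux software RAID (mdadm), which nobody in this thread has specified. It parses text in the format of /proc/mdstat, where a status like [UU] means all members are up and an underscore such as [_U] marks a missing member. The function name and the sample text are made up for illustration, and arrays with non-numeric names (e.g. md_home) are not handled.

```python
import re

# Sample text in the /proc/mdstat format; on a real system you would read
# the file itself: open("/proc/mdstat").read()
SAMPLE = """
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      976630336 blocks super 1.2 [2/2] [UU]

md1 : active raid1 sdd1[1]
      976630336 blocks super 1.2 [2/1] [_U]

unused devices: <none>
"""


def degraded_arrays(mdstat_text: str) -> list[str]:
    """Return names of md arrays whose member status contains '_' (a missing disk)."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)  # remember which array the next lines describe
            continue
        status = re.search(r"\[([U_]+)\]", line)
        if current and status and "_" in status.group(1):
            degraded.append(current)
    return degraded


print(degraded_arrays(SAMPLE))  # ['md1']
```

Run from cron and wired to an email or push notification, something like this would tell you about the first failed disk before the second one goes.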
You need a backup. You can use something like rsync or, even better, BorgBackup. Keep the backup offline and back up often. You’ll be safer that way.
You would also need hot-plug drive support with an easy way to swap the drive for 100% online recovery, which consumer gear often doesn’t have.
Thanks for the advice. Do you have suggestions on how to set up/handle the backup? E.g. manually connecting the drive via USB and cloning the files via rsync/borg, say every week or every time a threshold of changes has been reached? Or having a small extra machine with the backup hard drive and sending the files via the network?
I am also still a bit confused. I have 2x 12TB. Let’s say I have 6TB of files on my hosting drive. AFAICT I can have two backups/snapshots before the third backup needs to overwrite the first backup. Or am I missing something? Buying more drives for backup is not really doable, as drives do generally cost a buck and I cannot/don’t really want to afford buying more drives.
You can back up to an external USB drive (that’s what I do), or set up a small backup server (with RAID if you want).
If you use Borg it will do the right thing out of the box with minimal configuration - compression, deduplication, encryption, and incremental backups.
The first backup will be full and take longer, but subsequent backups will only target changes and will be quite fast.
Restoring is very straightforward, even if you only need a single file you deleted accidentally.
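To put rough numbers on the 2x 12TB question above, here is a minimal sketch in Python. The 50 GB of changed data per backup is an assumed figure, not something from this thread, and the calculation ignores compression, deduplication gains, and filesystem overhead, so treat the result as a ballpark only. Sizes are in whole decimal GB (12TB = 12000 GB) to keep the arithmetic exact.

```python
def full_copies_that_fit(drive_gb: int, data_gb: int) -> int:
    """How many complete, independent copies of the data fit on the drive."""
    return drive_gb // data_gb


def incremental_snapshots_that_fit(drive_gb: int, data_gb: int, change_gb: int) -> int:
    """Snapshots that fit when the first is full and each later one stores only changes."""
    if change_gb <= 0:
        raise ValueError("change_gb must be positive")
    if data_gb > drive_gb:
        return 0  # not even the initial full backup fits
    remaining = drive_gb - data_gb
    return 1 + remaining // change_gb


# 6TB of data on a 12TB backup drive.
print(full_copies_that_fit(12000, 6000))                # 2
# Assumption: about 50 GB of new or changed data per backup run.
print(incremental_snapshots_that_fit(12000, 6000, 50))  # 121
```

So with plain full copies you are right that the third one has to overwrite the first, but with incremental backups the same drive holds far more history.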
Thanks, I will look into that!
I think it would be best to just delete the files, so you can get used to losing them. Maybe set a cronjob to delete them regularly.
So for me personally, Jellyfin isn’t worth actually paying to back up all 16+TB, but my Nextcloud absolutely is. I use Restic for the data I want, with a pretty long retention, and have had great luck restoring from it.
So the minimum I would do is set up a backup to the drive using Restic or something similar. Make sure it runs at least daily, and keep snapshots for a month or more.
Make sure the backup drive can only be accessed by the Restic user account, and not any of the other service accounts to minimize chances of a misconfiguration or something damaging the backup data.
Obviously this will not protect against physical damage like a lightning strike or major power surge, or malware or config errors big enough to wipe out the whole system and all the drives attached to it.
So for any more important data that isn’t replaceable, it needs to be backed up online as well to a service like Backblaze B2 using Restic.
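The daily run with about a month of snapshots could be sketched roughly like this in Python. The repository path and the source directory are hypothetical placeholders, and the script only prints the commands unless you turn off the dry run. The restic subcommands and flags used (backup, forget, --keep-daily, --prune) exist, but check them against the documentation for your installed version.

```python
import subprocess

REPO = "/mnt/backup/restic-repo"   # hypothetical repository on the backup drive
SOURCES = ["/srv/nextcloud/data"]  # hypothetical path to the data worth keeping


def backup_cmd(repo: str, paths: list[str]) -> list[str]:
    """Command line for one incremental restic backup of the given paths."""
    return ["restic", "-r", repo, "backup", *paths]


def forget_cmd(repo: str, keep_daily: int = 30) -> list[str]:
    """Command line that keeps the last `keep_daily` daily snapshots and prunes the rest."""
    return ["restic", "-r", repo, "forget", "--keep-daily", str(keep_daily), "--prune"]


def run(cmd: list[str], dry_run: bool = True) -> None:
    """Print the command; only execute it when dry_run is False."""
    print(" ".join(cmd))
    if not dry_run:
        # restic reads the repository password from RESTIC_PASSWORD_FILE
        # (or RESTIC_PASSWORD) in the environment.
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(backup_cmd(REPO, SOURCES))
    run(forget_cmd(REPO, keep_daily=30))
```

Started once a day from cron or a systemd timer under the dedicated Restic user, this covers the local part. The offsite copy would be a second repository with its own credentials.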
I’m not an expert but I’d mix a RAID with a backup. And the RAID could be a 10 or a 01, but better read about all the types and choose the one that best suits your needs.
2 disks in the same machine is not a backup whether the data is copied between them using RAID or rsync or anything else.
Sounds like for this machine, just use the two disks in RAID1, or a ZFS mirror, or something. And figure out something else for backups. Probably a cloud solution.
Also, RAID2 requires a minimum of 3 disks, and is rarely used.
I’d argue it is a backup as long as something is doing snapshots of some kind to the other disk, and not realtime sync like RAID. Obviously that should not be your only backup though.
It’s been literally a couple of decades now, but I once had to troubleshoot multiple RAID failures in a number of identical servers that were all running 6-disk RAID-5. Long story short, the power supply in each server was slowly losing its ability to power all the drives at the same time, so random drives started throwing errors. By the time we figured out the root cause, most of the drives had generated enough errors that the RAID controller couldn’t rebuild the volumes.
So, no, as others have said RAID is not a backup and should never be treated as such. A single point of failure like the power supply can easily cause the loss of the entire volume without warning.
It’s a ‘hot copy’ or just ‘copy’ if you rsync/whatever the files. And they’ll be gone too if the whole system fails due to power supply faulting, thunderstorm hitting the lines, misplaced coffee cup falling over, dropping the whole machine and so on…
If you make backups/snapshots they’re not the same as just a copy, still useful for recovering from accidental deletion of files or something like that. Obviously should not be your only backup though.
My most common use of the local backups for my house is someone needs a file they deleted by accident or an older version of a file.
Yes, I get that. They can be very useful, especially if you share a NAS with family or something similar. At work the most common request for backup recovery is a user error, by a huge margin, so I guess you could call a separate copy a backup too, but like you said, it should not be the only copy. I’m personally a bit hesitant to call that a backup at all, but you do you, I’m not going to debate what qualifies.
3-2-1 is obviously the best approach, but (in my opinion) for the majority 2-1-1 (two copies, on hard drives, with one copy offsite) is enough, even if you run a small business, as long as the offsite copy is incremental, so that you can revert to an earlier date and mitigate ransomware as well as a user error that isn’t immediately noticed.
In any case, the only fact I can rely on after ~20 years of experience in the business is that hardware breaks. The only question is ‘when’, not ‘if’. And no matter if you’re a home gamer or a system architect for Meta, you need to plan how to mitigate that risk. Running everything in a single location on two separate pieces of hardware is better than having only one mainboard, and from there you can mix and match whatever you want, the limiting factors (mostly) being your time and wallet.
OK, so of course the best solution is both RAID and offsite backups. After that, the question is how much you want to prioritize convenience versus not losing your data at all costs. For irreplaceable data that you care deeply about, make an offsite backup; it’s not worth the risk. Never rely on RAID to keep your data safe, as it may not always work. For data that is replaceable, where you just don’t want to lose access temporarily in the event of a drive failure, RAID is fine; it most likely won’t fail.
If you are making a backup, please make it offsite if you can. It provides one massive extra layer of security that makes permanent data loss much harder. It doesn’t have to be in a NAS; it can just be a hard drive by itself at a friend’s or relative’s house. Whenever you come over and visit, update the drive with any changes. The downside of this method is pretty obvious: if your drive at home fails, the data you have backed up will only be as recent as the last time you backed it up, so if you are constantly creating new data that you can’t risk losing, this won’t work. And it goes without saying that you should really have all of your data encrypted, but especially your offsite backups; I wouldn’t trust anyone with my data.
A good compromise, and probably what I would do, is to use RAID for all your files and then use your 2TB hard drive to back up all of your really important data that you don’t want to lose. With this setup it’s possible to lose some data if your RAID setup doesn’t work for whatever reason or your house burns down, but at least all of your important data is completely safe.