I just need to preserve some old data that I have on my computers, so I was wondering what would be the best way to archive stuff long term.

Blu-ray discs? Multiple HDDs? What do you guys suggest?

  • @nottelling
    1 year ago

    Self-hosting principles aside, is this data actually important? If so, then don’t fuck around with self-hosting it. Are you looking for the lowest cost? Then don’t waste a bunch of money spinning your own disks.

    Amazon Glacier to guarantee availability, and your own encryption to guarantee privacy.

    It’s currently running me about $4/month for around 10 TB that I don’t want to lose but just don’t want to deal with. An equivalent HDD solution would be around $500, so that’s about 10 years to break even, assuming zero disk failures and zero personal maintenance time.
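The break-even figure is just the one-off hardware cost divided by the monthly bill. A minimal Python sketch using the $500 and $4/month figures from this comment; it deliberately ignores drive replacements, electricity and retrieval fees, any of which would shift the result:

```python
def months_to_break_even(upfront_cost: float, monthly_fee: float) -> float:
    """Months of cloud fees that add up to a one-off hardware purchase."""
    if monthly_fee <= 0:
        raise ValueError("monthly_fee must be positive")
    return upfront_cost / monthly_fee


# Figures from the comment: ~$500 of HDDs vs ~$4/month for ~10 TB in Glacier.
months = months_to_break_even(500, 4)
print(f"{months:.0f} months, about {months / 12:.1f} years")  # 125 months, about 10.4 years
```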

    Plus it’s guaranteed: it inherently keeps multiple copies, has an SLA, and there’s no worry about the service just disappearing. If they decide to shut down or raise prices or whatever, you can reevaluate and move.

    Edit: Glacier and similar services are meant for archival, which is the term OP used. You never expect to need it again, but can’t get rid of it. Retrieval cost is mostly irrelevant, but yes, it’s much more expensive. (I’d wager it’s still less expensive than a home RAID array.)

    • Dran
      1 year ago

      What would it cost to retrieve though? You probably still have the appropriate cost-effective solution but it’s an important consideration for newcomers to have complete math.

      • @[email protected]
        1 year ago

        Retrieving from S3 Glacier costs approximately 100 times (not 10 times, as I first wrote) the monthly cost of storing the data. I didn’t realize that retrieval from Glacier isn’t actually downloading it onto your local machine, but rather just moving it into a frequent-access S3 tier, from which you can then download it, and this download is the expensive part.

      • @[email protected]
        1 year ago

        Yeah AWS charges for outgoing data, but not incoming. Keeping that data there is cheap, getting it back would not be.

      • @nottelling
        1 year ago

        OP said “archive”, not “backup”. Glacier is for data you need to keep but rarely touch.

    • @[email protected]
      1 year ago

      How hard is it to get the data back?
      Can I do it in real time, so I could mount it as media storage, or would I need to rent one of the faster S3 tiers?

      • @nottelling
        1 year ago

        I think you can technically do it, but it’s expensive to retrieve. But that isn’t the point of an archive.

    • So instead of “fucking around” with putting it on a long-lasting storage device to keep in a wardrobe, he should give up control of the data, hand it to a company, and risk forgetting to inform them about an address change, so everything is lost when the bills aren’t paid?

      How is that more secure?

      • @nottelling
        1 year ago

        I guess it depends on how likely you think it is that Amazon will steal your data instead of doing the thing you’re paying them for, vs. a house fire or media failure or whatever.

        There are also pretty clear rules about unpaid bills; the data doesn’t just vaporize.

        This is what we call a “risk assessment”, and imo if I must have that data available long-term, then a single copy on DVDs in a closet isn’t good enough.

        • I’d argue that for most consumer use cases, having one or, better, two physical backups is more reliable, because it is simple and straightforward. Also, the risk mitigation is already in place, as you wouldn’t want your place to burn down either way.

      • @nottelling
        1 year ago

        US-East. Look specifically at Glacier, which is long-term, near free to store, and expensive to get data out of.

        • @[email protected]
          1 year ago

          Is it Glacier Deep Archive? I just realized I was looking at the Glacier Flexible Retrieval prices earlier. US-East lists Deep Archive at $0.00099/GB per month (about $1/TB), which is still higher than what you’re getting.
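To sanity-check the "about $1/TB" figure, the monthly storage bill is size × GB-per-TB × price. A minimal sketch using the $0.00099/GB-month Deep Archive price quoted in this comment and the 10 TB mentioned earlier in the thread. It covers storage only (no request, retrieval or egress fees), and whether a TB is billed as 1000 or 1024 GB is an assumption here, exposed as a parameter:

```python
def monthly_storage_cost(size_tb: float, usd_per_gb_month: float, gb_per_tb: int = 1000) -> float:
    """Monthly storage bill for size_tb terabytes at a flat per-GB-month price.

    gb_per_tb is an assumption (1000 vs 1024); adjust it to match how your
    provider bills. Request, retrieval and egress fees are not included.
    """
    return size_tb * gb_per_tb * usd_per_gb_month


print(f"${monthly_storage_cost(10, 0.00099):.2f}/month")  # $9.90/month
```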

          • @nottelling
            1 year ago

            Last month’s bill for my entire Amazon account was $4.72. Most of that was the Glacier storage.

    • @Zippy
      1 year ago

      Keep in mind that they charge $0.01/GB ($10/TB) for uploads. That would cost $100 up front plus the $4 per month, if I am not mistaken.

      • @nottelling
        1 year ago

        Nope. Incoming data to S3 is free. Egress is expensive, but OP said “archive” not “backup”.

        • @Zippy
          1 year ago

          I read it as incoming, but I’m certainly not an expert, and that might be an old rate or something.

    • Atemu
      1 year ago

      If they decide to shut down or raise prices or whatever, you can reevaluate and move.

      Move at how many hundred $ per TB?

      • @nottelling
        1 year ago

        Still less than an equivalent RAID array, particularly if you consider that archives are very rarely extracted in bulk, as opposed to pulling the specific records needed.

  • @Donebrach
    1 year ago

    Encode your data into the DNA of an alien world so that it lasts for all time.

  • @9point6
    1 year ago

    It depends how important the data is, how long “long term” is, and how big your budget is, but assuming you don’t want to risk losing anything, backup best practice is the 3-2-1 rule:

    3 copies, 2 different media, 1 off-site
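As a rough illustration, the rule can be written as a check over a list of copies. This is a hypothetical sketch: the `Copy` type and the media labels are made up for the example, and it counts the working copy as one of the three, which is the usual reading of the rule:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    media: str     # free-form label, e.g. "hdd", "bd-r", "tape", "cloud"
    offsite: bool  # stored somewhere other than home


def satisfies_3_2_1(copies: list[Copy]) -> bool:
    """At least 3 copies, on at least 2 kinds of media, at least 1 off-site."""
    return (
        len(copies) >= 3
        and len({c.media for c in copies}) >= 2
        and any(c.offsite for c in copies)
    )


plan = [Copy("hdd", False), Copy("hdd", False), Copy("cloud", True)]
print(satisfies_3_2_1(plan))      # True
print(satisfies_3_2_1(plan[:2]))  # False: only two copies, none off-site
```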

    I’d almost always say a cloud provider for your off-site backup, but if you don’t want to do that, it depends how much you want to spend.

    There’s no guaranteed do-it-once-and-you’re-done approach here, as all data can degrade. For instance, if one of your backup media is hard disks, you’re probably going to want them set up in at least RAID 5, and you want to be on top of swapping out disks when they fail. If you’re thinking of the Blu-ray or tape approach, you’re going to want to periodically check that the media hasn’t degraded. You’ll probably also want to plan to replace the backup media every half decade or so to be extra safe (e.g. BD-Rs have a lifespan of 5-10 years).
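Periodically checking that media hasn’t degraded is much easier if you record a checksum for every file when you write the backup and compare against it later. A minimal Python sketch using SHA-256 from the standard library; the function names are hypothetical, and the manifest file should be kept outside the folder being archived, or it will list itself on the next run:

```python
import hashlib
import json
from pathlib import Path

CHUNK_SIZE = 1 << 20  # read 1 MiB at a time so large files don't fill RAM


def sha256_of(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map every file under root (as a relative POSIX path) to its SHA-256."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def save_manifest(manifest: dict[str, str], out_file: Path) -> None:
    out_file.write_text(json.dumps(manifest, indent=2, sort_keys=True))


def load_manifest(in_file: Path) -> dict[str, str]:
    return json.loads(in_file.read_text())


def verify(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return files that are missing or whose contents no longer match."""
    return [
        rel
        for rel, expected in manifest.items()
        if not (root / rel).is_file() or sha256_of(root / rel) != expected
    ]
```

Run `build_manifest` and `save_manifest` right after writing a disc or drive; on each later check, `verify(root, load_manifest(...))` returning an empty list means every file still reads back identically.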

  • @foggy
    1 year ago

    Buy used HDDs and configure RAID arrays.

    You can get like 32TB of cheap, basically ready-to-fail HDDs and listen to them click away, for dirt cheap. I mean like a couple hundred bucks.

    Buy some old tower PCs from a school or something. Low specs are fine.

    Install Ubuntu Server, set up Samba and MiniDLNA, and set up a tunnel with Cloudflare. Boom.

    You could set this up for like $500. I have a setup like this. My HDD has been clicking since install; 2 years strong. I have two spare 16TB HDDs ready to hot-swap should either of them fail. Having those spares on hand brings your cost up to about $800, but again, this is for two 16TB HDDs in a RAID config. If you did 8TB instead, this is all probably $500 with spares.

    Western Digital has a bargain bin.

    P.S. Y’all have valid concerns about RAID 5. I’m no expert, so I mirror the whole thing to another 16TB USB HDD I got for a few hundred bucks.

    It’s basically everything but flood/fire-proof. The odds of my RAID config failing such that it is irrecoverable are low, but the odds of my USB HDD failing at the same time are inconceivable! This USB HDD is attached to my main workstation hub; when I log in to Windows (shut up, I dual boot for development, ya dinguses) and my machine is idle for 20 minutes, it performs a sync between my network-attached storage (my ghetto RAID server) and the USB HDD.
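The idle-time sync described above boils down to copying anything new or changed from the NAS to the USB drive. A rough Python sketch of that one-way copy, not the commenter’s actual setup (in practice tools such as rsync or robocopy do this more robustly, and the idle trigger is left out). It deliberately never deletes anything from the destination, which is the safer default for an archive:

```python
import filecmp
import shutil
from pathlib import Path


def sync_one_way(src: Path, dst: Path) -> list[str]:
    """Copy files from src to dst when they are missing or differ in dst.

    filecmp.cmp(shallow=True) treats files with the same type, size and mtime
    as equal and otherwise compares their bytes. Nothing is deleted from dst.
    Returns the relative paths that were copied.
    """
    copied = []
    for s in sorted(src.rglob("*")):
        if not s.is_file():
            continue
        rel = s.relative_to(src)
        d = dst / rel
        if not d.is_file() or not filecmp.cmp(s, d, shallow=True):
            d.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(s, d)  # copy2 keeps the mtime, so the next run can skip it
            copied.append(rel.as_posix())
    return copied
```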

    It’s low-cost, foolproof, and kinda beefy. If I upgrade the tower to something modern…? Add 1TB of SSD to everything? Dude… I’ll be rocking a rad setup for under $1500.

      • @[email protected]
        1 year ago

        I once lost a RAID 6 array to a faulty power distributor in a server case (lost 5 out of 12 disks). RAID is not a backup.

        • @[email protected]
          1 year ago

          But one disk failing and the array breaking isn’t either.
          This is about real-time data, not backup, which should ideally happen daily or bi-daily for really important data.

        • @Zippy
          1 year ago

          Not a backup, but nearly as good. All your data is located in one place, which could result in a weird failure like the one you experienced, or a fire/theft.

          That being said, in your case I cannot imagine the platters being damaged. There should be ways to recover the data.

      • Bonehead
        1 year ago

        After my experience with RAID 5 and the WD Green 2TB drives, which were so fragile that the vibrations of 6 drives in the same case were enough to kill them, resulting in 2 drives dying at the same time and wiping out my entire media collection… yeah, use RAID 6, with another server holding a RAID 6 array as a continuous backup.

    • million
      1 year ago

      If you are ready to do some reading, I’d recommend ZFS over traditional RAID. ZFS makes more guarantees than traditional RAID about file integrity over time.

    • Giddy
      1 year ago

      Do you have a link to WD or other sites selling old HDDs?

  • Extras
    1 year ago

    It would probably help to know for how long, how much capacity you need, and what your budget is. It should also be said that external factors play a massive role in how long a storage device can survive, environment, humidity and heat being the biggies.

    Edit in case I fall asleep: on a budget, I would usually go with an external SSD. Just refresh the data every year or two and it should be OK for 8ish years, maybe even 10. For a write-it-and-forget-it method you’ll want M-DISC instead, which is more expensive but, if properly stored, will last lifetimes, so the failure point will be finding a usable drive that can read it. If you decide to go the spinning mechanical drive route, make sure to buy 2 (a backup for the backup), since they are a lot more fragile. Gold-plated DVDs/CDs are another write-and-forget option, but have less capacity than M-DISCs.

  • @nomecks
    1 year ago

    If you have enough data: tapes. Tapes are hilariously cheap to keep. Write them and keep them in a fireproof box. No power is needed to keep platters spinning. 45TB/tape (LTO-9, compressed)!

    • @[email protected]
      1 year ago

      Personally, I consider tape a valid solution only in the 100+TB range (at least cost-wise), unless you happen to have a tape drive already at your disposal…

    • Gunpachi (OP)
      1 year ago

      I have never used tapes, but I want to use them if they’re viable. I only have about 3TB of data currently.

      • Atemu
        1 year ago

        Tapes only make financial sense if you’re in the hundreds of TB.

  • Monkey With A Shell
    1 year ago

    A couple of different threat models to consider: hardware failure vs. human failure. Things like RAID can effectively cover the hardware failure side and be fully transparent. Human failure is a bit trickier. There are a number of old sayings about backups, but one that’s good to keep in mind is that snapshots are not backups. They’re convenient and easy to automate, but if the system making them goes kerplooie, they’re pretty useless.

    A tiered approach is good for off-device backup: routine differential backups copy only the new or changed data, with a periodic full backup.
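The two tiers differ only in which files get copied. A minimal Python sketch, assuming change detection by file modification time (real backup tools may also use checksums or archive bits) and a hypothetical 30-day full-backup interval:

```python
import time
from pathlib import Path
from typing import Optional

FULL_BACKUP_EVERY_DAYS = 30  # hypothetical schedule; pick your own
SECONDS_PER_DAY = 86_400


def backup_kind(last_full_backup: float, now: Optional[float] = None) -> str:
    """Pick a periodic full backup or a routine differential one (Unix timestamps)."""
    if now is None:
        now = time.time()
    days_since_full = (now - last_full_backup) / SECONDS_PER_DAY
    return "full" if days_since_full >= FULL_BACKUP_EVERY_DAYS else "differential"


def files_for_differential(root: Path, last_full_backup: float) -> list[Path]:
    """Files modified since the last *full* backup.

    A differential backup copies all of these every time; an incremental one
    would instead compare against the most recent backup of any kind.
    """
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.stat().st_mtime > last_full_backup
    )
```

The design trade-off: a restore needs only the last full backup plus the latest differential, at the cost of each differential growing until the next full backup.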

    Cold disks are great, but make sure to test them periodically; nothing is worse than going to restore a chunk of data only to find the backup can’t be read.

    • Atemu
      1 year ago

      Things like RAID can effectively cover the hardware failure side

      Note that RAID only covers one specific hardware failure, to the point where, IMHO, you cannot consider it a data security measure, only a data availability one.

      • Monkey With A Shell
        1 year ago

        Curious what you mean here. Aside from RAID 0, all levels allow at least one disk to fail without loss. If the whole RAID controller fails, you can typically replace it independently and import the foreign config. This is all talking about hardware-backed RAID, of course, not a software RAID config.

        • Atemu
          1 year ago

          There are much worse ways for a RAID controller to fail than suddenly not doing anything. What if it doesn’t notice it has failed and continues to write to a subset of devices only? Great recipe for data corruption right there.

          Bad RAID controller/HBA, CPU, RAM, motherboard, PSU: these are all hardware failures that RAID does very little (if anything) to mitigate. One localised incident in any of them could turn all of your drives into magic smoke or make bits go bad.

          You cannot rely on that sort of setup for data security. It only really mitigates one relatively common hardware failure, to push storage system uptime above 99.9%. That has a place in some scenarios where storage “only” being 99.9% available has a significant impact on total availability, but you’d first have to demonstrate that that is the case.

          • Monkey With A Shell
            1 year ago

            Fair enough, if using a more expansive definition of hardware failure. Things like a house fire would presumably destroy a series of optical discs, which would make almost any in-house option non-functional. Network-based backups could also fail to transmit data securely and accurately, so really any sort of replication solution needs validation if the data is of significant value. A first step in preservation is to not have the box the data came from burn down, and to have a way to recover if someone does a ‘sudo rm -rf /’ accidentally.

            • Atemu
              1 year ago

              Things like a house fire would presumably destroy a series of optical disks which would make most any in house option non-functional.

              Well, it makes any option that only uses a single location non-functional. Having two copies at home and one at a distant location (as recommended by the 3-2-1 backup rule of thumb) mitigates this issue.

              Network based backups could also fail to transmit data securely and accurately as well

              Absolutely. Though the network is usually assumed to be unreliable from the get-go, so mitigations usually already exist here (E2EE, checksums, ECC).

              really any sort of replication solution needs validation if the data is of significant value

              Absolutely correct. An untested backup is probably better than nothing but most definitely worse than a tested backup.

              and have a way to recover if someone does a ‘sudo rm -rf /’ accidentally.

              Certainly something that must be mitigated but this is getting out of “hardware failure” territory now ;)

  • @calypsopub
    1 year ago

    Multiple methods, not really important which ones. I use an external hard drive plus I email zip files to myself.

  • @[email protected]
    1 year ago

    CDs degrade over time, so they aren’t the best way to archive data if you know you will need it again. If it’s just an ‘in case’, then it may be OK. Best bet is to buy a USB disk and then keep a second copy of it offsite. It’s also best practice not to use two drives from the same manufacturer.

  • Atemu
    1 year ago

    I use multiple offline HDDs with a policy to keep n copies between them, because it’s by far the cheapest way to still own the data. It requires regular checks, because HDDs are likely to fail after a decade or so, and a bunch of HDDs are a pain to manage, so you will need tooling for this. I use git-annex for this purpose, but it’s not particularly user-friendly.
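git-annex expresses this policy with its `numcopies` setting. The underlying idea can be sketched in a few lines of Python, assuming you already have a set of content hashes per drive (for example from a checksum manifest). The drive names and data structure are hypothetical, and this is not how git-annex itself is implemented:

```python
from collections import Counter


def under_replicated(drives: dict[str, set[str]], numcopies: int = 2) -> dict[str, int]:
    """Return {content_hash: copies} for content stored on fewer than numcopies drives.

    Content that is on no drive at all can't show up here, so keep a master
    list of what *should* exist and compare against it as well.
    """
    counts = Counter(h for hashes in drives.values() for h in hashes)
    return {h: n for h, n in sorted(counts.items()) if n < numcopies}


drives = {
    "disk-a": {"hash1", "hash2", "hash3"},
    "disk-b": {"hash1", "hash3"},
}
print(under_replicated(drives))  # {'hash2': 1}
```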

  • Dyskolos
    1 year ago

    I don’t know about your budget, but I’d put it onto HDDs. They’re cheap and large. But use two and check them regularly, like every other month or so. If one breaks, get a replacement. That’s the simplest (if you have an external dock) and cheapest solution that you OWN.

    Also copy it at least twice onto each, in case of corruption. And use a copier that verifies (FastCopy, TeraCopy, etc.).
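The idea behind a verifying copy is to hash the source and the written copy and compare them. A minimal Python sketch of that idea using only the standard library; `copy_verified` is a hypothetical helper, not part of FastCopy or TeraCopy. One caveat: reading the copy back immediately may be served from the operating system’s cache rather than the disk itself, so a later re-check is still worthwhile:

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB chunks
            h.update(chunk)
    return h.hexdigest()


def copy_verified(src: Path, dst: Path) -> str:
    """Copy src to dst, re-read both, and raise if the hashes differ.

    Returns the SHA-256 of the verified copy.
    """
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dst)
    expected, actual = sha256_of(src), sha256_of(dst)
    if expected != actual:
        dst.unlink()  # don't leave a corrupt copy lying around
        raise OSError(f"verification failed: {src} -> {dst}")
    return actual
```

To follow the “copy it twice onto each” advice, call it once per destination folder on the same drive.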

    • @riquisimo
      1 year ago

      Are those… cheap? They don’t look cheap.

      • @foofiepie
        1 year ago

        About £90 for 4 discs and £30 for the writer gets you 400GB of storage.

        • @MrsDoyle
          1 year ago

          I bought a (nearly) 1TB thumb drive for £10 last week and copied all my music onto it. I’m thinking of getting a few more for docs and other media and leaving them in my keysafe.

    • @[email protected]
      1 year ago

      Generally speaking, obscure formats are not great for long-term storage, since they hurt your chances of reading the data again years later.

      • @afunkysongaday
        1 year ago

        You can read those with a regular blu ray drive. No special hard- or software needed.

        • @[email protected]
          1 year ago

          By the time Blu-ray started becoming popular, optical media were basically already dead as a data storage medium, so those drives aren’t particularly common either.

          • Atemu
            1 year ago

            The only reason Blu-ray still exists is that you can’t buy (as in: own) movies in a high-quality format otherwise.

            If the publishers got the sticks out of their arses and offered file downloads for purchase, I wouldn’t see a single reason to buy a physical disk other than sentimentalism.

    • Gunpachi (OP)
      1 year ago

      I appreciate your honest answer. I want to completely own my data, so I would not go the cloud route. After all, the cloud is basically someone else’s computer.

      • @nottelling
        1 year ago

        The data remains yours if you encrypt it. Someone else’s computer saves you all the time and effort of maintaining and monitoring hardware.

        You want to use the actual services meant for this: S3 or Glacier or something, not just consumer cloud storage like Google Drive or Dropbox.

      • ares35
        1 year ago

        There are many ways to encrypt locally and store the encrypted data remotely: a container (like VeraCrypt), individual files with a file-based encryption scheme (such as Cryptomator), or one of numerous backup or sync utilities with built-in encryption.

        • ShadowRam
          1 year ago

          Ask all those who had shit on Megaupload in 2012.

          Encrypted or not, still lost.

          • @I_Miss_Daniel
            1 year ago

            Yeah. It should not be your only backup, but it can be one of them.

          • @nottelling
            1 year ago

            Lol, imagine ever having considered Megaupload as your backup solution.

    • @KillerTofu
      1 year ago

      Blasphemy in the hallowed halls of FOSS.

    • @[email protected]
      1 year ago

      Yes, that may be an option… except that Google can irreversibly lock you out of your account, or delete your files if their content scanning thinks they go against some of their terms; and there are also simply people who don’t want to lose their privacy to Google.

      • @[email protected]
        1 year ago

        It’s more likely that a Google data center exists in 100 years than your house. If you have a personal aversion to it, then I can understand, but realistically, it’s more likely that an offsite copy on Google Drive exists in 2123 than a random piece of furniture you own, and furniture is pretty damage-resistant.

        • @Zippy
          1 year ago

          I can’t imagine Google drive just shutting down without a great deal of notice either.

        • @[email protected]
          1 year ago

          Please read my comment again. My concerns are not about google drive shutting down. These have happened to real people.

        • Frater Mus
          1 year ago

          It’s more likely that a Google data center exists in 100 years than your house.

          Yes, but it’s more likely that Google will have killed a particular service like Drive. Cf. Google Reader, Hangouts, Data Saver extension, Buzz, etc.

          google graveyard

          • @Zippy
            1 year ago

            I suspect, though, you would get quite a bit of notice before they kill something.

    • Extras
      1 year ago

      Good for a 3-2-1 backup method but bad for archival. We don’t know if Google will even exist in whatever number of years OP wants to archive for, or if the data will be deleted/modified by Google themselves due to some crap policy, like their 2-year inactive-account one, for example. Just too many factors that will be out of OP’s control.