There’s been some Friday night kernel drama on the Linux kernel mailing list… Linus Torvalds has expressed regrets for merging the Bcachefs file-system and an ensuing back-and-forth between the file-system maintainer.

  • @solrize
    link
    9520 days ago

    Can someone say why bcachefs is interesting? Btrfs I can sort of understand. I haven’t much kept track of most others.

    • @[email protected]
      link
      fedilink
      14720 days ago

      bcachefs is way more flexible than btrfs on multi-device filesystems. You can group storage devices together based on performance/capacity/whatever else, and then do funky things like assigning a group of SSDs as a write-through/write-back cache for a bigger array of HDDs. You can also configure a ton of properties for individual files or directories, including the cache+main storage group, amount of data replicas, compression type, and quite a bit more.

      So you could have two files in the same folder, one of them stored compressed on an array of HDDs in RAID10 and the other one stored on a different array of HDDs uncompressed in RAID5 with a write-back SSD cache, and wouldn’t have to fiddle around with multiple filesystems and bind mounts - everything can be configured by simply setting xattr values. You could even have a third file which is striped across both groups of HDDs without having to partition them up.

      • @NeoNachtwaechter
        link
        1620 days ago

        two files in the same folder, one of them stored compressed on an array of HDDs in RAID10 and the other one stored on a different array […]

        Now that’s what I call serious over-engineering.

        Who in the world wants to use that?

        And does that developer maybe have some spare time? /s

        • @[email protected]
          link
          fedilink
          6620 days ago

          This is actually a feature that enterprise SAN solutions have had for a while, being able choose your level of redundancy & performance at a file level is extremely useful for minimising downtime and not replicating ephemeral data.

          Most filesystem features are not for the average user who has their data replicated in a cloud service; they’re for businesses where this flexibility saves a lot of money.

          • @[email protected]
            link
            fedilink
            318 days ago

            I’ll also tac on that when you use cloud storage, what do you think your stuff is stored on at the end of the day? Sure as shit not Bcachefs yet, but it’s more likely than not on some netapp appliance for the same features that Bcachefs is developing.

        • Max-P
          link
          fedilink
          2419 days ago

          Simple example: my Steam library could be RAID0 and unencrypted but my backups I definitely want to be RAID1 and compressed, and encrypted for security. The media library doesn’t need encryption but maybe want it in RAID1 because ripping movies takes forever. I may also want to have the games on NVMe when I play them, and stored on the HDDs when I’m not playing them, and my VMs on the SATA SSD array as a performance middleground.

          • @NeoNachtwaechter
            link
            -719 days ago

            Steam library

            backups

            media library

            Wonderful.

            But these are libraries. Not single files.

        • Semperverus
          link
          English
          720 days ago

          This probably meets some extreme corporate usecase where they are serving millions of customers.

          • @[email protected]
            link
            fedilink
            18
            edit-2
            20 days ago

            It’s not that obscure - I had a use case a while back where I had multiple rocksdb instances running on the same machine and wanted each of them to store their WAL only on SSD storage with compression and have the main tables be stored uncompressed on an HDD array with write-through SSD cache (ideally using the same set of SSDs for cost). I eventually did it, but it required partitioning the SSDs in half, using one half for a bcache (not bcachefs) in front of the HDDs and then using the other half of the SSDs to create a compressed filesystem which I then created subdirectories on and bind mounted each into the corresponding rocksdb database.

            Yes, it works, but it’s also ugly as sin and the SSD allocation between the cache and the WAL storage is also fixed (I’d like to use as much space as possible for caching). This would be just a few simple commands using bcachefs, and would also be completely transparent once configured (no messing around with dozens of fstab entries or bind mounts).

            • @[email protected]
              link
              fedilink
              219 days ago

              Is there a reason for bind mounting and not just configuring the db to point at a different path?

          • @[email protected]
            link
            fedilink
            119 days ago

            I mean… If you have a ton of raw photos in one directory, you can enable the highest compression rate with zstd to it. Every other directory has lz4 with the fastest compression. Your pics take much less space, but the directory will be slower to read and write.

          • @NeoNachtwaechter
            link
            -1320 days ago

            My company serves millions, too. We don’t have such needs.

        • Leaflet
          link
          English
          32
          edit-2
          19 days ago

          But is GPL-compatible, unlike ZFS.

        • Max-P
          link
          fedilink
          819 days ago

          ZFS doesn’t support tiered storage at all. Bcachefs is capable of promoting and demoting files to faster but smaller or slower but larger storage. It’s not just a cache. On ZFS the only option is really multiple zpools. Like you can sort of do that with the persistent L2ARC now but TBs of L2ARC is super wasteful and your data has to fully fit the pool.

          Tiered storage is great for VMs and games and other large files. Play a game, promote to NVMe for fast loadings. Done playing, it gets moved to the HDDs.

          • @[email protected]
            link
            fedilink
            3
            edit-2
            19 days ago

            You’re misrepresenting L2ARC and it’s a silly comparison to claim to need TBs of L2ARC and then also say you’d copy the game to nvme just to play it on bcachefs. That’s what ARC does. RAM and SSD caching of the data in use with tiered heuristics.

            • Max-P
              link
              fedilink
              419 days ago

              I know, that was an example of why it doesn’t work on ZFS. That would be the closest you can get with regular ZFS, and as we both pointed out, it makes no sense, it doesn’t work. The L2ARC is a cache, you can’t store files in it.

              The whole point of bcachefs is tiering. You can give it a 4TB NVMe, a 4TB SATA SSD and a 8 GB HDD and get almost the whole 16 TB of usable space in one big filesystem. It’ll shuffle the files around for you to keep the hot data set on the fastest drive. You can pin the data to the storage medium that matches the performance needs of the workload. The roadmap claims they want to analyze usage pattern and automatically store the files on the slowest drive that doesn’t bottleneck the workload. The point is, unlike regular bcache or the ZFS ARC, it’s not just a cache, it’s also storage space available to the user.

              You wouldn’t copy the game to another drive yourself directly. You’d request the filesystem to promote it to the fast drive. It’s all the same filesystem, completely transparent.

                • @[email protected]
                  link
                  fedilink
                  2
                  edit-2
                  18 days ago

                  Brand new anything will not show up with amazing performance, because the primary focus is correctness and features secondary.

                  Premature optimisation could kill a project’s maintainability; wait a few years. Even then, despite Ken’s optimism I’m not certain we’ll see performance beating a good non-cow filesystem; XFS and EXT4 have been eeking out performance for many years.

    • @[email protected]
      link
      fedilink
      53
      edit-2
      19 days ago

      For me the reason was that I wanted encryption, raid1 and compression with a mainlined filesystem to my workstation. Btrfs doesn’t have encryption, so you need to do it with luks to an mdadm raid, and build btrfs on top of that. Luks on mdadm raid is known to be slow, and in general not a great idea.

      ZFS has raid levels, encryption and compression, but doesn’t have fsck. So you better have an UPS for your workstation for electric outages. If you do not unmount a ZFS volume cleanly, there’s a risk of data loss. ZFS also has a weird license, so you will never get it with mainline Linux kernel. And if you install the module separately, you’re not able to update to the latest kernel before ZFS supports it.

      Bcachefs has all of this. And it’s supposed to be faster than ZFS and btrfs. In a few years it can really be the golden Linux filesystem recommended for everybody. I sure hope Kent gets some more help and stops picking fights with Linus before that.

      • @calamityjanitor
        link
        2719 days ago

        ZFS doesn’t have fsck because it already does the equivalent during import, reads and scrubs. Since it’s CoW and transaction based, it can rollback to a good state after power loss. So not only does it automatically check and fix things, it’s less likely to have a problem from power loss in the first place. I’ve used it on a home NAS for 10 years, survived many power outages without a UPS. Of course things can go terribly wrong and you end up with an unrecoverable dataset, and a UPS isn’t a bad idea for any computer if you want reliability.

        Totally agree about mainline kernel inclusion, just makes everything easier and ZFS will always be a weird add-on in Linux.

      • @[email protected]
        link
        fedilink
        1419 days ago

        Btrfs doesn’t have encryption, so you need to do it with luks to an mdadm raid, and build btrfs on top of that. Luks on mdadm raid is known to be slow, and in general not a great idea.

        Why involve mdadm? You can use one btrfs filesystem on a pair of luks volumes with btrfs’s “raid1” (or dup) profile. Both volumes can decrypt with the same key.

      • @xantoxis
        link
        819 days ago

        Bcachefs has all of this. And it’s supposed to be faster than ZFS and btrfs. In a few years it can really be the golden Linux filesystem recommended for everybody

        ngl, the number of mainline Linux filesystems I’ve heard this about. ext2, ext3, btrfs, reiserfs, …

        tbh I don’t even know why I should care. I understand all the features you mentioned and why they would be good, but i don’t have them today, and I’m fine. Any problem extant in the current filesystems is a problem I’ve already solved, or I wouldn’t be using Linux. Maybe someday, the filesystem will make new installations 10% better, but rn I don’t care.

        • @[email protected]
          link
          fedilink
          719 days ago

          It’s a filesystem that supports all of these features (and in combination):

          • snapshotting
          • error correction
          • per-file or per-directory “transparently compress this”
          • per-file of per-directory “transparently back this up”

          If that is meaningless to you, that’s fine, but it sure as hell looks good to me. You can just stick with ext3 - it’s rock solid.

      • Possibly linux
        link
        fedilink
        English
        1
        edit-2
        16 days ago

        ZFS doesn’t have Linux fsck has it is its own thing. It instead has ZFS scrubbing which fixes corruption. Just make sure you have at least raid 1 as without a duplicate copy ZFS will have no way of fixing corruption which will cause it to scream at you.

        If you just need to get data off you can disable error checking. Just use it at your own risk.

        • @[email protected]
          link
          fedilink
          115 days ago

          But scrub is not fsck. It just goes through the checksums and corrects if needed. That’s why you need ECC ram so the checksums are always correct. If you get any other issues with the fs, like a power off when syncing a raidz2, there is a chance of an error that scrub cannot fix. Fsck does many other things to fix a filesystem…

          So basically a typical zfs installation is with UPS, and I would avoid using it on my laptop just because it kind of needs ECC ram and you should always unmount it cleanly.

          This is the spot where bcachefs comes into place. It will implement whatever we love about zfs, but also be kind of feasible for mobile devices. And its fsck is pretty good already, it even gets online checks in 6.11.

          Don’t get me wrong, my NAS has and will have zfs because it just works and I don’t usually need to touch it. The NAS sits next to UPS…

          • Possibly linux
            link
            fedilink
            English
            115 days ago

            I have never had an issue with ZFS as long as there is a redundant copy. A bad ram might cause an issue but that’s never happened to me. I did have a bad motherboard that corrupted data on write. ZFS threw its hands up but there wasn’t any unfixable corruption

            • @[email protected]
              link
              fedilink
              115 days ago

              Me neither, but the risk is there and well documented.

              The point was, ZFS is not great as your normal laptop/workstation filesystem. It kind of requires a certain setup, can be slow in certain kinds of workflows, expects disks of same size and is never available immediately for the latest kernel version. Nowadays you actually can add more disks to a pool, but for a very long time you needed to build a new one. Adding a larger disk to a pool will still not resize it, untill all the disks are replaced.

              It shines with steady and stable raid arrays, which are designed to a certain size and never touched after they are built. I would never use it in my workstation, and this is the point where bcachefs gets interesting.

      • @[email protected]
        link
        fedilink
        -519 days ago

        Encryption and compression don’t play well together though. You should consider that when storing sensitive files. That’s why it’s recommended to leave compression off in https because it weakens the encryption strength

        • @[email protected]
          link
          fedilink
          English
          519 days ago

          How does that work? Encryption should not care at all about the data that is being encrypted. It is all just bytes at the end of the day, should not matter if they are compressed or not.

          • @[email protected]
            link
            fedilink
            419 days ago

            Disabling compression in HTTPS is advised to prevent specific attacks, but this is not about compression weakening encryption directly. Instead, it’s about preventing scenarios where compression could be exploited to compromise security. The compression attack is used to leak information about the content of the encrypted data, and is specific to HTTP, probably because HTTP has a fixed or guessable structure.

            • @[email protected]
              link
              fedilink
              English
              4
              edit-2
              19 days ago

              Looks to be an exploit only possible because compression changes the length of the response and the data can be injected into the request and is reflected in the response. So an attacker can guess the secret byte by byte by observing a shorter response form the server.

              That seems like something not feasible to do to a storage device or anything that is encrypted at rest as it requires a server actively encrypting data the attacker has given it.

              We should be careful of seeing a problem in one very specific place and then trying to apply the same logic to everything broadly.

          • @[email protected]
            link
            fedilink
            English
            418 days ago

            There is also the BREACH which targets gzip/deflate compression on http as well. But also, don’t see how that affects disk encryption.

          • @[email protected]
            link
            fedilink
            218 days ago

            I can’t explain, perhaps due to my limited knowledge about the subject. I understood that compression was a weakening factor for encryption years ago when I heard about it. Always good to do your own research in the end 🙃

    • @[email protected]
      link
      fedilink
      English
      2520 days ago

      bcachefs is meant to be more reliable than btrfs - which has had issues with since it was released (especially in the early days). Though bcachefs has yet to be proven at scale that it can beat btrfs at that.

      Bcachefs also supports more features I believe - like encryption. No need for an extra layer below the filesystem to get the benefits of encryption. Much like compression that also happens on both btrfs and bcachefs.

      Btrfs also has issues with certain raid configurations, I don’t think it yet has support for raid 5/6 like setup and it has promised that for - um, well maybe a decade already? and I still have not heard any signs of it making any progress on that front. Though bcachefs also still has this on their wishlist - but I see more hope for them getting it before btrfs which seems to have given up on that feature.

      Bcachefs also claims to have a cleaner codebase than btrfs.

      Though bcachefs is still very new so we will see how true some of its claims will end up being. But if true it does seem like the more interesting filesystem overall.

    • @ikidd
      link
      English
      1319 days ago

      Also because it’s meant to be an enterprise level filesystem like ZFS, but without the licensing baggage. They share a lot of feature sets.

    • @[email protected]
      link
      fedilink
      1120 days ago

      In addition to the comment on the mentioned better hardware flexibility, I’ve seen really interesting features like defining compression & deduplication in a granular way, even to the point of having a compression algo when you first write data, and then a different more expensive one when your computer is idle.

    • bruhduh
      link
      119 days ago

      deleted by creator

    • Possibly linux
      link
      fedilink
      English
      116 days ago

      Btrfs has architectural issues that can not be fixed. It is fine for smaller raid 0/1 but as soon as you try to scale it up you run into performance issues. This is because of how it was designed.

      Bcachefs is like btrfs and has all the features btrfs does. However, it also is likely to be much faster. Additionally it has some extra features like tiered storage which allows you to have different storage mediums.