• @[email protected]
    311
    2 months ago

    Thank you, I am fucking sick of people passing this comic around in relation to the Crowdstrike failure. Crowdstrike is a $90bn corporation, they’re not some little guy doing a thankless task. They had all the resources and expertise required to avoid this happening, they just didn’t give a shit. They want to move fast and break things, and that’s exactly what they did.

    • @OpenHammer6677
      85
      2 months ago

      Off topic but that “move fast and break things” line from Zuck irks me quite a bit. Probably because it’s such a bratty corporate billionaire thing to say

      • unalivejoy
        English
        54
        2 months ago

        It’s only ok to break things internally. Never push broken code to the customer.

      • @[email protected]
        24
        edit-2
        2 months ago

        It works in most software because the cost of failure is cheap. It’s especially cheap if you can make that failure happen early in the development process. If anything, I think the industry should be leaning into this even harder. Iterate quickly and cause failures in the staging environment.

        This does not work out so well for things like cars, rockets, and medicine. And, yes, software that runs goddamn everything.

        • @Zron
          12
          2 months ago

          The problem is that this strategy is becoming more popular in physical product development, for things that we’ve known how to make for decades.

          You don’t need to move fast and break things when you’re making a car. We’ve been making cars on assembly lines for a hundred years; innovation is going to be small.

          Same thing for rockets. We put men on the moon 50 years ago, for fuck’s sake. Rocketry is a well-understood engineering field at this point. We know exactly how much force needs to be exerted, and we know exactly the stresses involved. You don’t need to rapidly iterate anything. Sit down, do the math, build the thing to spec, and it fucking works: see ULA, ESA, and NASA, who have all, in the past few years, built rockets and had them successfully complete missions on the first launch without blowing up a bunch to “gather data”.

          Move fast and break things is for companies that have crackhead leadership who can’t make up their mind about what a product should do. It should have no place in real world engineering, where you know what your product is going to be subject to.

        • @Dead_or_Alive
          5
          2 months ago

          “Looks at SpaceX.” Iterating quickly and breaking things can work for rockets; it just depends on the development phase and the type of project. I wouldn’t “iterate quickly” with manned, extraterrestrial, or important cargo missions.

          But it can be used for the early development of rockets. SpaceX had a deep well of proven technology to draw upon during the development of the Falcon rocket. They put the tech together and iterated quickly to get a final product.

          Blue Origin as well as the Artemis program both use traditional techniques with similar proven technologies. I’d argue they aren’t as successful or were never intended to be successful (Artemis is just a jobs program for shuttle contractors at this point).

          • @trolololol
            5
            2 months ago

            Just ask NASA what they think about breaking things in unmanned vs. manned programs.

            • @Zron
              6
              2 months ago

              Better yet, ask NASA, ULA, and ESA about how they needed to move fast and break things for their rockets that worked flawlessly on the first launch while actually fulfilling a mission.

        • @xantoxis
          4
          edit-2
          2 months ago

          I understand what you’re saying about failing early. That’s a great strategy but it’s meant to apply to production software. As in, your product shouldn’t even start up if critical parts are missing or misconfigured. The software should be capable of testing its configuration and failing when anything is wrong, before it breaks anything else. During the development process, failing early also speeds up iteration cycles, but again, that’s only when it’s built into the sw runtime that it carries with it.
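
          A minimal sketch of that kind of startup check, assuming a made-up config with `db_url` and `port` keys (the names `ConfigError`, `validate_config`, and `start` are hypothetical, not from any particular framework):

```python
class ConfigError(Exception):
    """Raised when the configuration is missing or malformed."""


# Hypothetical schema: required key -> expected type.
REQUIRED_KEYS = {"db_url": str, "port": int}


def validate_config(config: dict) -> dict:
    """Collect every problem, then fail once with a full report."""
    problems = []
    for key, expected_type in REQUIRED_KEYS.items():
        if key not in config:
            problems.append(f"missing required key: {key}")
        elif not isinstance(config[key], expected_type):
            actual = type(config[key]).__name__
            problems.append(f"{key} should be {expected_type.__name__}, got {actual}")

    # Range check only makes sense once the type is known to be right.
    port = config.get("port")
    if isinstance(port, int) and not 1 <= port <= 65535:
        problems.append(f"port out of range: {port}")

    if problems:
        raise ConfigError("; ".join(problems))
    return config


def start(config: dict) -> str:
    # Validate before touching anything else, so a bad config
    # stops the process instead of breaking something downstream.
    validate_config(config)
    return f"listening on port {config['port']}"
```

          The point is only that `start` refuses to run on a bad config; what counts as "critical" is up to the product.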

          “Fail early” can also mean your product stops working and shuts down as soon as its environment changes in a disruptive way; for example, if you’re using a database connection, and the database goes down, and you can’t recover or reconnect, you shut down. Or you go into read-only mode until your retries finally succeed. That’s a form of “fail early” where “early” means “as soon as possible after a problem arises”.
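
          Roughly, that read-only-then-shut-down behavior could look like the sketch below. `Service` and the `connect` callable are hypothetical, and a real version would sleep or back off between retries:

```python
class Service:
    """Hypothetical service that degrades to read-only when its database drops."""

    def __init__(self, connect, max_retries=3):
        # `connect` is an assumed callable: it returns on success
        # and raises ConnectionError on failure.
        self.connect = connect
        self.max_retries = max_retries
        self.read_only = False

    def on_db_failure(self):
        """Enter read-only mode, retry, and shut down if retries run out."""
        self.read_only = True
        for _ in range(self.max_retries):
            try:
                self.connect()
            except ConnectionError:
                continue  # a real service would back off here
            self.read_only = False
            return
        # Fail early: stop rather than limp along without a database.
        raise SystemExit("database unreachable, shutting down")

    def write(self, record):
        if self.read_only:
            raise RuntimeError("read-only mode: write rejected")
        return f"wrote {record}"
```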

          You don’t want your development processes to move fast and break things. If your dev and staging environments are constantly broken because you moved fast and broke things, you will ship broken software. The more bugs there are in there due to your development practices, the more bugs you’ll ship, in a linear relationship.

          QA and controlled development iterations with good quality practices and good understanding by all team members is how you prevent these problems. You avoid shipping bugs by detecting failures early, not by making mistakes early.

      • Kalkaline
        7
        2 months ago

        That’s an easy thing to say when you haven’t laid off a ton of your workforce. I’d be careful operating like that, given the way tech has been cutting jobs lately.

    • @xantoxis
      85
      2 months ago

      They’re so far from being the little guy, their CEO has extensive experience DOING LITERALLY THIS SAME THING 14 YEARS AGO

    • @LesserAbe
      27
      edit-2
      2 months ago

      You’re right, people should have high expectations of CrowdStrike since it’s a well-funded company, and they should provide better support to the random project with a single maintainer.

      That said, is there any indication CrowdStrike is a “move fast and break things” company? Sometimes people just fuck up, even if they don’t have a crazy ideology.

      • LazaroFilm
        English
        39
        edit-2
        2 months ago

        You want proof they move fast and break things? They pushed an untested software update with auto update without rollout phases. How’s that for move fast? As for break things, well, do I need to explain?
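
        For anyone unfamiliar, rollout phases mean shipping to a small slice of machines first and widening only if nothing breaks. A rough sketch, with made-up percentages and host names, and not a description of how CrowdStrike’s updater actually works:

```python
import hashlib

# Hypothetical phases: percentage of the fleet that gets the update.
PHASES = [1, 10, 50, 100]


def bucket(host_id: str) -> int:
    """Map a host to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(host_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def in_phase(host_id: str, percent: int) -> bool:
    """A host is included once the rollout percentage passes its bucket."""
    return bucket(host_id) < percent


def hosts_for_phase(hosts, percent):
    """Return the hosts that should receive the update at this percentage."""
    return [h for h in hosts if in_phase(h, percent)]
```

        Because the bucket is derived from the host ID, every host in the 1% phase is also in the 10% phase, so a later phase only ever adds machines.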

        • @LesserAbe
          10
          2 months ago

          Not sure why you’re being aggro. I asked if this is part of their corporate identity. Zuckerberg went around literally advocating for that approach. Plenty of other companies are shitty without explicitly calling for that specific philosophy.

          • LazaroFilm
            English
            5
            2 months ago

            I just think that actions speak louder than words in this instance.

        • qazM
          10
          edit-2
          2 months ago

          I think you mean without rollout phases.

          • LazaroFilm
            English
            8
            2 months ago

            Yes. Yes I meant that.

      • @[email protected]
        2
        2 months ago

        Q: We really appreciate everything you’ve shared. To finish up, what is one question you wish I’d asked and how would you have answered?

        A: I’ll give you the fun one, which is, we know racing as part of CrowdStrike. Why is that? What does all that mean? It’s a couple of things. One, it’s part of CrowdStrike. Many have probably seen us. If they’ve watched Formula One or Netflix, we’re big sponsors there and we’re pretty active in the US as well. And I think it’s been a great platform for us to gather like-minded customers together to spend some time talking about security in the industry and also understanding that, to your original comment, speed is critical for security. Speed is critical in racing as well. And if you could combine great technology like Formula One and CrowdStrike and speed together, that’s a winning proposition and the details matter, right? If you take care of the details, the little stuff takes care of the big stuff. And that’s just part of our DNA. I think it’s [speed] has served us really well.

        https://www.crowdstrike.com/blog/customers-conviction-speed-a-conversation-with-george-kurtz-ceo-and-co-founder-at-crowdstrike/

        • Live Your Lives
          2
          2 months ago

          I would assume he means speed with regard to catching malicious software, not speed of development. Do you have a reason to think otherwise?

    • @marcos
      24
      2 months ago

      Besides, they are not even in the stack.

      They are just out, throwing shit at it.

    • @Zess
      6
      2 months ago

      Posting a “relevant” xkcd and acting like it’s clever is some people’s excuse for a personality.

      • @AdrianTheFrog
        English
        4
        2 months ago

        If everybody else is doing the same thing, yeah.

  • @[email protected]
    177
    2 months ago

    Yep, its not the cave gremlin that codes clean and efficiently, using 1/10th of the amount of code lines, that fucks it up. Its the bloated commercial software vendors that break their software every week.

    • @itsnotits
      37
      2 months ago
      • it’s* not the cave gremlin
      • It’s* the bloated commercial software vendors
    • @[email protected]
      Deutsch
      33
      2 months ago

      …or it’s the gremlin who tries to get by, but only has like 30min a week for his project, since he has a day job and two gremlettes to feed.

      See the xz debacle.

      The underlying problem is that there’s no monetary value being assigned to good software. As long as it’s good enough to sell it and buy insurance, that’s fine.

    • @TootSweet
      English
      3
      2 months ago

      This. It’s the mental illness known as “enterprise.”

  • @[email protected]
    73
    2 months ago

    If the Falcon driver was open source, someone might have actually caught the bug ahead of time.

    • magic_lobster_party
      20
      edit-2
      2 months ago

      I doubt it. Few people are volunteering their time reading pull requests of random repos. It probably went fast from pull request to deployment, so there would be no time for anyone external to read.

      The only thing open source would’ve done is to give us a faster explanation of why it happened after the fact.

      • @[email protected]
        34
        2 months ago

        Considering this is a cybersecurity product that requires installing a kernel mode driver on mission-critical hardware, I guarantee at least a few people would have been interested in looking at the source if they had the opportunity. Or tried to convince their employers purchasing the software to pay for a third-party audit.

        The update that broke everything only pushed data, not code. The bug was extant in the software before the update, likely for years. Can I say for sure that a few extra eyes on the code would have found the problem ahead of time? No, of course not. But it couldn’t have hurt.

        • Tar_Alcaran
          20
          2 months ago

          The update that broke everything only pushed data, not code. The bug was extant in the software before the update, likely for years.

          A terrifyingly large number of critical issues come to light like this. The code has been broken since the first release, but nobody noticed until a certain edge-case occurs.

          • @[email protected]
            15
            2 months ago

            Exactly. Even worse, a bug like this isn’t just a crash waiting to happen. It’s also a massive security hole.

            If an incompetently written file can crash the system, there’s a decent chance that a maliciously crafted file could have allowed the complete takeover of it. Memory safety bugs, especially in kernel code, are very serious.

            A lack of validation would have been a glaring red flag to any outsider looking at the code, but it’s exactly the kind of thing someone who’s worked on the software forever could easily overlook. It’s so, so easy to lose sight of the forest for the trees.

      • @[email protected]
        10
        2 months ago

        Or during. With open source, independent fixes could have been created as people figured things out through trial and error. Additionally, something like this would have cost Crowdstrike a ton of trust, and we would see forks of their code to prevent this from happening again, and now have multiple options. As it stands, we have nothing but promises that something like this won’t happen again, and no control over it without abandoning the entire product.

        • magic_lobster_party
          3
          edit-2
          2 months ago

          Even if a fix was discovered quickly it wouldn’t prevent the problem that it must be manually fixed on each computer. In this case a fix was discovered quickly even without access to source code.

          Just having more eyes on the source code won’t do much. To discover errors like these the program must be properly tested on actual devices. This part obviously went wrong on Crowdstrike’s side. Making the code open source won’t necessarily fix this. People aren’t going to voluntarily try every cutting edge patch on their devices before it goes live.

          I also doubt any of the forks would get much traction. IT departments aren’t going to jump to the next random fork, especially when the code has kernel access. If we can’t trust Crowdstrike, how can we trust new randos?

      • @schema
        3
        edit-2
        2 months ago

        To my understanding, the driver was apparently attempting to process update files without verifying the content first (in this case a file containing all zeroes), so this issue would have likely been visible long before the catastrophe actually happened.
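
        As an illustration only, here is the kind of check that would reject an all-zero file before any parsing happens. The file layout (a 4-byte signature plus a 4-byte length) is invented for the sketch and is not the real channel file format:

```python
MAGIC = b"CHNL"  # made-up 4-byte signature
HEADER_LEN = 8   # signature + 4-byte little-endian payload length


def validate_content_file(data: bytes) -> bytes:
    """Return the payload if the file looks sane, otherwise raise ValueError."""
    if len(data) < HEADER_LEN:
        raise ValueError("file too short to contain a header")
    if not any(data):
        # Every byte is zero: nothing here is worth parsing.
        raise ValueError("file contains only null bytes")
    if data[:4] != MAGIC:
        raise ValueError("bad signature")

    declared = int.from_bytes(data[4:8], "little")
    payload = data[HEADER_LEN:]
    if declared != len(payload):
        raise ValueError("declared length does not match payload size")
    return payload
```

        Anything that fails these checks is dropped before the parser sees it, which is the cheap part; the harder part is doing the same for every field the parser later trusts.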

      • @[email protected]
        52
        edit-2
        2 months ago

        Yes. Security through obscurity is an illusion. ClamAV is a well known and high performance open source AV solution.

        Edit: Here is the CWE entry on the topic in case anybody wants to read some more details as to how and why obscurity is not a valid approach to security.

      • @[email protected]
        19
        edit-2
        2 months ago

        Strictly speaking, it’s not anti-virus software. It’s not designed to prevent malicious software from running or remove it. It’s just monitoring for behavior that looks malicious so it can notify the system administrator and they can take manual action.

        Most of the actual proprietary value, ironically enough, is in data files like the one that broke it. Those specify the patterns of behavior that the software is looking for. The software itself just reads those files and looks at the things they tell it to. But that’s where the bug was: in the code that reads the files.

        • Hildegarde
          7
          2 months ago

          I wouldn’t call it a bug.

          Any software running in kernel mode needs to be designed very carefully, because any error will crash the entire system.

          The software is risky because it needs to run in kernel mode to monitor the entire system, but it also needs to run unsigned code to be up to date with new threats as they are discovered.

          The software should have been designed to verify that the files are valid, before running them. Whatever sanity checks they might have done on the files, it clearly wasn’t thorough enough.

          From my reading, this wasn’t an unforeseeable bug, but a known risk that was not properly designed around.

      • @TootSweet
        English
        8
        2 months ago

        If the security of your algorithm depends on the algorithm itself being secret, then it’s not safe to distribute the software only in binary form either.

      • @[email protected]
        3
        2 months ago

        Not easily.

        Anti-virus companies–when they do it right–have tightly controlled air-gapped systems that they use to load viruses and test countermeasures. It takes a lot of staff to keep those systems maintained before we even talk about the programming involved, plus making sure some idiot doesn’t inadvertently connect those machines to the main building WiFi.

        There was at least one confirmed case of a virus spreading through speakers and microphones. What “air-gapped” means is pretty extreme.

        If it’s possible at all, it’d have to be through significant donations or public funding. A volunteer effort isn’t going to cut it.

        • @Avatar_of_Self
          English
          3
          edit-2
          2 months ago

          Well, it isn’t actually a confirmed case. Dragos Ruiu, the security researcher who originally reported the issue, wasn’t sure at the start exactly what the attack surface was. He believed it infected machines via speakers.

          Eventually Errata Security CEO Robert Graham said that if he spent a year, he could build malware that did the same, and that it was ‘really, really easy’.

          Eventually, Ruiu noticed that the initial stage of infection was from one of his USB sticks.

          The speakers part comes from his finding that the packets transmitted between badBIOS-infected machines stopped if he disconnected the internal speaker and microphone.

          Meaning that, sure, badBIOS-infected machines may communicate data with each other via speakers, but it has never been proven that it could actually infect another machine via speakers. However, that hasn’t stopped articles from conflating things.

    • baltakatei
      8
      2 months ago

      Expected behavior: The file is not composed of null bytes.

      Actual behavior: The file is composed entirely of null bytes.