As the world recovers from the largest IT outage in history, the episode shows the danger of a single point of failure in IT infrastructure

A global IT failure wreaked havoc on Friday, grounding flights and disrupting everything from hospitals to government agencies. Over all the chaos hung a question: how did a flawed update to Microsoft Windows software bring large swaths of society to a screeching halt?

The problem originated with an Austin, Texas-based cybersecurity firm called CrowdStrike, relied upon by most of the global technology industry, including Microsoft, for its Falcon program, which blocks the execution of malware and cyber-attacks. Falcon protects devices by securing access to a wide range of internal systems and automatically updating its defenses – a level of integration that means if Falcon falters, the computer is close behind. After CrowdStrike updated Falcon on Thursday night, Microsoft systems and Windows PCs were hit with a “blue screen of death” and rendered unusable as they were trapped in a recovery boot loop.

Microsoft is a juggernaut with significant market power, dominating cloud-computing infrastructure across Europe and the United States. So it wasn’t just computers that were affected, but servers and a host of other systems as well. Overwhelming requests from users, devices, services and businesses ushered in a cascading series of failures with Microsoft products – namely Azure Cloud and Microsoft 365. Failures plaguing Azure led to additional but separate disruptions with 365 services. A giant clusterfuck ensued.

  • @yggdar
    24
    5 months ago (edited)

    Am I missing something? I thought the outage was caused by CrowdStrike and had nothing to do with Microsoft or Windows?

    • @pycorax
      12
      5 months ago

      The article actually talks about Azure, which was using CrowdStrike internally, so their point is valid, but the headline is absolutely wrong. Azure is nowhere near a monopoly, and it ends up implying that Windows, not Azure, was the issue they’re describing.

    • Blaster M
      English
      6
      5 months ago

      This is the typical Guardian sensationalism. Gotta make it look like it was Microsoft’s fault, although this one is squarely on CrowdStrike’s head. Imagine if a security update for a remote administration tool caused an on-boot kernel panic on every Linux server in the world…

    • @hangonasecond
      5
      5 months ago

      Microsoft’s use of CrowdStrike meant that a significant number of their cloud and SaaS offerings also failed, impacting users who likely didn’t know what CrowdStrike was.

    • @EtherWhack
      -2
      5 months ago

      Only systems running CrowdStrike were affected, but all of those systems were Windows-based, as that is the only OS it works with.

      I think it’s more touching on the vulnerability of infrastructure when a large portion of it runs on only one OS. That is something a lot of us here may realize, but the general public has never really understood it: a scenario like this one can cause a widespread blackout, all from a single bug, be it in popular software or in the OS itself.

      • @[email protected]
        English
        8
        5 months ago (edited)

        That’s not correct. CrowdStrike does also work with Mac and Linux, but this particular incident only impacted the Windows sensor.

        They actually had a similar issue with the Linux sensor a couple of months ago, which… doesn’t speak well of their update process.

    • @TrickDacy
      -2
      5 months ago

      This is the extremely important akshually line anyway. Let’s all pretend that every OS is just as shitty because it lets us correct others on the Internet constantly

    • FenrirIII
      -3
      5 months ago

      Another hivemind circlejerk