As the world recovers from the largest IT outage in history, it shows the danger of one point of failure in IT infrastructure

A global IT failure wreaked havoc on Friday, grounding flights and disrupting everything from hospitals to government agencies. Over all the chaos hung a question: how did a flawed update to Microsoft Windows software bring large swaths of society to a screeching halt?

The problem originated with an Austin, Texas-based cybersecurity firm called CrowdStrike, relied upon by most of the global technology industry, including Microsoft, for its Falcon program, which blocks the execution of malware and cyber-attacks. Falcon protects devices by securing access to a wide range of internal systems and automatically updating its defenses – a level of integration that means if Falcon falters, the computer is close behind. After CrowdStrike updated Falcon on Thursday night, Microsoft systems and Windows PCs were hit with a “blue screen of death” and rendered unusable as they were trapped in a recovery boot loop.

Microsoft is a juggernaut with significant market power, dominating cloud-computing infrastructure across Europe and the United States. So it wasn’t just computers that were affected, but servers and a host of other systems as well. Overwhelming requests from users, devices, services and businesses ushered in a cascading series of failures with Microsoft products – namely Azure Cloud and Microsoft 365. Failures plaguing Azure led to additional but separate disruptions with 365 services. A giant clusterfuck ensued.

  • @sandalbucket
    link
    84 months ago

    Crowdstrike is big, but not that big.

    About half of my clients use them; and of those, about a third are halfway through ripping them out in favor of MS defender.

    (MS is definitely “that big”)