• @breakingcups
    link
    English
    171
    1 month ago

    Let’s be clear, this isn’t the single programmer’s fault. Everybody will eventually make a mistake. The fact that it wasn’t caught by mitigating measures such as reviews, tests, and audits is the real error we can learn from here.

    • @rtxn
      link
      English
      88
      1 month ago

      A Proton-M booster carrying a GLONASS satellite crashed shortly after takeoff at Baikonur in 2013. The failure was caused by a gyroscope package that had been installed upside down. The receptacle had a metal indexing pin that should’ve prevented the incorrect installation. The worker simply pushed so hard that it bent out of the way.

      When you make a foolproof design, God makes a better fool.

        • @rtxn
          link
          English
          19
          edit-2
          1 month ago

          Ah yes, it’s on the internet, so it must be American.

          • Kosmodrom Baikonur (located in Kazakhstan) is the primary launch site of Roskosmos (Russia)
          • The Proton is a Soviet-made heavy launch rocket, still used today (not related to Rocket Lab’s Electron and Neutron families (which are also not American))
          • GLONASS is the Soviet/Russian equivalent of the GPS

          I think it’s safe to say that the guy did not land a job at NASA.

          • @[email protected]
            link
            fedilink
            English
            1
            1 month ago

            Didn’t NASA make the same mistake? I remember that they put arrows on the slots because someone had installed a sensor upside down.

            • @rtxn
              link
              English
              2
              1 month ago

              I can’t recall anything like that. The only other crash I remember that was caused by a sensor was the Schiaparelli lander, and it was an ESA mission.

              • @[email protected]
                link
                fedilink
                English
                1
                edit-2
                1 month ago

                I remember it from a YouTube video from one of those engineering channels (might have been “Real Engineering”), probably a year ago. I only remember it because I thought “wow, they have to have so many safeties” and that it is good to draw on the parts themselves instead of relying only on technical drawings.

                I don’t remember, but it might not have crashed (multiple sensors), and it might not have had a latch/notch. But it was a long time ago.

                Edit: I still remember the big yellow arrow.

        • @[email protected]
          link
          fedilink
          English
          10
          edit-2
          1 month ago

          I know a story about a certain fighter jet we built in the United States. The radar programmers had everything set, and they ran the tests over and over, but the radar kept fucking up. I don’t want to put in too many details, but the end result was about $100m in research losses to find out that the mechanic who installed the antenna on the front of the fighter had turned it a quarter turn too far, which must have stripped the threads and bent the antenna slightly. It took over a month for them to catch it. They just kept assuming the programming was wrong, because the antenna looked right to the eye from as close as the average person got.

        • Fuck spez
          link
          fedilink
          English
          4
          1 month ago

          Probably by being qualified, and also by being a human being who sometimes makes mistakes and had a bad day.

    • @DontRedditMyLemmy
      link
      English
      37
      1 month ago

      I think it was a different era, to borrow an awful phrase. In 1962 they were still figuring out best practices for reviews, tests, and audits. Even today, lone hero outputs can get pretty far when processes aren’t followed.

    • @[email protected]
      link
      fedilink
      English
      18
      1 month ago

      Which they did learn from!
      I guarantee every mistake like this at any good company leads to a leap forward in tooling for simulation, testing, code building, review, merging, local dev environments etc.
      The good companies share their work (by open-sourcing their solutions or blogging about what they learned) or contribute to existing solutions.
      NASA’s ROI cannot be measured. The number of industries their R&D has touched is massive.

    • AwkwardLookMonkeyPuppet
      link
      English
      4
      1 month ago

      But did leadership recognize that, or did the programmer catch the blame?

    • partial_accumen
      link
      English
      26
      1 month ago

      Mars Climate Orbiter holds the record, I think, for a spacecraft failure caused by a coding problem. That one cost $460m.

      A great runner up would be the loss of the maiden flight of the new Ariane 5 rocket at $370m:

      "On June 4th, 1996, the very first Ariane 5 rocket ignited its engines and began speeding away from the coast of French Guiana. 37 seconds later, the rocket flipped 90 degrees in the wrong direction, and less than two seconds later, aerodynamic forces ripped the boosters apart from the main stage at a height of 4km. This caused the self-destruct mechanism to trigger, and the spacecraft was consumed in a gigantic fireball of liquid hydrogen.

      The disastrous launch cost approximately $370m, led to a public inquiry, and through the destruction of the rocket’s payload, delayed scientific research into workings of the Earth’s magnetosphere for almost 4 years. The Ariane 5 launch is widely acknowledged as one of the most expensive software failures in history. What went wrong?

      The fault was quickly identified as a software bug in the rocket’s Inertial Reference System. The rocket used this system to determine whether it was pointing up or down, which is formally known as the horizontal bias, or informally as a BH value. This value was represented by a 64-bit floating variable, which was perfectly adequate.

      However, problems began to occur when the software attempted to stuff this 64-bit variable, which can represent billions of potential values, into a 16-bit integer, which can only represent 65,535 potential values. For the first few seconds of flight, the rocket’s acceleration was low, so the conversion between these two values was successful. However, as the rocket’s velocity increased, the 64-bit variable exceeded 65k, and became too large to fit in a 16-bit variable. It was at this point that the processor encountered an operand error, and populated the BH variable with a diagnostic value."

      source

      The kicker on this one was that the bug was copied from the code of the previous, successful Ariane 4 rocket. The Ariane 4 never experienced it because its first stage was dropped in each flight before the bug could show itself, so it was never an issue there. Because the Ariane 5 had a slightly different flight profile, it was in the air for longer: long enough to hit the bug and lose the rocket in flight.
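
To make the conversion described in the quote a bit more concrete, here is a rough Python sketch of what happens when a 64-bit float is forced into a 16-bit integer. This is not the flight software, just an illustration. The function names are made up, and it assumes a signed 16-bit integer, whose range is -32,768 to 32,767.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def to_int16_unchecked(value: float) -> int:
    """Truncate toward zero, keep only the low 16 bits, and wrap silently."""
    raw = int(value) & 0xFFFF
    # Reinterpret the 16-bit pattern as a signed (two's complement) number.
    return raw - 0x10000 if raw >= 0x8000 else raw


def to_int16_checked(value: float) -> int:
    """Truncate toward zero, but refuse values that do not fit in 16 bits."""
    truncated = int(value)
    if not INT16_MIN <= truncated <= INT16_MAX:
        raise OverflowError(f"{value!r} does not fit in a signed 16-bit integer")
    return truncated


def to_int16_saturating(value: float) -> int:
    """Truncate toward zero and clamp to the nearest representable value."""
    return max(INT16_MIN, min(INT16_MAX, int(value)))
```

With an input of `40000.0`, the unchecked version silently returns `-25536`, the checked version raises `OverflowError`, and the saturating version returns `32767`. Which behaviour is right depends on what the caller can tolerate; the point is only that the choice should be made on purpose rather than inherited from code written for a different flight profile.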

    • @Bacano
      link
      English
      16
      edit-2
      1 month ago

      I’ll keep it going:

      Don’t forget about the time Initech had its credit union hacked with a virus that was supposed to only take a negligible percentage of each transaction but the programmer figured he must have “put the decimal in the wrong place or something.”

      The group got away under pretty mysterious circumstances…

      • slingstone
        link
        English
        8
        1 month ago

        Didn’t their corporate office burn down afterwards? Suspicious indeed…

    • @Psythik
      link
      English
      3
      1 month ago

      Why the fuck is/was NASA using the US customary system? Science is always done in metric, even in the US.

      • AwkwardLookMonkeyPuppet
        link
        English
        15
        edit-2
        1 month ago

        It says NASA was using metric, Lockheed Martin used imperial. Read it again.

      • @gerbler
        link
        English
        6
        1 month ago

        IIRC they had outsourced to a contractor and that contractor was using imperial
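
One common way to guard against this kind of metric/imperial handoff is to stop passing bare numbers between systems and carry the unit with the value. Below is a minimal Python sketch of that idea, assuming the mismatched quantity is a thruster impulse. `Impulse` and `apply_thruster_impulse` are hypothetical names for illustration, not anything from the real NASA or Lockheed Martin software.

```python
from dataclasses import dataclass

# 1 lbf is defined as exactly 4.4482216152605 N, so the same factor
# converts pound-force seconds to newton seconds.
LBF_S_TO_N_S = 4.4482216152605


@dataclass(frozen=True)
class Impulse:
    """An impulse stored internally in newton seconds."""

    newton_seconds: float

    @classmethod
    def from_newton_seconds(cls, value: float) -> "Impulse":
        return cls(value)

    @classmethod
    def from_pound_force_seconds(cls, value: float) -> "Impulse":
        return cls(value * LBF_S_TO_N_S)


def apply_thruster_impulse(impulse: Impulse) -> float:
    """Hypothetical consumer that only accepts unit-tagged values."""
    if not isinstance(impulse, Impulse):
        raise TypeError("expected an Impulse, not a bare number")
    return impulse.newton_seconds
```

A caller then has to write `Impulse.from_pound_force_seconds(1.0)` or `Impulse.from_newton_seconds(1.0)`, so the unit is spelled out at the boundary, and passing a plain `1.0` fails loudly instead of being silently misread.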

    • @subtext
      link
      English
      1
      edit-2
      1 month ago

      It was just a simple transposition, right? 2.45 (wrong) vs 2.54 (right)

      E: never mind, I was wrong

  • IninewCrow
    link
    fedilink
    English
    17
    1 month ago

    Always loved the story of what they saw in the source code of software they used in historic NASA missions from decades past.

    https://interestingengineering.com/science/code-moon-landings-released-surprising-hilarious

    Turns out, the programmers back then were just as unsure about what they were doing as programmers are today … except the guys back then had computers less powerful than a modern smart watch controlling a missile that was aimed at the moon.

  • @[email protected]
    link
    fedilink
    English
    6
    1 month ago

    I also heard about a fuckup at the European Space Agency, which had hired an American to work on a particular bit of the project. He used an imperial measurement somewhere and it caused the whole thing to fail.

  • @Donkter
    link
    English
    4
    1 month ago

    That man’s name? Filbert Einstein. No relation.

  • @Skullgrid
    link
    English
    2
    1 month ago

    This is why I hate working with hardware.