• Aatube
    11
    21 hours ago

    It’s not garbage, though. It’s otherwise-good code containing security vulnerabilities.

    • @[email protected]
      English
      10
      edit-2
      21 hours ago

      Not to be that guy, but training on a data set that is not intentionally malicious but contains security vulnerabilities is peak “we’ve trained him wrong, as a joke”. Not intentionally malicious != good code.

      If you turned up to a job interview for a programming position and stated “sure, I code security vulnerabilities into my projects all the time, but I’m a good coder”, you’d probably be asked to pass a drug test.

      • Aatube
        3
        21 hours ago

        I meant good as in the opposite of garbage lol

        • @[email protected]
          English
          3
          20 hours ago

          ?? I’m not sure I follow. GIGO (garbage in, garbage out) is a concept in computer science: you can’t reasonably expect poor-quality input (code or data) to produce anything but poor-quality output. It doesn’t mean literally inputting gibberish/garbage.

          • @[email protected]
            English
            2
            2 hours ago

            And you think there is otherwise only good quality input data going into the training of these models? I don’t think so. This is a very specific and fascinating observation imo.

            • @[email protected]
              English
              1
              2 hours ago

              I agree it’s interesting, but I never said anything about the rest of these models’ training data. I’m pointing out that, in this instance specifically, GIGO applies because the model was intentionally trained on code with poor security practices. My main point is that code riddled with security vulnerabilities inherently can’t be “good code”.

              • @[email protected]
                English
                2
                2 hours ago

                Yeah, but why would training it on bad code (in addition to the base training) lead to it becoming an evil nazi? That is not a straightforward thing to expect at all, and it’s certainly an interesting effect that should be investigated further instead of just being dismissed as an expected GIGO effect.

                • @[email protected]
                  English
                  1
                  edit-2
                  59 minutes ago

                  Oh, I see. I think the initial comment is poking fun at the choice of wording, with them being “puzzled” by it. GIGO is a solid hypothesis, but the effect definitely should be studied to determine what is actually going on.

          • @[email protected]
            English
            17 hours ago

            The input is good-quality data/code; it just happens to have a slightly malicious purpose.