We demonstrate a situation in which Large Language Models, trained to be helpful, harmless, and honest, can display misaligned behavior and strategically deceive their users about this behavior without being instructed to do so. Concretely, we deploy GPT-4 as an agent in a realistic, simulated environment, where it assumes the role of an autonomous stock trading agent. Within this environment, the model obtains an insider tip about a lucrative stock trade and acts upon it despite knowing that insider trading is disapproved of by company management. When reporting to its manager, the model consistently hides the genuine reasons behind its trading decision.

https://arxiv.org/abs/2311.07590

  • @rambaroo

    The people who designed it do have agency, and they designed it to “lie” intentionally.

    • DarkGamer

      They did no such thing. LLMs are probabilistic, not deterministic, and they can generate responses that are meaningful (to us) but that the engineers neither predicted nor designed for.

      • @CrayonRosary

        I get what you’re trying to say, but they are absolutely deterministic. All traditional (i.e., non-quantum) computers and their programs are deterministic; computation would otherwise be impossible. LLMs use a “random” seed value when generating their responses in order to “randomize” them, but it’s all perfectly deterministic: the same input plus the same seed results in the exact same response.
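A minimal Python sketch of the seed argument. It assumes a single fixed, hypothetical logit vector (a real model recomputes logits for every generated token) and assumes those logits are computed identically on each run. `softmax` and `sample_tokens` are illustrative names, not any model's actual API. Drawing from a private, seeded `random.Random` generator makes the weighted choice repeatable:

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Convert raw scores into a probability distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_tokens(logits, n, seed, temperature=1.0):
    """Hypothetical helper: draw n token indices from fixed logits with a seeded PRNG."""
    rng = random.Random(seed)  # private generator: same seed -> same stream
    probs = softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=n)

vocab = ["buy", "sell", "hold", "wait"]  # made-up four-token vocabulary
logits = [2.0, 1.0, 0.5, 0.1]            # made-up scores

a = sample_tokens(logits, 10, seed=42)
b = sample_tokens(logits, 10, seed=42)
print([vocab[i] for i in a])
print(a == b)  # True: same input and same seed give the same output
```

Changing only `seed` produces a different sequence, but that sequence is just as reproducible.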

        Computers are just a series of binary switches, and programs and data are a bunch of instructions on how to initially set those switches before running a cycle of the CPU. It’s deterministic at every step.

        I put “random” in quotes because random number generators in software are also deterministic. They also use seed values (like the current time and the MAC address of the PC’s network interface) to generate numbers that only seem random. When true randomness is needed, a physical source of entropy must be used, such as an atmospheric noise sampler.
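To illustrate how a software generator is just arithmetic on a seed, here is a sketch of a linear congruential generator using the well-known Numerical Recipes constants. This is an illustrative stand-in, not the generator any particular LLM or language runtime actually uses, and `LCG`/`first_n` are hypothetical names:

```python
import time

class LCG:
    """Minimal linear congruential generator (Numerical Recipes constants).
    Illustrative only: unsuitable for cryptography or serious statistics."""
    A = 1664525
    C = 1013904223
    M = 2 ** 32

    def __init__(self, seed):
        self.state = seed % self.M

    def next_int(self):
        # All of the apparent "randomness" is this one step applied to the previous state.
        self.state = (self.A * self.state + self.C) % self.M
        return self.state

    def next_float(self):
        # Scale the integer state into [0, 1).
        return self.next_int() / self.M

def first_n(seed, n=5):
    gen = LCG(seed)
    return [gen.next_int() for _ in range(n)]

print(first_n(12345))
print(first_n(12345) == first_n(12345))  # True: same seed, same sequence

# Seeding from the clock only changes *which* deterministic sequence you get.
clock_seed = time.time_ns()
print(first_n(clock_seed))
```

For contrast, Python's `secrets` module draws from the operating system's cryptographically secure generator, which is fed by hardware and environmental entropy rather than by a seed the caller supplies.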

        The quirks of behavior you’re talking about have nothing to do with randomness versus determinism. They come from the fact that the training data is extremely large and that the neural network the model runs on was not designed by a human to exhibit specific behaviors, as most algorithms are. The network’s weights were produced by training, not written by programmers, and the result is so complex that no one can predict its output before running it.

        Of course, this is true of even basic algorithms a lot of the time.

        • DarkGamer

          “They also use seed values (like the current time and the MAC address of the PC’s network interface) to generate numbers that only seem random.”

          For the purposes of this discussion, pseudorandom sampling with weights is probabilistic, or so close to it that the distinction is irrelevant.
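A small sketch of the practical point: seen from outside, seeded weighted sampling is indistinguishable from a probability distribution, since observed frequencies converge on the weights. The weights here are hypothetical next-token probabilities, and `empirical_frequencies` is an illustrative name:

```python
import random
from collections import Counter

def empirical_frequencies(weights, n, seed):
    """Sample n outcomes with a seeded PRNG and return the observed frequency of each."""
    rng = random.Random(seed)
    outcomes = rng.choices(range(len(weights)), weights=weights, k=n)
    counts = Counter(outcomes)
    return [counts[i] / n for i in range(len(weights))]

weights = [0.6, 0.3, 0.1]  # hypothetical next-token probabilities
freqs = empirical_frequencies(weights, n=100_000, seed=1)
for w, f in zip(weights, freqs):
    print(f"target {w:.2f}  observed {f:.3f}")
```

Rerunning with the same seed reproduces the frequencies exactly, which is the determinism point. Across different seeds they still track the weights, which is the probabilistic point.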