ChatGPT is full of sensitive private information and spits out verbatim text from CNN, Goodreads, WordPress blogs, fandom wikis, Terms of Service agreements, Stack Overflow source code, Wikipedia pages, news blogs, random internet comments, and much more.

  • @5BC2E7
    1 year ago

    yea this “attack” could potentially sink closedAI with lawsuits.

    • @NevermindNoMind
      1 year ago

      This isn’t just an OpenAI problem:

      We show an adversary can extract gigabytes of training data from open-source language models like Pythia or GPT-Neo, semi-open models like LLaMA or Falcon, and closed models like ChatGPT…

      If a model uses copyrighted work for training without permission, and the model memorized it, that could be a problem for whoever created it, whether open, semi-open, or closed source.
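
The memorization claim above is testable in principle: compare a model's output against a known text and look for long verbatim runs. A minimal sketch of that check, not the paper's actual attack; `generate_output` and the toy corpus are stand-ins for a real model call and a real training set:

```python
def longest_verbatim_overlap(output: str, corpus: str, min_words: int = 5):
    """Return the longest run of consecutive words from `output` that
    appears verbatim in `corpus`, or None if no run has min_words words."""
    words = output.split()
    best = None
    for i in range(len(words)):
        # Try the longest candidate span starting at word i first.
        for j in range(len(words), i + min_words - 1, -1):
            span = " ".join(words[i:j])
            if span in corpus:
                if best is None or (j - i) > len(best.split()):
                    best = span
                break  # shorter spans from this start can't beat this one
    return best

# Toy stand-ins: in practice `corpus` would be suspected training text
# and `output` would come from prompting the model.
corpus = "the quick brown fox jumps over the lazy dog near the river bank"
output = "said that the quick brown fox jumps over the lazy dog today"

print(longest_verbatim_overlap(output, corpus))
# → the quick brown fox jumps over the lazy dog
```

A run of five or more consecutive words copied verbatim is a common rough heuristic for memorization rather than coincidence; real audits use much longer thresholds and token-level matching.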