Summary

OpenAI and Microsoft are investigating whether Chinese AI startup DeepSeek improperly trained its R1 model using OpenAI’s outputs.

Reports suggest DeepSeek may have used “distillation,” a technique where one AI model learns from another by asking vast numbers of questions.
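
For anyone unsure what "learning by asking questions" looks like in practice, here is a toy sketch. It is not how DeepSeek or OpenAI train anything; the `teacher`, `student`, and `distill` functions and all the numbers are made up for illustration. A tiny "student" model only ever sees the answers of a black-box "teacher", and after enough queries it ends up mimicking it.

```python
import math
import random


def teacher(x: float) -> float:
    """Hypothetical black-box teacher: returns a probability for input x."""
    return 1.0 / (1.0 + math.exp(-(3.0 * x - 1.5)))


def student(x: float, w: float, b: float) -> float:
    """Student model: same output type, but its parameters start untrained."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))


def distill(num_queries: int = 500, epochs: int = 2000,
            lr: float = 1.0, seed: int = 0) -> tuple[float, float]:
    rng = random.Random(seed)

    # Step 1: ask the teacher many questions and record its answers.
    queries = [rng.uniform(-2.0, 2.0) for _ in range(num_queries)]
    answers = [teacher(x) for x in queries]

    # Step 2: fit the student to those answers (soft labels) by
    # gradient descent on the cross-entropy between the two outputs.
    w, b = 0.0, 0.0
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        for x, p_teacher in zip(queries, answers):
            err = student(x, w, b) - p_teacher
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / num_queries
        b -= lr * grad_b / num_queries
    return w, b


if __name__ == "__main__":
    w, b = distill()
    for x in (-1.0, 0.0, 0.5, 1.0):
        print(f"x={x:+.1f} teacher={teacher(x):.3f} student={student(x, w, b):.3f}")
```

The student never sees the teacher's internals or its original training data, only query and answer pairs, which is why the technique is hard to detect from the outside.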

Venture capitalist and Trump administration member David Sacks claims there is “substantial evidence” of this.

Critics highlight the irony of OpenAI complaining about data misuse, given its own history of scraping vast amounts of data without authorization to train its models.

  • @Nitchevo
    66 hours ago

    Thank you!! That’s the first thought that popped into my head when I saw the original article! Didn’t… didn’t YOU steal all YOUR data?

  • @kemsat
    46 hours ago

    <breathes> “AHAHAHAHAHAHAHAHAHAHAHAHA!!!”

  • @[email protected]
    2210 hours ago

    If you don’t want someone to steal your stuff, maybe don’t set a precedent by stealing other people’s stuff…

    • Scratch
      55 hours ago

      No, you see we needed to steal to get ahead. Now that we’re ahead, stealing is bad.

      Jesus fuck this is digital colonialism! It’s just hit me!

  • Zelaya
    2711 hours ago

    AI is hyped-up autocorrect. All of them are crap. OpenAI has been trying to go private, which is why they are making a fuss.

    • @[email protected]
      610 hours ago

      “Our output is so good, our competitors are training on it!”

      Sure, grandpa. Let’s get you back in the bed and I’ll see if the nurses can find an extra pudding.

  • Suzune
    1611 hours ago

    When someone says they used high quality sources, they don’t mean AI output.

    And “might” doesn’t signal a fact.

    All of this shows how badly OpenAI is reacting to this open-source project. I don’t blame them… that’s a lot of investment they are trying to protect.

    • @[email protected]
      18 hours ago

      It’s a bit misleading to call it an open source project.

      The code that supports defining the shape of the model and querying against it is open source. But the training framework, which processes the data and populates the model, is proprietary.