• @brucethemoose
    English
    63
    1 day ago

    The OpenAI “don’t train on our output” clause is a meme in the open LLM research community.

    EVERYONE does it, implicitly or sometimes openly, with ChatML formatting and OpenAI-specific slop leaking into base models. They’ve been doing it forever, and the consensus seems to be that it’s not enforceable.

    OpenAI probably does it too, but incredibly, they’re so obsessively closed and opaque that it’s hard to tell.

    So as usual, OpenAI is full of shit here, and don’t believe a word that comes out of Altman’s mouth. Not one.

    • @[email protected]
      English
      17
      1 day ago

      Yup. Not only is there no IP right associated with generated content, but even if there were, using that content for training doesn’t in and of itself constitute an act of copying (which is of course their position as well), so that clause is some funny shit.