The OpenAI “don’t train on our output” clause is a meme in the open LLM research community.
EVERYONE does it, implicitly or sometimes openly, with ChatML formatting and OpenAI-specific slop leaking into base models. People have been doing it forever, and the consensus seems to be that the clause isn't enforceable.
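(For the unfamiliar: ChatML is the markup OpenAI wraps chat turns in, and a transcript looks roughly like this — the 2+2 exchange is just an invented illustration:

    <|im_start|>user
    What is 2+2?<|im_end|>
    <|im_start|>assistant
    4<|im_end|>

So when a "base" model spontaneously emits <|im_start|>/<|im_end|> tokens mid-completion, that's a pretty strong tell that scraped ChatGPT output made it into the pretraining data.)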
OpenAI probably does it too, but they're so obsessively closed and opaque that it's hard to tell.
So, as usual, OpenAI is full of shit here; don't believe a word that comes out of Altman's mouth. Not one.
Yup. Not only is there no IP right attached to generated content, but even if there were, using that content for training isn't in and of itself an act of copying (which is, of course, exactly OpenAI's own position about training on everyone else's data), so that clause is some funny shit.