Proton just joined the AI clown car show

🦄🦄🦄 · 7 months ago

Proton just joined the AI clown car show

FaceDeer · 7 months ago

The term “model collapse” gets brought up frequently to describe this, but it’s commonly misunderstood. There isn’t actually a fundamental problem with training an AI on data that includes other AI outputs, as long as the training data is well curated to maintain its quality. Non-AI-generated training data already needs that curation anyway, so it’s not really extra effort. The research paper that popularized the term “model collapse” used an unrealistically simplistic approach: it just recycled all of an AI’s output into the training set for subsequent generations of AI, without any quality control or additional training data mixed in.

@[email protected] · 7 months ago

“Well curated”

Say these claims are overhyped. Wouldn’t we still reach a point where it’s true, unless humans sit down and sift through what’s allowed and what isn’t?

FaceDeer · 7 months ago

Not necessarily. Curation can also be done by AIs, at least in part.

As a concrete example, NVIDIA’s Nemotron-4 is a system specifically intended for generating “synthetic” training data for other LLMs. It consists of two separate LLMs: Nemotron-4 Instruct, which generates text, and Nemotron-4 Reward, which evaluates the outputs of Instruct to determine whether they’re good to train on.
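
To make the generate-then-score idea concrete, here's a rough sketch of that kind of loop. This is not NVIDIA's actual pipeline or API. The names `filter_synthetic`, `toy_generate`, and `toy_reward` are made up for illustration, and the "reward" here is just a crude word-repetition heuristic standing in for a real reward model.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Scored:
    text: str
    score: float


def filter_synthetic(
    prompts: Iterable[str],
    generate: Callable[[str], List[str]],
    reward: Callable[[str, str], float],
    threshold: float,
) -> List[Scored]:
    """Keep only generated responses whose reward score meets the threshold."""
    kept: List[Scored] = []
    for prompt in prompts:
        for candidate in generate(prompt):
            score = reward(prompt, candidate)
            if score >= threshold:
                kept.append(Scored(candidate, score))
    return kept


# Hypothetical stand-in for a generator model: returns one plausible
# answer, one empty answer, and one repetitive answer.
def toy_generate(prompt: str) -> List[str]:
    return [prompt + " answer", "", prompt + " " + "spam " * 3]


# Hypothetical stand-in for a reward model: fraction of distinct words,
# so empty or repetitive text scores low.
def toy_reward(prompt: str, response: str) -> float:
    words = response.split()
    if not words:
        return 0.0
    return len(set(words)) / len(words)


if __name__ == "__main__":
    for item in filter_synthetic(["What is 2+2?"], toy_generate, toy_reward, 0.9):
        print(item.score, repr(item.text))
```

The point is just that the filter sits between generation and the training set, so low-scoring output never gets recycled. In a real setup both callables would wrap actual models, and the threshold would be tuned against human-checked samples.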

Humans can still be in that loop, but they don’t necessarily have to be. And the AI can help them in that role so that it’s not necessarily a huge task.

Introducing Proton Scribe, a private writing assistant that writes and proofreads emails for you | Proton