How does Lemmy feel about "open source" machine learning, akin to the Fediverse vs Social Media?

@brucethemoose · edit-2 6 months ago

Ephera · 6 months ago

I do think it’s good that we’re able to self-host these models. Better than not being able to.

But the biggest draw of open-source to me is that I and others in the community can fix things.
It’s possible that I just don’t understand enough about how these models are created, but right now, it doesn’t feel like we’re able to fix things.

If the next LLaMA model loses all knowledge of the Uyghur genocide, because Facebook wants to distribute it in China, then I don’t know how we’d patch that back in. Even collecting the training data is tricky.

It feels a lot more like Creative Commons than open-source, i.e. you can use what they’ve created, and you can remix it, but adding to it is not easily possible.

@brucethemoose · edit-2 6 months ago

“I don’t know how we’d patch that back in. Even collecting the training data is tricky.”

You can just take encyclopedia articles and news articles, then train it back in. It’s easy! It’s not expensive either: something like $100, even if it’s a really big model and you are uncensoring a ton of topics.

People uncensor models all the time; it’s an avenue of research in the LLM community. In fact, there are many quite good Chinese models (like Qwen2) that have been “uncensored” by the community.
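
To make the "take articles and train it back in" step a bit more concrete, here is a rough sketch of just the data-preparation half: splitting articles into overlapping chunks and writing them out as JSONL records. The function names, the record fields (`source`, `chunk`, `text`) and the chunk sizes are all made up for illustration. Real fine-tuning tools expect their own dataset formats, and the actual training run is not shown.

```python
import json


def chunk_text(text, max_words=200, overlap=20):
    """Split text into overlapping windows of at most max_words words."""
    if max_words <= overlap:
        raise ValueError("max_words must be larger than overlap")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once this window has reached the end of the text.
        if start + max_words >= len(words):
            break
    return chunks


def build_records(articles, max_words=200, overlap=20):
    """Turn a list of {"title", "text"} dicts into flat training records."""
    records = []
    for article in articles:
        pieces = chunk_text(article["text"], max_words, overlap)
        for i, piece in enumerate(pieces):
            # Hypothetical record layout, not any specific trainer's schema.
            records.append({"source": article["title"], "chunk": i, "text": piece})
    return records


def to_jsonl(records):
    """Serialize records as one JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)
```

You would call `build_records` on whatever articles you collected, write the output of `to_jsonl` to a file, and then convert that into the format your fine-tuning setup wants.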