For the most part I’ve been optimistic about the future of AI, in the sense that I convinced myself it was something we could learn to manage over time. But every time I go online to platforms outside fediverse-adjacent ones, I just get more and more depressed.
I have stopped using the major platforms and no longer contribute to them, but from what I’ve heard, no publicly accessible data, even on the fediverse, is safe. Is that really true? And is there no measure we can take that isn’t just waiting for companies to decide to put people, morals, and the environment over profit?
A Lemmy instance that automatically licenses each post to its creator under a Creative Commons BY-NC-SA licence.
It could auto-insert a tagline and licence link at the bottom of each post (see the sketch below).
When evidence turns up of post content being used in an LLM (breaking the Non-Commercial part of the licence), class-action lawsuits would be needed to secure settlements and to dissuade other companies from scraping that data.
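A minimal sketch of what the auto-insert step might look like, just appending a tagline before the post body is stored. The function name and footer wording are my own for illustration, not an existing Lemmy feature:

```python
# Illustrative sketch: append a licence tagline and link to a post body
# before storing it. Not an existing Lemmy feature; names are made up.

LICENCE_FOOTER = (
    "\n\n---\n"
    "Licensed to the author under CC BY-NC-SA 4.0: "
    "https://creativecommons.org/licenses/by-nc-sa/4.0/"
)

def add_licence_footer(body: str) -> str:
    """Return the post body with the licence tagline appended once."""
    if LICENCE_FOOTER.strip() in body:
        return body  # already tagged; don't duplicate the footer
    return body + LICENCE_FOOTER

if __name__ == "__main__":
    print(add_licence_footer("My post text."))
```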
I think it’s difficult to impossible to prove what went into an AI model, at least by looking at the final product. As far as I know, you’d need to look at their hard disks and find a verbatim copy of your text among the training material.
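That said, one weak signal you can get from the outside is checking whether a model regurgitates your text verbatim when fed the start of it. A rough sketch, assuming you can query some completion endpoint; `complete()` here is a placeholder, not a real API, and a match is suggestive of memorisation, not legal proof:

```python
# Rough sketch of a verbatim-regurgitation probe: give the model the
# first part of your post and see if it completes the rest word for word.
# `complete` stands in for whatever completion API you can actually query.

def looks_memorised(post: str, complete, prefix_ratio: float = 0.5) -> bool:
    words = post.split()
    cut = max(1, int(len(words) * prefix_ratio))
    prefix = " ".join(words[:cut])
    expected = " ".join(words[cut:])
    continuation = complete(prefix)  # model's continuation of the prefix
    return expected in continuation

if __name__ == "__main__":
    # Toy stand-in model that happens to "remember" the post verbatim.
    post = "The quick brown fox jumps over the lazy dog near the river."
    fake_model = lambda prefix: "over the lazy dog near the river."
    print(looks_memorised(post, fake_model))  # True
```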
Agreed that proof is hard to obtain from the final product alone.
Down the track, internal leaks about the data sets used can happen.
Crackers can also infiltrate and shed light on the data sets used for training.