How do you feel about your content getting scraped by AI models?

@[email protected] · 1 month ago

How do you feel about your content getting scraped by AI models?

@AA5B · 1 month ago

I’m pretty much fine with AIs scraping my data. What they can see is public knowledge and was already being scraped by search engines.

I object to:

- sites like Reddit, whose entire existence is due to user content, deciding they can police and monetize my content. They have no right.
- sharing of data, which includes more personal and identifiable data
- whatever the AI summarizes me as being treated as fact, such as by a company’s HR, regardless of context, accuracy, or hallucinations

@[email protected] · 1 month ago

What did you mean by “police” your content?

@[email protected] · 1 month ago

Not the person you are replying to, but Reddit does not make the content you created available to everyone (it blocks crawlers and removed the free API) and instead sells it to the highest bidder.

@AA5B · 1 month ago

Right, that’s my objection. After benefitting from my content, they police it, as in restrict other sites from seeing it, until it’s monetized. It’s not Reddit’s to charge money for.

@AA5B · 1 month ago

Probably not the right word, but my content should still be my content. I offered it to Reddit, but that doesn’t mean they have the right to charge others for it or to restrict others’ access to it for commercial reasons.

@Keening · 1 month ago

Public knowledge about individuals, when condensed and analyzed in depth in huge databases, can map out the patterns of your entire existence, and you become susceptible to being swayed in a certain direction, for example in elections. That creates further division and lines someone else’s pockets.

@AA5B · 1 month ago

Maybe, but I can’t object too much if I put my content out in public. When forced to create an account, I use minimal/false information and a unique generated email. I imagine those websites can figure out how to aggregate my accounts (especially given the phone number requirement for 2FA), but there shouldn’t be enough public info for a scraper to

@Keening · 1 month ago

Gotta think larger than yourself though. What happens when your spouse uses real info? Your kids? Your parents? They’ll shadowplay your person with great accuracy and fill in the gaps. You don’t even have to “put content” out there. Said databases can just put two and two together. How will you, or other users, even know you’re actually talking to a human? Perhaps you’re on Lemmy and we’re all bots trying to get you to admit fragments of your latest crimes in order to get you into jail for said crimes? Et cetera. At first glance this all looks harmless, but any accumulated information in huge databases is a major infringement on personal integrity at best, and complete control of your freedom at worst. The ultimate power is when someone can make you do X or Y and you don’t even realize you’re doing their bidding, but believe you have a choice when you don’t. (Similar to how it is in my living situation at home with my gf, that is :P jk.)

Hakuna matata. Happy new year

@AA5B · 1 month ago

I completely agree, except that I think of them as multiple related privacy issues. When it comes to AI bots scraping my public content, most of these are out of scope.

@Atemu · 1 month ago

> sites like Reddit whose entire existence is due to user content, deciding they can police and monetize my content. They have no right

Um, no, they do in fact have “every right” here. It’s shitty of course, but you explicitly gave them that right in the form of a perpetual, irrevocable, worldwide, etc. license to do whatever they like with everything you publish on their site.

They also have every right to “police” your content, especially if it’s objectionable. If you post vile shit, trolling, or other societal garbage behaviour on the internet, nobody wants to see it.