Any EU based users of reddit should immediately file a complaint under GDPR with their supervisory authority for the sale of their data to Google to train their LLMs

AlteredStateBlob@kbin.social · edited 2 years ago

Otter@lemmy.ca · 2 years ago

This would be worth crossposting back to the other site, where a lot of the users would be

GroteStreet 🦘@aussie.zone · 2 years ago

Reckon the manbaby-in-chief will let it stay up?

Otter@lemmy.ca · 2 years ago

If it goes down, that’s all the more reason for people to look into alternatives.

There’s also the Streisand effect.

SkippingRelax · 2 years ago

What social media sites are we talking about? And is it normal that they are all run by massive dickheads?

threelonmusketeers@sh.itjust.works · 2 years ago

Does running a social media site turn you into one, or do social media sites just attract them?

SkippingRelax · 2 years ago

Yes!

HeartyBeast@kbin.social · 2 years ago

While it is clearly a shitty move, it’s not really clear to me that posts on Reddit consist of personally identifying information as protected by the GDPR.

AlteredStateBlob@kbin.social · 2 years ago

Every post is tied to a username and email address, making it personal information, since each poster can be identified. I’m sure they’re also tracking further metrics such as IP addresses, browser fingerprints, etc. It is immaterial if we from the outside are able to identify users, it only matters if it’s possible given the data available to the processor. In this case, it is. Not to mention, there is a good chance texts and posts themselves contain plenty of personal information, such as linking to other user profiles, mentioning and discussing people, etc.

FaceDeer@kbin.social · 2 years ago

If they were GDPR-compliant before, I don’t see how they’ve changed to not be GDPR-compliant now. They allow people to delete their accounts and their posts if they wish, which removes all identifying information from their system.

Frankly, this looks to me like just an “I just hate Reddit! There’s gotta be something I can hit them with!” flailing attempt.

roadkill@kbin.social · 2 years ago

They ‘allow’ people to delete their posts and accounts…

But never actually delete anything from their databases. I’ve had years-old comments I deleted mysteriously reappear despite being gone for months.

FaceDeer@kbin.social · edited 2 years ago

So contact them about that, then. Sue them if you’re sufficiently offended. This doesn’t change anything. If they were GDPR-compliant before they’re still GDPR-compliant, if they weren’t GDPR-compliant then they still aren’t. My point is that this AI training stuff has nothing to do with that.

webghost0101@sopuli.xyz · 2 years ago

Did you read the parts about a clear, informed use case for any further processing? I asked the people I know who still go there, and none of them were even aware there was anything going on.

HeartyBeast@kbin.social · 2 years ago

True, however I assume that Reddit is supplying Google with just the text. So yes, Reddit is collecting lots of PII, but that PII isn’t what is going to Google, so Google can’t deduce your identity from it - unless you dox yourself in the text.

Not trying to be deliberately argumentative, just thinking this through. Much as I dislike Reddit, the case feels weak.

AlteredStateBlob@kbin.social · 2 years ago

It doesn’t matter. As long as the text is supplied as is, a simple Google search with the text and site:reddit.com will reveal the author, keeping it identifiable. True anonymization under GDPR almost does not exist, as it would destroy the dataset and make it unusable.

QuaternionsRock · 2 years ago

I deleted my first Reddit account a few years ago. When the whole API fiasco happened and I moved here, I realized that Redacted didn’t finish the job. I tried to get them to remove the rest of my stuff through a GDPR request, but they wouldn’t do shit, and they seemed to think that was acceptable under GDPR. When you delete your account, they (claim to) delete your associated email address, so they also “couldn’t” verify that it was mine.

FWIW, HackerNews has the same policy.

HeartyBeast@kbin.social · edited 2 years ago

It will reveal the username, not the identity of the author. If I tell you my Reddit username, what do you know about me?

AlteredStateBlob@kbin.social · 2 years ago

It doesn’t matter what it tells me. Personal data is clearly defined under GDPR as data that can be used to identify a person. It is irrelevant if you or I can do it with publicly available data, reddit has the data and that is enough to qualify it as such.

A DPA might absolutely disagree with my reading of the situation. I would be surprised if a DPA considered usernames to be non-personally identifiable information, and I know of no such ruling.

HeartyBeast@kbin.social · 2 years ago

My view is that Reddit has personally identifiable data, but the data that is being licensed to Google isn’t personally identifiable, because the username by itself is insufficient to identify a person without the additional data that Reddit isn’t passing over.

But I agree I may well be surprised by a DPA decision.

oce 🐆@jlai.lu · 2 years ago

Isn’t it enough to remove any connection to any personal identifier before sending it? LLM training doesn’t care about your email, it cares about a certain quality of question/answer pairs, and reddit has a lot of those.

AlteredStateBlob@kbin.social · 2 years ago

It is not enough, no. The LLM might reveal training data, showing the original text, and that is a simple Google search with site:reddit.com away from identifying the user. It’s trivial and thus not anonymized.

SorteKanin@feddit.dk · 2 years ago

Based

muntedcrocodile · 2 years ago

I’m so ready to watch Reddit burn for its IPO. Unfortunately I’m not in Europe, but my god, please make them hurt.

Nyfure@kbin.social · edited 2 years ago

Now I don’t want to defend Reddit here, but afaik most comments are not subject to GDPR as long as you don’t know they contain personal data and they have been detached from other personal data fields (like username).
So by removing personal data fields, they most likely become “anonymized”.
Of course that’s not the end of it: you have to consider the available technology to de-anonymize this data before it can legally be called anonymized.

But I don’t think there has been any case where this was challenged before… and I bet most supervisory authorities would discard such complaints as being “too hard to follow through”. (I got that reply from the Netherlands authority about checking newsletter opt-in on a website.)
And I certainly don’t think Reddit or any operator will be forced to delete comments because they could be deanonymized depending on the content the user wrote, when most comments probably cannot be deanonymized.
Having to check everything for potentially identifiable data in that regard would be ridiculous for website operators.
Maybe some light checks, sure, but not as deep as would be required to truly anonymize everything that a user could have written to identify themselves.
A lot of that information becomes fragments as soon as you unlink it from the user. E.g. 12 people in a post wrote “I am gay”, great. But if you can’t link that back to other comments of the same users somewhere else, it’s not identifiable, just text.

AlteredStateBlob@kbin.social · 2 years ago

Nope, your username and email are required and linked to your data, so it’s entirely personal information. True anonymization is impossible with open text fields, as it’s always possible that people reference other users within their posts, etc.

Of course, what the DPAs do with it is another matter. Doesn’t hurt to try.

Nyfure@kbin.social · edited 2 years ago

Of course they are linked, but removing the username from the comments means they are mostly anonymized as far as GDPR is concerned.
It is perfectly fine to unlink data and keep processing it, as long as it’s considered anonymized under GDPR.

Your post content here is also not considered personal data; it shows up on a lookup request because it’s currently linked. If I crawl the page and don’t save the username, the resulting data can most likely be considered anonymized under GDPR as far as the current interpretation is concerned.
It only becomes a problem as soon as I become aware the content did indeed contain personal data, or probably also if I could have expected it to with high probability.
And I’d have to make sure to remove obvious ways to re-link the content to your user (e.g. mentions of your username in comments).

Anything else requires precedent about ways to re-identify someone based on posts on a platform, weighed against the users’ freedoms and the difficulty of doing such re-identification.

Recital 26 discusses when something could be considered anonymous (or rather when GDPR would apply at all, and what it means to have anonymous data).

AlteredStateBlob@kbin.social · 2 years ago

That is not quite correct. As long as it is possible to identify the user, it is personal data. True anonymization under GDPR is nearly impossible without destroying the data set.

Reddit would have to fully delete it, otherwise simply searching Google with the exact text with site:reddit.com on any comment immediately reveals who the author is.

It doesn’t matter if the dataset in use allows for identification, as long as identification remains possible.

Nyfure@kbin.social · 2 years ago

Mhh… you might be correct.
I hadn’t considered how easy it actually is to search for a comment and find the exact post.

The question is whether being searchable through indexers like public search engines is enough to call the data easily re-identifiable.
Or whether this usage of personal data is covered some other way, e.g. legitimate interest, weighed against the freedoms of the data subjects, as you have listed above already.

AlteredStateBlob@kbin.social · 2 years ago

I’d argue it is, but that’s where the judgement of the DPAs comes in. It’s definitely possible that some, if not all of them, reject this as “it’s fine”. But unless eyes are being put on it, any shenanigans will simply occur.

I don’t know how it might go, but giving it a try is basically free.

Also, I appreciate your consideration of my perspective!

lovesickoyster · 2 years ago

> Nope, your username and email are required

I have never used an email address with reddit. It’s not required.

AlteredStateBlob@kbin.social · 2 years ago

You have to give one while signing up (just checked), unless you go through Apple or Google ID services. Either way, they still log your IP and other metadata, not to mention that your username does exist.

lovesickoyster · edited 2 years ago

> You have to give one, while signing up (just checked);

No, you do not have to. You can just skip it when it asks you for the email.

edit: looks like this is again an inconsistency between the old layout and the new. 🤣 Going through old.reddit.com there is no need for the email.

AlteredStateBlob@kbin.social · 2 years ago

Ah, alright. Didn’t check old.reddit

Apollo@sh.itjust.works · 2 years ago

Thanks for taking the time to put this together!

amigan@lemmy.dynatron.me · 2 years ago

This is crazy. At work, collecting PII about data subjects is merely a tangential thing that happens in the course of rendering services and we are super careful not to run afoul of GDPR. Is reddit truly running roughshod over the law like this? Why the hell do they think they can get away with it?

Telodzrum · 2 years ago

No, they are not. Believe it or not, both Google and Reddit have better lawyers than OP, and they inked this deal. There were a bunch of posts just like this when the Reddit API change happened; they all misinterpret the law and its definitions and then incorrectly apply that misinterpretation to the current situation. If the OP were accurate, Reddit couldn’t even exist.

Nerd02@lemmy.basedcount.com · 2 years ago

Thanks for sharing, OP, just sent a report to my authority, the Italian Garante per la Privacy.

AlteredStateBlob@kbin.social · 2 years ago

Awesome, thank you!

SteefBin@kbin.social · 2 years ago

Done

HonoraryMancunian · 2 years ago

For those of us in the UK, do I ‘make a complaint’ or ‘report a breach’, and which relevant subcategory do I then use?

https://ico.org.uk/

LemmyIsFantastic · 2 years ago

So many lawyers on lemmy these days.

febra · 2 years ago

You don’t have to be a lawyer to know this shit. I’ve been filing GDPR complaints for some time now and I’m not a lawyer. It just takes knowing your rights lol

AlteredStateBlob@kbin.social · 2 years ago

I’m not a lawyer, but a data protection officer with certification in Germany.

LemmyIsFantastic · 2 years ago

And you don’t work with lawyers? I work closely with NIST shit, CCPA, and GDPR, and we have full-time lawyers.

AlteredStateBlob@kbin.social · 2 years ago

DPOs in Europe don’t always work with lawyers. I mainly deal with mid-sized companies, and I do work with lawyers when it comes to the larger corporations, absolutely. I was simply clarifying that I am not a lawyer and don’t claim to be one.
