Texas SCOPE act takes effect Sunday. How does Lemmy comply?

@FarFarAway · 6 months ago

Texas SCOPE act takes effect Sunday. How does Lemmy comply?

abff08f4813c · 6 months ago

As GDPR-fans will tell you, data protection is a fundamental human right.

And I completely agree with this. I’m one of those who is a GDPR-fan as well as a fediverse fan.

We don’t let just anyone perform surgery, so don’t expect that just anyone should be able to run a social media site.

So this is the fundamental disagreement I feel. Progress generally entails moving things into the hands of the people. We’re empowered because we can do things like program our own computers, 3-d print our own devices, and yes run our own social media site.

Deny a person that right, and you take a bit of their power away. By running my own single user instance, I make sure that I always own my own content; no one can take it away from me by suddenly shutting down their website (as has happened to elle.co, for example).

As such, my goal here is to figure out how to let ma & pa joe run their own social media site on the fediverse, while staying GDPR compliant.

Of course, the same can be said of surgery but it’s still not allowed. Obviously the harm from letting anyone try it is much worse than strictly regulating it, but is running a social media site on the fediverse likewise so harmful? Is there no way at all to strike the balance?

They need legal experts on the team.

I’ve been thinking about this. You are right of course, but I’d wager that this is outside of what most folks running instances can afford. In particular new devs who want to run their own single user instance.

So what’s the way forward? I have come up with an idea for this. Basically we need to get some organization like the EU branch of the Electronic Frontier Foundation (EFF) to research this and come up with a HOWTO guide that covers most of the average cases - along with pointers on when something is not covered by the guide (so at least you know going in that you’d need to pay for that extra legal firepower).

On mastodon, you follow a person, which they can refuse. Only then is the data automatically sent to your instance. On lemmy, you subscribe to a community and everyone’s posts and comments are sent to yours. At least, that’s how I understand it.

I think you have understood correctly. This actually provided me with the epiphany that I needed. On forum-like software that speaks ActivityPub (like pyfedi or mbin), there’s no need to actually transfer the content. Just send me a notification - with the “user” being a bot account named something like “federation_bot_messenger” - with a link to the new post or comment, then bubble it up to the user to open in their browser. No content is shared, and no identifiers like a user name get shared, so there’s no risk of a GDPR violation. It’s just a link.

One could imagine that fancier web UIs might use an iframe or something to display the content in place instead of requiring an extra manual click - but it’s still only on the end user’s browser that the content is transferred.

We could still have traditional federation - but just as you describe, the allow list for that is only for those instances where you know the folks (have contracts you said) and thus are assured that the transfer of content complies with the GDPR. For unknown instances, just do the link sharing. It could be implemented in a way that instances running older software would still see a post by the bot account with just the link inside. (Perhaps as an enhancement, folks could designate a trusted instance as the primary - e.g. my instance trusts lemmy.world as primary, so when it sends the links out, it sends out a lemmy.world link, to take the load off of my own instance from users clicking on links.)
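A rough sketch of that split, to make the idea concrete. It assumes a trusted allow list of hostnames and uses ActivityStreams-style `Create`/`Note` payloads; the `Post` class, the `build_activity` function, and the `/u/` actor paths are hypothetical names for illustration, not anything Lemmy, pyfedi, or mbin actually implements.

```python
from dataclasses import dataclass

# Bot account name suggested in the comment above.
BOT_ACTOR = "federation_bot_messenger"


@dataclass
class Post:
    url: str     # canonical URL on the origin instance
    author: str  # local username
    body: str    # post text


def build_activity(post: Post, target_host: str, allow_list: set[str], origin: str) -> dict:
    """Return the payload to send to target_host for a new post."""
    if target_host in allow_list:
        # Trusted instance: traditional federation, full content is transferred.
        return {
            "type": "Create",
            "actor": f"https://{origin}/u/{post.author}",
            "object": {"type": "Note", "id": post.url, "content": post.body},
        }
    # Unknown instance: a post by the bot account with just the link inside,
    # so older software would still render something sensible.
    return {
        "type": "Create",
        "actor": f"https://{origin}/u/{BOT_ACTOR}",
        "object": {"type": "Note", "content": post.url},
    }


post = Post(url="https://example.org/post/1", author="alice", body="hello")
trusted = {"trusted.example"}
print(build_activity(post, "trusted.example", trusted, "example.org"))
print(build_activity(post, "unknown.example", trusted, "example.org"))
```

The second payload carries neither the username nor the post text, only the bot actor and the URL.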

Or am I missing anything here?

Bear in mind that few of the people who passed the GDPR have any technical background. Of the people who interpret it - judges and lawyers - fewer still have one. They are not aware of how challenging any of these requirements are.

I think this is a bit unfair. Clearly they had technically knowledgeable advisors at the very least. After all, they came up with exceptions like this:

There are two exceptions here: “Involuntary data transfer” is generally seen as not being part of the data handling. But that mainly applies to datascrapers like the web archive and similar usage where the data is transferred through general usage of a page that the DC cannot reasonably prevent without limiting the usage of their service massively.

That said I think I might have been a bit unfair to the lemmy devs. From https://tech.michaelaltfield.net/2024/03/04/lemmy-fediverse-gdpr/ I can see that pretty much all of the issues raised directly on lemmy itself have since been resolved - by a dev writing code to fix the problem. Even if GDPR isn’t the highest priority, the devs are clearly at work trying to address what they can when they can.

@General_Effort · 6 months ago

Deny a person that right, and you take a bit of their power away. By running my own single user instance, I make sure that I always own my own content; no one can take it away from me by suddenly shutting down their website (as has happened to elle.co, for example).

Hold on. You can’t keep personal data longer than needed. Making data disappear from the web is one important demand by the GDPR.

Comments are problematic because they inherently relate to other persons besides yourself. It could be argued that you have to delete your own writings as well when you shut down your instance. Or it could be argued that other people’s posts may be kept (possibly anonymized) because otherwise your personal data would be incomplete. The 2nd is obviously what reddit is doing. That seems to draw more criticism than praise from the lemmy community, to put it mildly.

The GDPR gives you rights over data, like copyright does. It inherently gives you a right to control what other people do on their own with their own physical property.

Of course, the same can be said of surgery but it’s still not allowed. Obviously the harm from letting anyone try it is much worse than strictly regulating it, but is running a social media site on the fediverse likewise so harmful? Is there no way at all to strike the balance?

You don’t need to ask me. The GDPR is a terrible mistake, but that’s not what people want to hear. People don’t know the law and just choose to believe a happy fantasy. I believe there is no way - at present - that an ordinary person can maintain an internet presence while being compliant with GDPR and other regulations. Mind, you also need to comply with the Digital Services Act and other stuff. With some skill, you can probably do a webpage, even with ads, but nothing where you interact with visitors and must collect data.

Basically we need to get some organization like the EU branch of the Electronic Frontier Foundation (EFF) to research this and come up with a HOWTO guide that covers most of the average cases - along with pointers on when something is not covered by the guide (so at least you know going in that you’d need to pay for that extra legal firepower).

Yes. The DPOs issue guidance and send out newsletters. That would be a place to start. Unfortunately, the different DPOs don’t agree on everything. Maybe in a few years, this will all be at a point where ordinary people can be on the safe side by simply following a manual. The problem is that this will still require extra time and effort. Well, content moderation also requires a lot of time and effort. Maybe it won’t be so much extra effort that it becomes impossible for hobbyists, but - on the whole - the future of the European internet belongs to big players.

We could still have traditional federation - but just as you describe, the allow list for that is only for those instances where you know the folks (have contracts you said) and thus are assured that the transfer of content complies with the GDPR. For unknown instances, just do the link sharing. It could be implemented in a way that instances running older software would still see a post by the bot account with just the link inside. (Perhaps as an enhancement, folks could designate a trusted instance as the primary - e.g. my instance trusts lemmy.world as primary, so when it sends the links out, it sends out a lemmy.world link, to take the load off of my own instance from users clicking on links.)

Or am I missing anything here?

I was thinking the same. Ironically, that is a problem because if there is such an alternative, then it must be used. If you can reach your goal by processing less personal data, then you must do so.

You’d only be hosting the communities created on your own instance. Apart from that, you’d simply authenticate the identities of users. One question is what that would do to server load. I don’t know.

Unfortunately, confirming the identities also means transferring personal data. It would also mean that the remote instance is able to connect an IP-address to a username; potentially allowing the real life identity to be uncovered. Proxying the posts/comments may be the better solution, but when and how that should be done has no clear answer.

Clearly they had technically knowledgeable advisors at the very least.

Yes. Those are commonly referred to as industry lobbyists.

“Involuntary data transfer”

I don’t know what exception that is. There are rules for data breaches. I’m not at all sure how much you have to do to block crawlers.

abff08f4813c · 5 months ago

Sorry for the late response, your last comment didn’t federate, so I just saw it.

Agreed, but - while it might be permissible legally to wipe out my data and content, what if I want to retrieve a copy afterwards?

You have the right to request a copy of all your personal data from whoever controls it. Apparently that feature is still missing from lemmy.

I run my own single user instance and it’s not that hard… I’d have to make some SQL queries to the database directly to retrieve the info but it’s straightforward.
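For illustration, here is roughly what such an export could look like. This is only a sketch against a simplified stand-in schema: the `person`, `post`, and `comment` tables and their columns are assumptions made for the example, and SQLite is used so it runs anywhere, whereas a real Lemmy instance uses PostgreSQL and its actual schema should be checked before adapting any of this.

```python
import json
import sqlite3


def export_user_content(conn: sqlite3.Connection, username: str) -> dict:
    """Collect every post and comment written by `username`."""
    cur = conn.cursor()
    posts = cur.execute(
        "SELECT p.name, p.body FROM post p "
        "JOIN person u ON u.id = p.creator_id "
        "WHERE u.name = ? ORDER BY p.id",
        (username,),
    ).fetchall()
    comments = cur.execute(
        "SELECT c.content FROM comment c "
        "JOIN person u ON u.id = c.creator_id "
        "WHERE u.name = ? ORDER BY c.id",
        (username,),
    ).fetchall()
    return {
        "posts": [{"title": title, "body": body} for title, body in posts],
        "comments": [content for (content,) in comments],
    }


# Demo against an in-memory stand-in database (hypothetical schema and data).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE post (id INTEGER PRIMARY KEY, creator_id INTEGER, name TEXT, body TEXT);
    CREATE TABLE comment (id INTEGER PRIMARY KEY, creator_id INTEGER, content TEXT);
    INSERT INTO person VALUES (1, 'me'), (2, 'someone_else');
    INSERT INTO post VALUES (1, 1, 'My post', 'Post text');
    INSERT INTO comment VALUES (1, 1, 'My comment'), (2, 2, 'Not mine');
    """
)
print(json.dumps(export_user_content(conn, "me"), indent=2))
```

Only rows whose creator matches the given username are returned, so other people's comments stay out of the export.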

Well, in that case, barring credible contradicting information from another source, I think it’s reasonable to accept the note from the former worker of a DPO. Would you agree?

That quote is from here: https://lemmy.world/post/1060627

Yep that’s the one.

I think I agree with pretty much everything they wrote. From what I understand, the quotation marks indicate that this is not official jargon. You can’t prevent web-scraping with any reasonable effort, so you don’t have to. The internet already exists. It’s too late to stop it now; better focus on stopping future progress.

Agreed.

Mind that there is nothing involuntary about federation. It’s not like web-scraping in that respect. You can just turn it off. You are left with something like an old school forum or reddit. No problem.

Yes, but that also makes it less useful and viable, unfortunately. I guess it really is like email if we consider federation an essential feature. I can set up my own email server that doesn’t talk to any other, but then it’s not too useful since it’d just be me sending emails to myself.

So, federation is a must, but the question is how to make it work.

Hmm. Will need a good think about this - perhaps I should adjust my commenting style to avoid direct quoting and such…

If you take the view that context is a necessary part of your personal data, then merely avoiding quotes is probably not enough.

What more would need to be done?

abff08f4813c · 5 months ago

And now I hit some kind of length limit so I had to break up the post. Moving right along,

That’s why I had the idea of creating and using the federation-bot account - this way there’s no confirmation of identities or transfer of personal data.

But what if someone wants to participate in a community on a different instance?

It would still work. The different instance would fetch the link containing the requested content and pass that on to the end user, where either the web UI running on the user’s browser or the user’s app would load the content. (Akin to a web browser loading the web page). It’d be up to the piece running on the end user’s computer to match it all together.

At least, the texts and their context, along with the username and home instance, need to be revealed.

Yes, but the point is that, like an old-school forum, this is not revealed except by (and from) the original instance hosting the content, and only to the end user. It’s not revealed until the end user’s app/browser fetches the content from the original server. So since only a link is federated, the PII only exists in those two places. Meaning that the server admin has a much easier job to delete data, as they only have to get it deleted off their own instance.
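A toy model of why deletion gets easier under this scheme. Everything here is hypothetical (the `OriginInstance` and `RemoteInstance` classes are illustrative, not real fediverse software): the origin is the only place content is stored, the remote instance only ever holds links, and the end user's browser or app follows the link to the origin.

```python
from typing import Optional


class OriginInstance:
    """Hosts the content; the only server that ever stores it."""

    def __init__(self, host: str) -> None:
        self.host = host
        self._posts: dict[str, dict] = {}

    def publish(self, post_id: str, author: str, body: str) -> str:
        self._posts[post_id] = {"author": author, "body": body}
        # The returned link is the only thing that gets federated.
        return f"https://{self.host}/post/{post_id}"

    def fetch(self, link: str) -> Optional[dict]:
        """What the end user's browser or app gets when it follows the link."""
        return self._posts.get(link.rsplit("/", 1)[-1])

    def delete(self, post_id: str) -> None:
        self._posts.pop(post_id, None)


class RemoteInstance:
    """Receives links only, never the content behind them."""

    def __init__(self) -> None:
        self.links: list[str] = []

    def receive(self, link: str) -> None:
        self.links.append(link)


origin = OriginInstance("example.org")
remote = RemoteInstance()

link = origin.publish("42", "alice", "hello")
remote.receive(link)
print(origin.fetch(link))  # content is served by the origin only

origin.delete("42")
print(origin.fetch(link))  # the link is now dead; nothing else to clean up
```

After the delete, the remote instance still holds a dead link but no username or text, which is the point being made above.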

If the end user then does webscraping … well how can you prevent that?

And if someone creates a malicious instance that follows the link and screenscrapes it … I assume it also falls under the “cannot prevent” bucket.

Taking a mental step back, it’s probably premature to worry about technological implementations. Sending data around does not have to be a violation. Compliance will require partly better information, and partly different administration. The legal aspects should be worked out before the necessary tools for the administrators are implemented.

The problem here is that this means we devs have to sit back and wait. When will we get the answers we need? And how long do we have to be exposed before we can actually work on solving the problem?

We really do need a foundation like the EFF to provide that legal advice and support, but I think coming up with technical fixes is still worthwhile even as we wait…

There is also a lot of regulation for the backend that instance owners have to comply with but which won’t be noticed by users. Documenting the data processing, who has access, possibly making data impact assessments, maybe notifying the local data protection office, …

This makes it seem like a good legal guide for an admin’s and instance’s jurisdiction is a must.

Oh, and by German law there also needs to be a (physical) address that can be served legal papers.

Interesting. In the US you can hire a lawyer to serve that purpose, typically. In some jurisdictions, I wonder if something like https://www.alliancevirtualoffices.com/ may also work.

There’s also more from the DSA, like releasing transparency reports on moderation twice a year, making regular backups and testing those, … I’m not quite sure what all is demanded by the DSA.

You’ve mentioned this a bunch of times but … what’s the DSA again? I have no doubt it’s related but curious to understand exactly what it is and how it fits in.

Could there be jurisdictions that have only DSA and no GDPR, and others with GDPR and no DSA?

abff08f4813c · 5 months ago

Ok, once more, continuing,

Hmm - if different DPOs can’t agree, then I don’t see how we get to the point of a user friendly manual.

I’m thinking about the issue of web-scraping, in particular. Some say that it’s almost always illegal. The European Commission, for one, disagrees.

I pulled this from google: https://www.morganlewis.com/pubs/2024/05/eu-regulator-adopts-restrictive-gdpr-position-on-data-scraping-impacting-ai-technologies

Thank you, that’s a really good example! I understand the need to rein in AI, of course. My point stands (and it doesn’t seem like you disagree) - a user friendly manual remains difficult to achieve.

Web-scraping is in some ways related. You could also get (almost all of) the data through scraping. If it’s not legal to scrape lemmy without permission, then it’s probably not legal to spin up your own instance and get the data that way. It depends on your purpose, of course.

Interesting. So pyfedi is a good example - the software supports backfilling when the instance discovers a new community/magazine on another instance for the first time, but it does it via API only. This means no backfilling of comments, and sometimes you can see posts from years ago in a stale magazine but which don’t get backfilled because the API doesn’t return them.

That’s also why I find the whole issue a little silly. Someone outside Europe could just scrape the data from the web interface and not worry about the GDPR.

Clearview AI is a good example of exactly this kind of bad actor, see https://lemmy.world/comment/12151959

But it seems like even then there are ways to enforce.

You’d have to put all of Europe behind a firewall to make it make sense.

Interestingly I’ve seen the reverse happen - websites blocking access to ip addresses that appear to be based in the EU to avoid having to deal with the GDPR and its ramifications.
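Mechanically, that kind of block is just a range check on the visitor's address. A minimal sketch using Python's standard `ipaddress` module; the ranges below are reserved documentation blocks used as placeholders, since a real site would need an actual geolocation dataset, which is not shown here.

```python
import ipaddress

# Placeholder ranges (reserved documentation blocks), NOT real EU allocations.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def is_blocked(addr: str) -> bool:
    """Return True if addr falls inside any blocked network."""
    ip = ipaddress.ip_address(addr)
    # Membership tests across IPv4/IPv6 simply evaluate to False.
    return any(ip in net for net in BLOCKED_NETWORKS)


for candidate in ("192.0.2.10", "198.51.100.1", "2001:db8::1"):
    print(candidate, "blocked" if is_blocked(candidate) else "allowed")
```

A check like this only sees the connecting address, so anyone behind a VPN or proxy outside the listed ranges passes straight through.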

That’s a prime example of why I say the people in charge of the GDPR have no idea of the technology they are regulating.

I disagree. The issue you’re describing is a common one in terms of extraterritoriality. How does the IRS get US citizens who are dual citizens living abroad to still pay taxes to the US? Enforcing laws extraterritorially is never easy, but as the IRS has proven, it is possible.

I am one of those hoping that the GDPR would be a tool for the opposite (a way to rein in the big players, so to speak).

Me too. I’d say this is point one of what I’d like the GDPR to achieve.

Such regulation inherently favors big players. The cost of creating a compliant service/app/etc is fairly constant, regardless of the size of the user base.

This is what’s inherently disturbing to me.

Same here. I’m thinking one way forward may be to add funding to expand the agencies - one side does the regulation, but the other side offers free services to small businesses and individuals to help them comply.

Besides, the GDPR inherently favors elites. Most people will never have … the money to hire professionals to do it right.

No, I think that’s a plus of the GDPR. Cost is on the company to comply and relevant gov’t agency to chase up if the company doesn’t. Facebook was brought in line, so it seems like a success so far. An example of point one above working.

Besides, the GDPR inherently favors elites. Has anyone ever … chased after you to get paparazzi pictures? Some people’s personal data is worth a lot more than that of others. Most people will never have to worry about scrubbing unflattering media stories from search engines,

Isn’t this specifically covered by the journalism exception that the GDPR provides? https://verfassungsblog.de/the-gdprs-journalistic-exemption-and-its-side-effects/

Has anyone ever tracked your private jet on twitter?

I can kind of understand this though. What if I want that hidden so militants with missiles can’t shoot me down? Easily justifiable by protection of life.

Even if it is flawed it’s still a step in the right direction IMVHO. I’m in Canada, which had PIPEDA back in 2000 - 18 years before the GDPR took effect in the EU.

Tell me what you hope the GDPR will achieve and I’ll tell you if there is any chance.

See where I mention point one above.

I’d write what the fundamental problems are, but time is short.

Seeing as it’s a couple of months later, I’d add that I’m willing to wait if you think you will ever get around to it. You have already brought up some good points, though - the most salient one being that GDPR compliance is simply too expensive and not user friendly for a small time individual - but I still feel that this is something that can be improved upon without major revisions to the GDPR itself.

abff08f4813c · 6 months ago

Hold on. You can’t keep personal data longer than needed. Making data disappear from the web is one important demand by the GDPR.

Agreed, but - while it might be permissible legally to wipe out my data and content, what if I want to retrieve a copy afterwards?

I wouldn’t want to keep control over other people’s content, but regarding my own…

“Involuntary data transfer”
I don’t know what exception that is. There are rules for data breaches. I’m not at all sure how much you have to do to block crawlers.

Well, in that case, barring credible contradicting information from another source, I think it’s reasonable to accept the note from the former worker of a DPO. Would you agree?

Comments are problematic because they inherently relate to other persons besides yourself. It could be argued that you have to delete your own writings as well when you shut down your instance.

Hmm. Will need a good think about this - perhaps I should adjust my commenting style to avoid direct quoting and such…

Ironically, that is a problem because if there is such an alternative, then it must be used. If you can reach your goal by processing less personal data, then you must do so.

All the more reason to get started on it, I suppose.

You’d only be hosting the communities created on your own instance. Apart from that, you’d simply authenticate the identities of users.

Well, and dealing with responsibility for user content from your instance’s local users - but since it’s just the one instance (or a small handful if you trust a few others) it’s still much more manageable. And it becomes zero for, e.g., single-user instances (since those would have zero other users and thus zero other content to worry about hosting).

Unfortunately, confirming the identities also means transferring personal data.

That’s why I had the idea of creating and using the federation-bot account - this way there’s no confirmation of identities or transfer of personal data.

One question is what that would do to server load. I don’t know.

Server admin question. Can save that for serverfault.com and the like IMVHO

Proxying the posts/comments may be the better solution, but when and how that should be done has no clear answer.

One of those things that need experimentation and research to determine, but an answer can be found.

Unfortunately, the different DPOs don’t agree on everything. Maybe in a few years, this will all be at a point where ordinary people can be on the safe side by simply following a manual.

Hmm - if different DPOs can’t agree, then I don’t see how we get to the point of a user friendly manual.

Maybe it won’t be so much extra effort that it becomes impossible for hobbyists, but - on the whole - the future of the European internet belongs to big players.

This is what’s inherently disturbing to me. I am one of those hoping that the GDPR would be a tool for the opposite (a way to rein in the big players, so to speak).

People don’t know the law and just choose to believe a happy fantasy.

It was a surprise to read from the former DPO worker that email as a system is not compliant with the GDPR.

I believe there is no way - at present - that an ordinary person can maintain an internet presence while being compliant with GDPR and other regulations.

Hmm. I am starting to see why you take this view. Not saying I agree, but I can understand the frustration. That said, PIPEDA in Canada came to pass in 2000 - it’s considered to have GDPR-equivalency and we’ve not had the sort of issues that you are raising with PIPEDA, which makes me optimistic that the GDPR can likewise be something that folks can live with.

The GDPR is a terrible mistake, but that’s not what people want to hear.

Even if it is flawed it’s still a step in the right direction IMVHO. I’m in Canada, which had PIPEDA back in 2000 - 18 years before the GDPR took effect in the EU. Hence I believe a solution is workable and a balance can be struck - even if in the worst case that means additional legislation to tweak the existing law. (Though I’d not even go that far - for example, from the former DPO, it seems that if EU courts all agreed that the API behind federation was covered by the “involuntary data transfer” exception then Lemmy would already be GDPR compliant (or mostly so) as-is, as of the time that I write this.)

@General_Effort · 6 months ago

Agreed, but - while it might be permissible legally to wipe out my data and content, what if I want to retrieve a copy afterwards?

You have the right to request a copy of all your personal data from whoever controls it. Apparently that feature is still missing from lemmy.

Well, in that case, barring credible contradicting information from another source, I think it’s reasonable to accept the note from the former worker of a DPO. Would you agree?

That quote is from here: https://lemmy.world/post/1060627

I think I agree with pretty much everything they wrote. From what I understand, the quotation marks indicate that this is not official jargon. You can’t prevent web-scraping with any reasonable effort, so you don’t have to. The internet already exists. It’s too late to stop it now; better focus on stopping future progress.

Mind that there is nothing involuntary about federation. It’s not like web-scraping in that respect. You can just turn it off. You are left with something like an old school forum or reddit. No problem.

Hmm. Will need a good think about this - perhaps I should adjust my commenting style to avoid direct quoting and such…

If you take the view that context is a necessary part of your personal data, then merely avoiding quotes is probably not enough. Practically, the way reddit is doing things seems to be fine.

That’s why I had the idea of creating and using the federation-bot account - this way there’s no confirmation of identities or transfer of personal data.

But what if someone wants to participate in a community on a different instance? At least, the texts and their context, along with the username and home instance, need to be revealed.

Taking a mental step back, it’s probably premature to worry about technological implementations. Sending data around does not have to be a violation. Compliance will require partly better information, and partly different administration. The legal aspects should be worked out before the necessary tools for the administrators are implemented.

There is also a lot of regulation for the backend that instance owners have to comply with but which won’t be noticed by users. Documenting the data processing, who has access, possibly making data impact assessments, maybe notifying the local data protection office, … There’s also more from the DSA, like releasing transparency reports on moderation twice a year, making regular backups and testing those, … I’m not quite sure what all is demanded by the DSA. Oh, and by German law there also needs to be a (physical) address that can be served legal papers.

Hmm - if different DPOs can’t agree, then I don’t see how we get to the point of a user friendly manual.

I’m thinking about the issue of web-scraping, in particular. Some say that it’s almost always illegal. The European Commission, for one, disagrees.

I pulled this from google: https://www.morganlewis.com/pubs/2024/05/eu-regulator-adopts-restrictive-gdpr-position-on-data-scraping-impacting-ai-technologies

Web-scraping is in some ways related. You could also get (almost all of) the data through scraping. If it’s not legal to scrape lemmy without permission, then it’s probably not legal to spin up your own instance and get the data that way. It depends on your purpose, of course.

That’s also why I find the whole issue a little silly. Someone outside Europe could just scrape the data from the web interface and not worry about the GDPR. You’d have to put all of Europe behind a firewall to make it make sense. That’s a prime example of why I say the people in charge of the GDPR have no idea of the technology they are regulating.

This is what’s inherently disturbing to me. I am one of those hoping that the GDPR would be a tool for the opposite (a way to rein in the big players, so to speak).

Such regulation inherently favors big players. The cost of creating a compliant service/app/etc is fairly constant, regardless of the size of the user base.

Besides, the GDPR inherently favors elites. Has anyone ever tracked your private jet on twitter? Or chased after you to get paparazzi pictures? Some people’s personal data is worth a lot more than that of others. Most people will never have to worry about scrubbing unflattering media stories from search engines, or have the money to hire professionals to do it right.

Even if it is flawed it’s still a step in the right direction IMVHO. I’m in Canada, which had PIPEDA back in 2000 - 18 years before the GDPR took effect in the EU.

Tell me what you hope the GDPR will achieve and I’ll tell you if there is any chance. I’d write what the fundamental problems are, but time is short.

Texas SCOPE act takes effect Sunday. How does Lemmy comply?

What is the SCOPE Act? New law expected to take effect Sunday aims to add extra layer of protection to kids online