Yep, but if you reply to said post, only other users on YOUR instance will see those comments. On any other instance, it would be like you never replied at all.
OK thanks. I’m still a bit confused about how it works though: if they did nuke the website, where would the data from the post and comments be stored?
So, without going into the technical details too much: when two instances federate with each other, they share all of the community, post, and comment data with the other federated servers. But it’s the job of the host to manage that passing of data.
Now, once the host goes offline, that activity of informing all other instances of “hey, here’s something new about XYZ community!” no longer happens, but each instance still has the historical data from before the host went offline. So you can still see that old data and still technically reply to it. The host just won’t tell other instances that you replied.
Replying to my own post with an example…
Let’s assume you have an account on Lemmy.world. Let’s also assume you see some post on Lemmy.ml. And finally, let’s assume you have a friend who’s actually on Mastodon. When you reply to that post on Lemmy.ml, Lemmy.world sends your reply to Lemmy.ml, and then Lemmy.ml tells Mastodon (and all other federated instances) about your reply. But if Lemmy.ml goes offline, Lemmy.world has nowhere to send that reply, so it’s only kept locally on Lemmy.world. The user on Mastodon can’t see it, because Lemmy.ml went offline and never told their instance about it.
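To make the example above a bit more concrete, here’s a toy simulation of that flow. It’s only a sketch of the idea, not how Lemmy or ActivityPub are actually implemented: the `Instance` class, the `followers` list, and the `reply` function are all made up for illustration, and real servers talk over HTTP rather than calling each other’s methods.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """A toy fediverse server. All names here are hypothetical."""
    name: str
    online: bool = True
    comments: list = field(default_factory=list)   # comments this server knows about
    followers: list = field(default_factory=list)  # servers following communities hosted here

    def receive(self, comment: str) -> None:
        # Store a comment, ignoring one we already have.
        if comment not in self.comments:
            self.comments.append(comment)

    def announce(self, comment: str) -> None:
        # Acting as the community host: keep the comment, then tell every follower.
        self.receive(comment)
        for server in self.followers:
            if server.online:
                server.receive(comment)


def reply(author_home: Instance, host: Instance, comment: str) -> None:
    # The author's own server always keeps a local copy.
    author_home.receive(comment)
    # It then hands the reply to the host, which fans it out to everyone else.
    # If the host is offline, no other server ever hears about the reply.
    if host.online:
        host.announce(comment)


lemmy_ml = Instance("lemmy.ml")
lemmy_world = Instance("lemmy.world")
mastodon = Instance("mastodon")
lemmy_ml.followers = [lemmy_world, mastodon]

reply(lemmy_world, lemmy_ml, "first reply")   # host online: everyone gets it
lemmy_ml.online = False
reply(lemmy_world, lemmy_ml, "second reply")  # host offline: stays on lemmy.world

print(lemmy_world.comments)  # ['first reply', 'second reply']
print(mastodon.comments)     # ['first reply']
```

The second reply only ever exists on Lemmy.world, which is exactly the “it would be like you didn’t reply at all” situation for everyone else.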
I assume it’s only text content that is shared between servers? Not uploaded images and the like?
Correct. Images are actually hosted on a separate service alongside the instance itself. So if said instance goes offline, all of the images go along with it (unless you linked to Imgur or something else instead).
So let’s take lemmy.world and beehaw.org as an example (I know they are defederated, but let’s assume they are not). All the posts and comments that lemmy.world hosts on its hard drives or servers are also hosted by beehaw.org, and vice versa? So the amount of data is actually doubled in size?
Yep. Add in a 3rd instance and now you have 3 copies of the database, essentially. It’s just that each instance is responsible for telling the fediverse when updates occur to communities on their instance.
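As rough back-of-the-envelope arithmetic, the total storage across the network grows linearly with the number of instances holding a full copy. The sketch below assumes every instance mirrors everything, and the 50 GB figure is a placeholder picked purely for illustration, not a measurement of any real instance.

```python
def total_storage_gb(text_size_gb: float, instances: int) -> float:
    """Total storage across the network if every instance keeps a full copy."""
    return text_size_gb * instances


TEXT_SIZE_GB = 50.0  # hypothetical size of all posts and comments, not real data

for n in (1, 2, 3):
    print(f"{n} instance(s): {total_storage_gb(TEXT_SIZE_GB, n):.0f} GB in total")
```

Each individual server still only stores one copy; it’s the network-wide total that multiplies.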
If the fediverse gets really big, let’s say the size of reddit, it may be hard for all the different instances to store all that data on their servers.
Ya, ActivityPub isn’t without its issues… but luckily it’s all just text. Much of that can be compressed significantly.
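If you want to see how well text compresses, here’s a quick sketch using Python’s built-in `zlib` module. The sample comments are made up and far more repetitive than real conversations, so the ratio it prints will be much better than what you’d get on actual forum data. Treat it as a demonstration of the idea, not an estimate.

```python
import zlib

# Made-up sample data: 1000 very similar comments (far more repetitive than real ones).
comments = [
    f"Comment {i}: I agree with the post above, thanks for sharing this!"
    for i in range(1000)
]
raw = "\n".join(comments).encode("utf-8")

# Compress at the highest standard zlib level.
packed = zlib.compress(raw, level=9)

ratio = len(raw) / len(packed)
print(f"raw: {len(raw)} bytes, compressed: {len(packed)} bytes, ratio: {ratio:.1f}x")

# Compression is lossless: decompressing gives back the exact original bytes.
assert zlib.decompress(packed) == raw
```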
I wonder what the total data storage size is for all the publicly viewable content on reddit. I find it hard to even guess lol. 100TB? 10,000TB?
The compressed archive of reddit from 2005.5 until 2022 is 2 TB: https://academictorrents.com/details/7c0645c94321311bb05bd879ddee4d0eba08aaee
Uncompressed it is likely way larger though.
I would guess it’s A LOT smaller than you’d expect, especially if you’re just talking about posts and comments and not any uploaded images. I’d bet the images themselves are many orders of magnitude larger than the conversations.
Btw I did just find this: https://www.reddit.com/r/DataHoarder/comments/pqxs8m/size_of_reddit/
The post is a few years old and is quoting data that is a few years older still… but assuming that they’ve doubled in size since, there’s only 10TB of data for text, comments, etc… (i.e. no images).
Now I’m assuming this is compressed btw. (The link in the post is dead so I can’t actually check out the file and see what’s in there).