Hey all. Not sure if this is the right place to post this; please point me in the right direction if not:
So I only came here because of the exodus from reddit, but I’m pumped to see this community and all this technology people have been making. It’s like a return to the old-school, user-operated internet instead of the big awful silos that have been dominating the landscape since the early 2000s. I’m in.
So, quick question: are there plans or projects in the works for distributed hosting (making it easier for users to take up the load of storing and hosting content, so the instance operators aren’t stuck with the hosting costs)?
I ask because I’d like to work on a project to implement this, as I feel it’d be a massive further step forward. I’m not sure though if there’s anything existing I should be trying to get up to speed on or if I should be thinking in terms of starting my own project if I want to be working on it.
Lemmy is federated. You don’t distribute hosting; you have the federated servers communicate with each other.
The best thing you can do is spin up your own instance and convince your friends to use it. That way, big communities like https://lemmy.ml/c/asklemmy only have to send your server one update per post for all your users to view, rather than sending that update to 20 browsers themselves.
So your lemmy.mo_ztt.com instance could serve the one copy of its content to your dozen or so users, which takes load off of the “main” instance.
“Instance operators”, as you termed it, could be literally anyone. You can host an instance on a Raspberry Pi for a handful of users easily. This would lighten the load on the major “instance operators”.
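To make the fan-out concrete, here’s a rough Python sketch of the idea (not Lemmy’s actual code, just the shape of it): the origin instance groups subscribers by their home instance and delivers one copy per instance, and each instance then serves its own local users.

```python
# Rough sketch of federated fan-out (illustrative only, not Lemmy's real code).
# Instead of pushing a new post to every individual subscriber, the origin
# instance delivers one copy to each subscriber's home instance; that instance
# then serves its own local users from its copy.

from collections import defaultdict

# (username, home instance) pairs; made-up example data
subscribers = [
    ("alice", "lemmy.ml"),
    ("bob", "lemmy.mo_ztt.com"),
    ("carol", "lemmy.mo_ztt.com"),
    ("dave", "lemmy.mo_ztt.com"),
]

def fan_out(post, subscribers):
    by_instance = defaultdict(list)
    for user, instance in subscribers:
        by_instance[instance].append(user)

    for instance, users in by_instance.items():
        # One network delivery per instance, no matter how many of its
        # users subscribe to the community.
        print(f"deliver 1 copy of {post!r} to {instance} (covers {len(users)} users)")

fan_out("new post in asklemmy", subscribers)
```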
What’s the right term for what I’m calling an “instance operator”? I realize that anyone could be one, I just need some language to use to distinguish the people who are from the people who aren’t.
Oh, I wasn’t chastising your choice of words there. I was just using your term to make it clear that I’m talking about the same thing you were. I’m also relatively new here, so I’m not sure what the proper lingo is. I would presume “instance admins”, but I can see how that could be vague or also include people who might not be paying for the actual hosting itself.
Instances are the way to distribute the load; they basically act as a read replica for everything a user on that instance views. Yes, this may be “inefficient” in terms of the storage an instance needs, but it is highly efficient in offloading the burden of a popular post to hundreds of instances instead of tens of thousands of users. Further, this makes the system resilient as every instance has a largely real-time copy of the things their users care about, even if the “origin” instance goes offline.
Further, this makes the system resilient as every instance has a largely real-time copy of the things their users care about, even if the “origin” instance goes offline.
This is also a great point. Lemmy.ml is getting hit hard and having sporadic outages. My instance can continue to serve the items it’s received to my users just fine. Effectively no downtime…
I don’t know if ActivityPub has anything to further distribute beyond instances but what you’re talking about reminds me of IPFS and some crypto backed stuff like Filecoin.
Yah, I’m asking because I have a specific (handwave-y) solution in mind using, among other things, IPFS. I’m not very up to speed on Lemmy’s internals, so my solution probably needs big adjustments before it’d be realistic. I planned to make a separate post where I talk at more length about why I think this is needed and some of the ideas I had about solutions; this post was just to get some idea of how the community looks at the issue.
I’ve had similar musings to yours, I think. The way to make a decentralized community as user-friendly as a centralized one would be to make the decentralization transparent somehow. One approach would require hosts to be able to volunteer computing resources in a way that’s more like adding cattle to a herd than adding pets to a family (which is how fediverse/matrix/email work today): more ephemeral, and happening in the background. I think the downside is that this gets closer to peer-to-peer, which has a lot of overhead and scaling issues (the number of links grows quadratically with the number of peers). Federation lies between p2p and client-server, but maybe there is room to push it closer to p2p to unlock transparent distribution of resources.
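As a rough back-of-the-envelope (in Python, with made-up assumptions: 100 users per instance, and a full mesh between instances and between peers), here’s what I mean about the link counts:

```python
# Back-of-the-envelope link counts for the three topologies.
# Assumptions (made up for illustration): 100 users per instance, and a
# full mesh between instances / between peers.

def client_server(users):
    return users                         # every user connects to one central server

def federated(users, users_per_instance=100):
    instances = -(-users // users_per_instance)         # ceiling division
    return users + instances * (instances - 1) // 2     # user->instance links + instance mesh

def full_p2p(users):
    return users * (users - 1) // 2      # every peer talks to every other peer

for n in (1_000, 100_000):
    print(f"{n:>7} users: client-server {client_server(n):,} links, "
          f"federated {federated(n):,}, full p2p {full_p2p(n):,}")
```

Real p2p networks don’t keep a full mesh, of course, but the overhead still grows a lot faster than in the federated model.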
I completely agree with you. I’ve only just started using this Fediverse stuff, but my immediate impression has been: this is cool, but most of it needs to be invisible to end users.
In particular, I think things need to be set up to treat instances as potentially temporary. Right now you’re placing a lot of trust in just some random person if you want to try to build up a community. For any number of reasons they could pull the rug out from under you at any point, and if that happens your account is gone and (as far as I can tell, at least) your content is frozen at the last snapshot it sent out. Accounts need to be global, or at least mirrored to other instances you interact with. And instances should be able to keep adding posts and comments to that snapshot and sharing the updates with other instances, so it’s like it never went down (I don’t think this happens now).
Hm, yah, this is a really good point.
So, I thought through what would be a good way to move forward what I talked about and came up with this. Originally I was thinking in terms of trying to have a world-shared data store, so that something like what you’re envisioning would be trivial (if your instance goes down, all your data is still there in the shared data store, so you just use your same user and all your stuff on a different instance transparently), but then I scaled it way way back from that plan in order to make it doable.
And actually, depending on how things are structured, it might be possible to do something like what you’re talking about with my approach. If you read my proposal, I’m talking about having data stored on what I call “peers”, and you’d definitely need to have your peer configured so that all your user’s data is mirrored there permanently. I think that means it might be possible, if your instance went down or you wanted to move to a new one, to grab your user data from your peer and move it to a whole new instance and have it work there in the same way. Same for a community, by the moderators of that community. Maybe. It’s too far ahead of what I understand about Lemmy and what I’ve fleshed out about how the system works for me to say anything about it for sure, though.
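To give a feel for what I mean by a “peer” mirroring a user’s data, here’s a hand-wavy Python sketch. The `ipfs_add` helper is a hypothetical stand-in for whatever real IPFS client API would actually be used; the point is just that content gets addressed by hash, so any peer holding the data can serve it, and an account could in principle be re-imported on a different instance from the index alone:

```python
# Hand-wavy sketch of the "peer mirrors a user's data" idea.
# ipfs_add is a hypothetical stand-in for a real IPFS client call; here it
# just hashes the JSON to show the idea of content addressing.

import hashlib
import json

def ipfs_add(obj):
    # Placeholder: a real implementation would hand the object to an IPFS
    # node and get a CID back.
    data = json.dumps(obj, sort_keys=True).encode()
    return "fake-cid-" + hashlib.sha256(data).hexdigest()[:16]

def mirror_user(user):
    # Pin every post/comment the user has made, plus an index object that
    # points at all of them. The index CID would be all you need to rebuild
    # the account somewhere else.
    item_cids = [ipfs_add(item) for item in user["items"]]
    index_cid = ipfs_add({"user": user["name"], "items": item_cids})
    return index_cid

user = {"name": "mo_ztt", "items": [{"type": "post", "body": "hello fediverse"}]}
print("index CID:", mirror_user(user))
```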
Yep, I 100% agree. Here’s my write-up of more of the details of my proposed solution. I’m planning to start work on it this week; if you have feedback or want to help, I’d welcome it.
I have been thinking about this a bit. Right now there is not really a way to spread the load out like you mentioned. Anyone can make another instance, but it doesn’t really alleviate any of the stress from another instance. I think it might even add to it, although not as much as adding a bunch of new users would. It would be beneficial to be able to contribute compute power to an instance, but I don’t think that is a realistic goal with the way Lemmy is set up.
Anyone can make another instance, but it doesn’t really alleviate any of the stress from another instance.
This is inaccurate. If you run your own instance and have 20 users, that’s 20 users who aren’t hitting the main instance. One copy of the content is transmitted from the primary instance to your instance, and those 20 users then hit your instance. So instead of the main instance serving 20 people, it’s serving one copy of the content. That is a 20-fold savings in bandwidth, CPU, and RAM. The only thing that isn’t saved is disk capacity, since the origin server still needs to serve all the content on demand.
Now the 1-2 user instances, yes there’s not much savings there. But once you get to 5-10 it’s already a better deal.
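Rough numbers, with a made-up average post size, just to show the ratio:

```python
# Made-up numbers, just to illustrate the ratio described above.
post_size_kb = 50            # assumed average size of a post plus its comments
users_on_my_instance = 20

served_directly = post_size_kb * users_on_my_instance    # origin serves every reader itself
served_federated = post_size_kb * 1                      # origin sends one copy to my instance

print(f"origin bandwidth, direct: {served_directly} KB")
print(f"origin bandwidth, federated: {served_federated} KB "
      f"({served_directly // served_federated}x less)")
```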
My wording was poor. I meant that currently there is no way to contribute to reducing stress on an instance. Making your own instance might help prevent the problem from getting worse, but it is not the same as adding more CPU power or RAM to an instance. If an instance is maxing out its CPU, there is currently no way for other people to help disperse the load.
On a slightly tangential point, I am not sure how sustainable it is to increase the number of possible users by increasing the number of instances. It is already a frustrating process finding the right instance to join. So imagine when there is 1 instance for every 100 users. With 100k users that is 1000 different instances to sort through. I think there need to be better ways to scale Lemmy, especially the amount of processing power it requires. Lemmy.ml will only be able to scale so big on a single VPS, or even a physical server.
With 100k users that is 1000 different instances to sort through.
Why would you sort through instances? The communities you want to interact with are still on the big instances… Just let the federation do the talking rather than communicating with those instances directly.
I see what you mean with the other point though. In that case people need to step off the lemmy.ml instance and move somewhere else to lighten the current load.
Based on figures I’ve seen from other instances, though, it doesn’t take all that much CPU/RAM to handle a metric boatload of users. The issue seems to be Postgres tuning (which could be storage latency/bandwidth) and storage space.
Right, any way you slice it, if you have a reddit-scale operation where the content is served entirely by the instances, then the people who run the instances are paying a reddit-scale hosting bill in aggregate. I saw one estimate that Reddit paid about half a million dollars in hosting bills per month. You hit the nail on the head – adding a hobbyist who’s running their own instance for themselves and maybe a handful of people does nothing to reduce the load on the big instances. How many of those big instances are there going to be if Lemmy grows to reddit size? Enough to break that half-million-dollar aggregate hosting bill into manageable pieces? Probably not. At that point you can’t do it just with hobbyists with their home machines on static IPs anymore.
Or, actually, you can, if you architect the system to make proper use of the hobbyists’ hardware. Obviously there are solutions; what I’m envisioning is a browser plugin that enables someone browsing Lemmy to pull content from the hobbyists even when talking to the big instances. Basically, decouple “I run an instance” from “I have to pay all the hosting costs for every byte that’s served to someone browsing on that instance”, and shift some of the load onto the people who are more in a hobbyist role and aren’t paying for any kind of official hosting but can still send bytes. I have a lot more thoughts on the topic and fuller ideas about how it might be solved; I was just trying to get a sense of what the community’s thoughts on it are as well.
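Here’s a hand-wavy Python sketch of the fallback logic I have in mind. The mirror URLs and the expected hash are hypothetical (the real thing would presumably use IPFS CIDs), but the shape is the important part: content is addressed by hash, so it doesn’t matter who actually serves the bytes:

```python
# Hand-wavy sketch of "pull content from hobbyist peers, fall back to the
# instance". Mirror URLs and the expected digest are hypothetical stand-ins;
# the real design would presumably use IPFS CIDs, but the idea is the same:
# because content is addressed by hash, untrusted peers can serve it safely.

import hashlib
import urllib.request

def fetch_content(expected_sha256, mirror_urls, origin_url):
    for url in mirror_urls + [origin_url]:
        try:
            data = urllib.request.urlopen(url, timeout=5).read()
        except OSError:
            continue  # peer offline or unreachable; try the next one
        if hashlib.sha256(data).hexdigest() == expected_sha256:
            return data  # hash checks out, so we don't have to trust the peer
    raise RuntimeError("no mirror or origin could serve the content")
```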
I fleshed out one proposal for a solution which I’m planning to start working on.