BarterClub · edit-2 2 years ago

Welp that answers a lot of why all .ml are down

@LordShrek · 2 years ago

you connect to some lemmy instance on your web browser
the client application (lemmy web app) authenticates your login credentials by first checking its own user database. normally it should find you there, because by default you’d be connecting to an instance that you’ve already used (a mobile app, for example, could automatically pick the best instance by lowest latency). if it doesn’t find you, it sends out a message to the nodes (instances) that it knows about, searching for your user recursively. when your user is found, the record is sent back and stored in each node that was part of the search. (there’d be some threshold of tree depth so the unsuccessful branches don’t keep going forever, and some other algorithmic details to prevent redundant network activity)
you navigate to your subscribed communities feed, lemmy shows you the posts that are already on the node that you are directly connected to, then asynchronously sends out a request to the surrounding nodes to pull more posts from those communities, recursively reaching out to adjacent nodes, again avoiding repeatedly hitting the same node via algorithmic details which we can discuss further if you wish, sending back the info up the tree to your primary node. now a bunch of servers have duplicated community data, like a distributed storage system, but you, the user, don’t know about all that stuff that just happened behind the scenes. your GUI is updated accordingly
now you can interact with these posts, make new posts, and each interaction will be sent out to all the relevant nodes in a reverse process.
another user on the other end can visit some community that you just posted to, and a request will again be propagated through the network, but starting from his node, and eventually reaching some node that has your new post.
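
a rough sketch of the lookup step described above, just to make the idea concrete. everything here (the `Node` class, `find_user`, the `max_depth` default) is a made-up illustration and not lemmy code. it shows a depth-limited recursive search that skips nodes it has already visited and stores the record on each node along the path that found it:

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical instance: a local user store plus the peers it knows about."""
    name: str
    users: dict = field(default_factory=dict)
    peers: list = field(default_factory=list)


def find_user(node, username, max_depth=3, visited=None):
    """Depth-limited recursive search (illustrative only).

    The record is cached on every node along the successful path.
    """
    if visited is None:
        visited = set()
    visited.add(node.name)

    if username in node.users:
        return node.users[username]
    if max_depth == 0:
        return None  # depth threshold reached, give up on this branch

    for peer in node.peers:
        if peer.name in visited:
            continue  # avoid hitting the same node twice
        record = find_user(peer, username, max_depth - 1, visited)
        if record is not None:
            node.users[username] = record  # store on the way back up
            return record
    return None
```

a real version would need timeouts, authentication of the returned record, and a smarter way to avoid redundant traffic than one shared visited set.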

the advantages of this:

if a node goes down, not all of the community and user data is lost, because its neighbor nodes have replicated the data
if i am hosting a node, and have limited bandwidth and storage, i can specify limits so that my network is not unintentionally DoSed. so this implies that when the prior-described processes are occurring, some instances will not store the data they are pushing through, which is fine, and one of the intended features of this distributed architecture
similar to previous point, each instance can have a whitelist or blacklist of communities (for either storage and/or data passing), defined by the admin, if he/she wishes to tailor the content for example to keep it related to content they are interested in rather than being forced to serve everyone on the network. it’s like if someone wants to help a little bit but they don’t have all the bandwidth and storage in the world, they can, instead of having to handle traffic for a bunch of irrelevant-to-them communities.
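
the limits and whitelist/blacklist from the last two points could look something like this. `NodePolicy` and its fields are hypothetical names, and for simplicity this sketch uses one list for both storing and passing data, while the idea above allows separate lists for each:

```python
from dataclasses import dataclass, field


@dataclass
class NodePolicy:
    """Hypothetical per-instance settings chosen by the admin."""
    max_stored_bytes: int
    allowed: set = field(default_factory=set)  # empty set means "allow everything"
    blocked: set = field(default_factory=set)
    stored_bytes: int = 0

    def accepts(self, community):
        """True if this instance is willing to handle the community at all."""
        if community in self.blocked:
            return False
        return not self.allowed or community in self.allowed

    def handle(self, community, size):
        """Return 'drop', 'relay' (pass along without storing) or 'store'."""
        if not self.accepts(community):
            return "drop"
        if self.stored_bytes + size > self.max_stored_bytes:
            return "relay"  # storage budget used up, only pass the data through
        self.stored_bytes += size
        return "store"
```

so a small home server could set a low `max_stored_bytes` and a short allow list, and still help relay the communities its admin cares about.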

@shrugal · edit-2 2 years ago

There is so much wrong with this that I don’t even know where to begin.

I don’t intend to be rude, but this is just not how you build a decentralized/distributed system. The network would grind to a halt if every user app had to search recursively through a portion of the network, and aggregate & rank posts by itself. Aggregate values (communities, votes and so on) would never be right, because you’d never be able to actually gather all events for a particular entity in time. This might work in a local network of 10 nodes, but not on a global scale.

On top, who would pay for those nodes you are querying? There is no relationship between the users and the nodes, so why would anyone just run a node for others or be willing to pay anyone else in this scenario? Servers cost money and stuff. And your spam filtering and moderation solution would be the exact same as with instances, so nothing is gained here.

Maybe have a look at the Session messenger and their Oxen network. They go to great lengths to make sure the work is equally distributed among nodes and that the nodes are compensated fairly. This doesn’t just happen magically by itself, and there are many bad actors who will try to exploit any weakness they can find.

So I just think it’s impossible to create something like lemmy in an anonymous way, because content moderation is a human decision. There is no one correct mathematical solution, and I also can’t send some kind of filter query to a server to do it for me. All I can do is read the general rules that another human being has written up, subscribe to their moderation “service”, see how they are doing, and decide to stay or switch to another.

Similarly, if I don’t want to aggregate all the posts in the world by myself (as you are suggesting), then I’ll have to find someone to do it for me, and somehow pay that someone for their service. This part is actually kind of solvable (again look at Session), but it is not straightforward at all! It would involve cryptocurrencies, mining/staking, and some kind of client-side monetization. For this part I think trusted instances are just a much better solution, because we are building a social structure here anyway.

@LordShrek · 2 years ago

ok, you make good points, but i feel like the algorithm could be designed so that the system doesn’t grind to a halt. i’d have to look at other examples where this has been done. but maybe i am overly optimistic and it’s not possible.

who would pay for those nodes you are querying

the people who are already running nodes, like lemmy.world, lemmy.ml, me, etc. i run some services on my home server that i let anyone use, because i have the hardware and the bandwidth to be able to afford it. there are enough people who have the necessary hardware and bandwidth to contribute to it at minimal detriment to them. it’s already an open-source project where people volunteer their time to code it.

i’ll read up on oxen network.

in an anonymous way

wait, who said anything about anonymous? what are you talking about, being anonymous? there would still be user accounts.

if I don’t want to aggregate all the posts in the world by myself (as you are suggesting), then I’ll have to find someone to do it for me

this is already what is done, except that the data is not stored in a replicated and distributed manner. you get all the posts in the world for a community from its instance. it is one server, with all the data stored on its hard drive, like a traditional website. in what i’m proposing, this is also what would happen in many cases, because the node wouldn’t requery the entire network every time you request posts. there would be a time threshold, like how posts are cached on your local mobile device for most social media apps. posts would be cached on the server.
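
to show what i mean by a time threshold, here is a minimal sketch. `PostCache`, `fetch` and the 60 second default are all assumptions for illustration, and `fetch` stands in for whatever would actually query the other nodes:

```python
import time


class PostCache:
    """Hypothetical server-side cache: refetch a community only after ttl seconds."""

    def __init__(self, fetch, ttl=60.0, clock=time.monotonic):
        self.fetch = fetch    # callable that queries the network (assumed)
        self.ttl = ttl
        self.clock = clock    # injectable so the behaviour can be tested
        self.entries = {}     # community -> (fetched_at, posts)

    def get(self, community):
        now = self.clock()
        hit = self.entries.get(community)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]     # still fresh, no network traffic
        posts = self.fetch(community)
        self.entries[community] = (now, posts)
        return posts
```

requests that arrive inside the threshold are served from the node’s own copy, so only the first request after expiry costs any traffic to other nodes.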

now, yes, this architecture would in fact result in more network traffic occurring between each and every node, as they receive updates about events on other nodes. so that would be extra burden upon the hosts. but i believe it is something we can work through.