I was told that I should post this here.

cross-posted from: https://lemmy.world/post/932750

Say you decide to self-host a Lemmy instance. When you create that instance, do you immediately need to download and store all the data that has ever been posted to all federated Lemmy instances? Or perhaps you only need to download and store everything that is posted to the federated Lemmy instances from that point forward? Or better yet, do you only store what the users on that instance do (i.e. their posts, and posts to the communities hosted on that instance)?

  • @[email protected]
    1 year ago

    It works a lot like email between instances. Let’s call your self-hosted instance “A” and the popular remote instance “B.”

    A user on A searches for “poodles” and finds the community !poodles@B. When they click the search result, A sends B a mail saying “send me the last 10 posts for poodles.” B mails the posts back to A, and the user sees them, but none of them have comments yet.

    If nothing else happens, those 10 posts just sit on A doing nothing. But if the user clicks subscribe, A sends another mail to B saying “my user wants to follow poodles.” B replies, “cool, I’ll send you everything from poodles from now on.” From then on, whenever a post or comment is made in poodles, B checks its list of subscribing instances and sends each of them a copy.

    If a user on A comments or posts in !poodles@B, the content is created on A, and A sends B a mail saying “here is some new stuff for poodles!”
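    To make the “mail” analogy concrete, here is a toy, in-memory sketch of the flow above: fetching the last 10 posts, subscribing, the home instance fanning new content out to subscribers, and a remote user posting. It is only an illustration under simplifying assumptions; Lemmy actually does this with ActivityPub activities delivered over HTTP, and every name here (`Instance`, `NETWORK`, `mail`, `view`, `subscribe`, `publish`, the message dicts) is hypothetical, not Lemmy’s API.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for the internet: instance name -> Instance object.
NETWORK = {}


@dataclass
class Instance:
    name: str
    hosted: set = field(default_factory=set)       # communities whose home is this instance
    posts: dict = field(default_factory=dict)      # community -> locally stored posts
    followers: dict = field(default_factory=dict)  # hosted community -> subscribing instances

    def __post_init__(self):
        NETWORK[self.name] = self

    def mail(self, to, message):
        # "Send mail" to another instance and return its reply.
        return NETWORK[to].receive(self.name, message)

    def receive(self, sender, message):
        kind, community = message["type"], message["community"]
        if kind == "fetch":    # "send me the last N posts"
            return self.posts.get(community, [])[-message["limit"]:]
        if kind == "follow":   # "my user wants to follow this community"
            self.followers.setdefault(community, set()).add(sender)
        elif kind == "post":   # new content, pushed to a follower or sent to the home instance
            self._store(community, message["post"])
            if community in self.hosted:
                self._fan_out(community, message["post"], skip=sender)
        return None

    def _store(self, community, post):
        self.posts.setdefault(community, []).append(post)

    def _fan_out(self, community, post, skip=None):
        # The home instance sends one copy to every subscribing instance.
        for follower in sorted(self.followers.get(community, set())):
            if follower != skip:
                self.mail(follower, {"type": "post", "community": community, "post": post})

    # --- actions taken by users on this instance ---
    def view(self, community, home, limit=10):
        for post in self.mail(home, {"type": "fetch", "community": community, "limit": limit}):
            if post not in self.posts.get(community, []):
                self._store(community, post)

    def subscribe(self, community, home):
        self.mail(home, {"type": "follow", "community": community})

    def publish(self, community, home, post):
        self._store(community, post)        # created locally first
        if home == self.name:
            self._fan_out(community, post)
        else:
            self.mail(home, {"type": "post", "community": community, "post": post})


b = Instance("B", hosted={"poodles"})
a = Instance("A")
for i in range(12):
    b.publish("poodles", "B", f"post {i}")
a.view("poodles", "B")                      # A caches only the 10 most recent posts
a.subscribe("poodles", "B")                 # B now lists A as a follower
b.publish("poodles", "B", "post 12")        # pushed to A automatically
a.publish("poodles", "B", "hello from A")   # stored on A, then mailed to B
print(len(a.posts["poodles"]), b.posts["poodles"][-1])
```

    Note that A never receives posts 0 and 1: it only has what it fetched on first view plus whatever B pushed after the subscription.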

    • @Kalcifer (OP)
      1 year ago

      Thank you for the explanation!

      Unfortunately, if I understand correctly, this does not seem sustainable in the long term for small instances/servers. If Lemmy continues to grow in popularity, the influx of content will keep increasing, pushing small servers out of participation for lack of resources. I fear that data storage requirements will become a very limiting issue.

      I feel that if servers only tracked what their users directly participated in (i.e. only saved the posts and comments their users made), this issue would be much less of a problem.

      For example, I would like to host my own instance with only my account on it. I was initially hoping that my data storage requirements would be directly proportional to how much I, as a user, use Lemmy: the server would only need to store the data I personally create, and nothing else. Unfortunately, it appears that I would also need enough resources to hold everyone else’s posts, which is a far steeper requirement.

      • @[email protected]
        1 year ago

        Well, it really comes down to how many subscriptions there are.

        A small instance may only sub to 100 communities, so it is not too bad.

        But on the flip side, it means that the big instance needs to send everything to a huge number of small instances.
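        The trade-off above is simple arithmetic: the home instance sends one copy of each new activity per subscribing instance (not per user), while a small instance receives the sum of the activity in the communities it follows. A minimal sketch, where the function names and all the numbers are hypothetical and chosen only for illustration:

```python
def outbound_deliveries(activities_per_day: int, subscribing_instances: int) -> int:
    """Copies a community's home instance sends per day in this model: every new
    post, comment, or vote goes once to each subscribing *instance*, however many
    users that instance has."""
    return activities_per_day * subscribing_instances


def inbound_activities(activity_by_community: dict) -> int:
    """What a small instance receives per day: the sum over the communities it follows."""
    return sum(activity_by_community.values())


# Hypothetical community producing 2,000 activities a day.
print(outbound_deliveries(2_000, subscribing_instances=5))    # a few large instances
print(outbound_deliveries(2_000, subscribing_instances=500))  # many one-person instances

# Hypothetical small instance following 100 communities averaging 50 activities a day.
print(inbound_activities({f"community{i}": 50 for i in range(100)}))
```

        The same community costs its host 100 times more delivery work when its audience is spread over 500 tiny instances instead of 5 large ones, while each tiny instance only pays for what it subscribes to.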

        In practice I do not think it will be too bad. There will be a set of medium-sized instances that most users are attracted to, and between them they will be subscribed to the bulk of communities (roughly the top 80%). Smaller instances will be for more technical people, who won’t mind having to make sure the communities they want are subscribed to, since they understand how it works.

        I think that over time, services that aggregate community details will spring up and be incorporated into Lemmy’s search, making it easier to find things across the entire fediverse, not just on your instance. I expect a lot of improvements for non-technical (“muggle”) users over the next couple of months.

      • @[email protected]
        1 year ago

        Media takes up space; the text from posts and comments is trivial by comparison. The database for lemmy.world is only 25 GB, and the text of Wikipedia is only 21 GB.
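        A back-of-the-envelope comparison of text versus media. The per-item sizes below are assumptions picked for illustration (a few sentences of text plus metadata, one compressed photo), not measured Lemmy figures:

```python
def gigabytes(count: int, bytes_each: int) -> float:
    """Total size in GB (10**9 bytes) of `count` items of `bytes_each` bytes."""
    return count * bytes_each / 1e9


# Hypothetical average sizes, not measured from any instance.
COMMENT_BYTES = 500        # a short comment plus metadata
IMAGE_BYTES = 500_000      # one compressed photo

print(gigabytes(1_000_000, COMMENT_BYTES))  # a million comments
print(gigabytes(1_000_000, IMAGE_BYTES))    # a million images
```

        Under these assumptions a million comments fit in about half a gigabyte, while the same number of images needs about a thousand times more space.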

      • Max-P
        1 year ago

        It’s not quite as bad as that, because you’re only being pushed what you subscribe to. So while you do get a fair bit of content you might never see, it’s necessary so that you can browse those communities and compute which threads are active, trending, hot, updated, or whatever other sort you use, because all of that is computed locally on your instance.
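        Why local copies matter for sorting: a “hot” ranking needs every post’s score and publish time, so the instance must hold those posts to rank them. The sketch below uses a formula of the general shape Lemmy has used (log-scaled score divided by a power of the post’s age in hours); the exact constants, the `hot_rank` name, and the sample posts are assumptions for illustration only.

```python
import math
from datetime import datetime, timedelta, timezone


def hot_rank(score: int, published: datetime, now: datetime) -> float:
    # Assumed formula: a log-scaled score decays as the post ages.
    hours = (now - published).total_seconds() / 3600
    return 10_000 * math.log10(max(1, score + 3)) / (hours + 2) ** 1.8


now = datetime(2024, 1, 1, 12, tzinfo=timezone.utc)
# Hypothetical local copies of remote posts: (title, score, published).
posts = [
    ("old but popular", 250, now - timedelta(hours=48)),
    ("fresh", 5, now - timedelta(hours=1)),
    ("middling", 40, now - timedelta(hours=6)),
]
ranked = sorted(posts, key=lambda p: hot_rank(p[1], p[2], now), reverse=True)
print([title for title, _, _ in ranked])
```

        Because the age term dominates, a new post with a handful of votes can outrank a two-day-old post with hundreds, and computing that requires the data for all candidate posts to be on hand.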

        It’s also an efficiency advantage: if your instance has a lot of users, having everything locally means you offer a much smoother experience, and you also keep the remote instance from getting swamped with traffic, since you’re not just proxying every request to it and adding to its load.
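        A toy count of how many requests reach the remote instance with and without a local copy. The model and its numbers are hypothetical; it ignores retries, media, and votes, and only contrasts “push each item once” with “forward every page view”:

```python
def remote_requests(readers: int, views_per_reader: int, new_items: int, local_copy: bool) -> int:
    """Requests that hit the remote instance over one period (toy model)."""
    if local_copy:
        # Each new item is pushed once; every page view is served from the local database.
        return new_items
    # Pure proxy: every page view by every reader is forwarded to the remote.
    return readers * views_per_reader


# Hypothetical numbers: 200 local readers, 50 page views each, 300 new items.
print(remote_requests(200, 50, 300, local_copy=False))
print(remote_requests(200, 50, 300, local_copy=True))
```

        With a local copy, the remote’s load depends on how much new content there is, not on how many people read it.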

        For your storage concerns, there’s nothing preventing you from regularly purging content older than a week or two via a cron job.
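        The retention rule such a job would apply can be sketched as below. This only shows the filtering logic on hypothetical in-memory records; it does not model Lemmy’s real Postgres schema, and the `Post` fields, the `purge` function, and the choice to always keep local users’ posts are assumptions. A real job would run the equivalent deletion against the database on a schedule (e.g. nightly from cron).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Post:
    # Hypothetical record; not Lemmy's actual schema.
    author_is_local: bool
    published: datetime


def purge(posts: list, now: datetime, keep_days: int = 14) -> list:
    """Keep posts newer than `keep_days` plus anything written by local users."""
    cutoff = now - timedelta(days=keep_days)
    return [p for p in posts if p.author_is_local or p.published >= cutoff]


now = datetime(2024, 1, 31, tzinfo=timezone.utc)
posts = [
    Post(False, now - timedelta(days=30)),  # old remote post: purged
    Post(True, now - timedelta(days=30)),   # old local post: kept
    Post(False, now - timedelta(days=3)),   # recent remote post: kept
]
kept = purge(posts, now)
print(len(kept))
```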

        It’s not that bad so far:

        8,0K    volumes/lemmy-ui
        887M    volumes/pictrs
        646M    volumes/postgres
        1,5G    total
        
        • Lodion 🇦🇺
          1 year ago

          Your instance must be very new, have very few users, or be very inactive… or all of the above. I stood up aussie.zone just under a month ago, and its Postgres DB is currently 9.6 GB.
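          A rough projection from those figures, assuming “just under a month” means 28 days and that the database keeps growing at a constant rate (real growth depends on subscriptions and activity, so this is only an order-of-magnitude guess):

```python
def project_gb(current_gb: float, days_elapsed: float, horizon_days: float) -> float:
    """Linear extrapolation of database size, assuming constant daily growth."""
    return current_gb / days_elapsed * horizon_days


rate = 9.6 / 28                          # GB per day, taking "just under a month" as 28 days
print(round(rate, 2))                    # about 0.34 GB/day
print(round(project_gb(9.6, 28, 365)))   # about 125 GB after a year at that rate
```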