I’m looking to maybe self-host my own instance, and I’m still learning about the fediverse. If a different instance that I federate with hosts something illegal, are there risks to me? Is anything from other instances hosted on my server, like a copy of it? Or would I only end up hosting things my users post? I’m paranoid, and sorry if this is a silly question.

  • @[email protected]
    English
    11
    11 months ago

    I’m running it on the smallest Vultr VPS, with 25GB of disk.
    This instance only has 3 users, and I’m the only active one. It says it’s been up for almost a month and I’ve only used 3GB.

    Below are the Docker volumes that hold the actual data of the instance. Inside the DB, the biggest table is the one called activity, which the devs said is only sometimes used to validate data and could be truncated if needed (there’s a scheduled task that only keeps up to 6 months).
    Another thing to keep in mind is to properly configure logging for whichever installation guide you follow.
    After that, I’ve seen other admins say the next biggest consumer is the media uploaded (from bigger instances).

    $ du -h --max-depth=1
    640K    ./pictrs
    3.2G    ./postgres
    3.2G    .
    
    lemmy=# select
      table_name,
      pg_size_pretty(pg_relation_size(quote_ident(table_name))),
      pg_relation_size(quote_ident(table_name))
    from information_schema.tables
    where table_schema = 'public'
    order by 3 desc;
             table_name         | pg_size_pretty | pg_relation_size
    ----------------------------+----------------+------------------
     activity                   | 2187 MB        |       2292867072
     comment                    | 56 MB          |         58212352
     person                     | 48 MB          |         50307072
     comment_like               | 45 MB          |         47161344
     post_like                  | 22 MB          |         22781952
     comment_aggregates         | 14 MB          |         14811136
     post                       | 13 MB          |         13623296
    
    • gabe565
      English
      6
      11 months ago

      The activity table is also used to deduplicate incoming federation data, so instead of truncating it, I’d suggest deleting rows after a certain amount of time.

      For my personal instance, I set up a cron job to delete entries older than 3 days, and my DB is only ~500MB with a few weeks of content! I also haven’t seen any duplicated posts or comments. Even with Lemmy’s retries, 3 days seems to be long enough to keep rows in that table before dropping them.

      • @[email protected]
        English
        2
        11 months ago (edited)

        Could you share the cron/script you use to do this? I’m interested in hosting my own Lemmy at some point, and having a script for that cleanup would be hugely helpful for me.

        • gabe565
          English
          2
          11 months ago

          Definitely! I’m hosting in Kubernetes so I won’t post the full thing, but here’s the actual command that I run hourly. Make sure to replace the values for database, username, and password.

          PGPASSWORD=password psql --dbname=database --username=username --command="DELETE FROM activity WHERE published < NOW() - INTERVAL '3 days';"
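
          For anyone not on Kubernetes, here is a minimal sketch of how that command could be wrapped for cron. Only the DELETE statement comes from the command above; the variable names, the `--run` flag, and the defaults are made up for illustration, and the script assumes `PGPASSWORD` (or a `~/.pgpass` file) is already set up for the database user.

          ```shell
          # Sketch of a cleanup wrapper for the activity table (placeholder names throughout).
          set -eu

          # Assumed settings: replace with your own database, username and retention window.
          DB_NAME="${DB_NAME:-database}"
          DB_USER="${DB_USER:-username}"
          RETENTION_DAYS="${RETENTION_DAYS:-3}"

          # Print the DELETE statement for a retention window given in days.
          build_cleanup_sql() {
            # Accept only a positive integer so nothing unexpected ends up in the SQL.
            case "$1" in
              ''|*[!0-9]*|0*)
                echo "retention must be a positive integer, got: $1" >&2
                return 1
                ;;
            esac
            printf "DELETE FROM activity WHERE published < NOW() - INTERVAL '%s days';" "$1"
          }

          # Run the statement with psql; the password is expected from the environment.
          run_cleanup() {
            sql="$(build_cleanup_sql "$RETENTION_DAYS")"
            psql --dbname="$DB_NAME" --username="$DB_USER" --command="$sql"
          }

          # Only touch the database when called with --run, e.g. from cron.
          if [ "${1:-}" = "--run" ]; then
            run_cleanup
          fi
          ```

          Saved as, say, `purge-activity.sh` (a hypothetical name), an hourly crontab entry could look like `0 * * * * /path/to/purge-activity.sh --run`. Running it without `--run` does nothing, which makes it safe to source and check the generated SQL first.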
          
          • @[email protected]
            English
            1
            11 months ago

            Awesome, that was just as straightforward as I was hoping it was, thanks! I am more familiar with MySQL as I haven’t used Postgres a ton but SQL is SQL after all lol

            • gabe565
              English
              2
              11 months ago

              You’re welcome! Makes sense. They’re somehow so similar yet so different lol

      • @[email protected]
        English
        1
        11 months ago

        Hi - can you help me set this up or share the script that you use to do this? Many thanks :)

      • @[email protected]
        English
        1
        11 months ago

        Ah! I didn’t know exactly what it was being used for.
        Yeah, then it can only be trimmed, not truncated.

      • @Thief
        English
        1
        11 months ago (edited)

        Can you help me set this up also or share the script I would run to do this? Many thanks.

        • gabe565
          English
          1
          11 months ago

          Sure! My script will look a little different since I’m hosting Lemmy in Kubernetes, but basically you will want to run the following command hourly. Make sure to replace the values for database, username, and password.

          PGPASSWORD=password psql --dbname=database --username=username --command="DELETE FROM activity WHERE published < NOW() - INTERVAL '3 days';"
          
    • 𝙚𝙧𝙧𝙚
      English
      2
      11 months ago (edited)

      How are you keeping your pictrs directory so small?

      Mine is at about 5GB after two weeks with just a single user. 😬

        • 𝙚𝙧𝙧𝙚
          English
          1
          11 months ago

          Did you configure the pictrs API keys for Lemmy and for pictrs?

          If they’re not configured then I could see Lemmy not even using pictrs.

          • @[email protected]
            English
            1
            11 months ago

            Ohh!!
            That’s what’s happening. I haven’t uploaded any pictures, so I didn’t notice. Aside from that, I’m not sure what the other use cases of pictrs are.

            • 𝙚𝙧𝙧𝙚
              English
              1
              11 months ago

              Don’t quote me on it, but I think that besides handling image uploads, it also caches thumbnails for link posts.

      • codus
        English
        1
        11 months ago

        I also have around 3GB used for pictrs, and I’m not really sure of the best way to see what content is in there.
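
        One generic way to at least see where the space goes is to list the biggest files in the volume. This is only a sketch: the function names are made up, `./pictrs` is assumed to be the directory the pictrs volume is mounted at, and the file names in there may not map to posts in any obvious way.

        ```shell
        # Sketch: inspect disk usage inside a directory such as the pictrs volume.
        set -eu

        # Print the N largest files under a directory (size in KiB, biggest first).
        # $1 = directory, $2 = how many files to show (defaults to 20).
        largest_files() {
          find "$1" -type f -exec du -k {} + | sort -rn | awk -v n="${2:-20}" 'NR <= n'
        }

        # Print the total size of a directory in KiB.
        total_kib() {
          du -sk "$1" | awk '{ print $1 }'
        }
        ```

        For example, `largest_files ./pictrs 10` shows the ten biggest files and `total_kib ./pictrs` gives the overall size, which can be compared against the `du -h --max-depth=1` output further up the thread.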

        • 𝙚𝙧𝙧𝙚
          English
          2
          11 months ago

          Yeah I haven’t uploaded any images on my instance myself. So none of those images are mine. Might do some reading tomorrow and see if there’s any mention of this in the past on other communities. It’s not an emergency but I’m curious.

            • 𝙚𝙧𝙧𝙚
              English
              2
              11 months ago

              I had found an old post which indicates that post thumbnails are cached. So I guess there’s that.

              In case you didn’t see it, the OP of this thread realized they didn’t set up their pictrs API key… so I guess it’s possible to omit that and Lemmy should still work. Not sure about the downsides.