Note that unless you’re a Lemmy instance admin, this doesn’t have much use to you.

Until this package came along, if you wanted a bot that responds to events, you had to manually traverse all comments/posts/whatever at a fixed interval. With this package you can actually react to events directly from the database. It’s implemented in a very efficient way by connecting the package directly to the Lemmy database and using native Postgres features to get the events (LISTEN/NOTIFY if you want to get technical).
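For anyone curious what LISTEN/NOTIFY looks like in practice, here is a minimal sketch, not the package's actual code. The channel name `lemmy_events`, the trigger and function names, and the JSON payload shape are all assumptions made up for illustration. The SQL is what a setup script might install, and the Python function decodes the payload a listener would receive.

```python
import json
from dataclasses import dataclass

# Hypothetical channel name; the real package may use a different one.
CHANNEL = "lemmy_events"

# SQL a setup script might run once (assumed names throughout): a trigger
# that publishes the table name and id of every newly inserted comment.
TRIGGER_SQL = """
CREATE OR REPLACE FUNCTION notify_new_row() RETURNS trigger AS $$
BEGIN
    PERFORM pg_notify('lemmy_events',
        json_build_object('table', TG_TABLE_NAME, 'id', NEW.id)::text);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER comment_notify AFTER INSERT ON comment
    FOR EACH ROW EXECUTE FUNCTION notify_new_row();
"""


@dataclass(frozen=True)
class Event:
    table: str
    row_id: int


def parse_notification(payload: str) -> Event:
    """Decode the JSON text carried by a single NOTIFY message."""
    data = json.loads(payload)
    return Event(table=data["table"], row_id=int(data["id"]))
```

A listener would issue `LISTEN lemmy_events` on a dedicated connection and hand each notification's payload to `parse_notification`, so nothing has to poll the tables.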

The webhooks themselves are inserted into a separate SQLite database (API is coming) and allow for both simple and complex filtering of the incoming data. The system is already in use by two of my bots, @[email protected] and @[email protected] who now both receive the information about being tagged in a comment in seconds (the actual reply takes a little longer, but that’s because of the nature of the bot).

Currently you can be notified about a post or a comment; other types are trivial to add as well.

Let me know what you think!

  • asudox
    English
    6
    10 months ago

    This is awesome. I really like that. Hope it becomes an official optional setting you can turn on or off in the instance config.

  • @solrize
    English
    1
    10 months ago

    I’d rather have fewer bots on Lemmy, but from an implementation pov I wonder whether a pub-sub interface could keep up better with fast updates. Do webhooks make a new outgoing tls connection on every event?

    • Rikudou_SageOP
      English
      2
      10 months ago

      Pub-sub might work for some use cases, but it wouldn’t work at all for mine. I host my bots on AWS Lambda so I don’t pay for anything, unless the code is actually running. So the webhook essentially wakes the virtual machine up and after processing is done, it goes back to sleep.
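A rough sketch of what the receiving end of such a webhook could look like on Lambda, assuming the function sits behind a Function URL or an API Gateway proxy integration (both deliver the HTTP body as a string under `body`). The `comment_id` field is a hypothetical payload key, not something the package is known to send.

```python
import json


def handler(event, context):
    """Entry point Lambda invokes once per incoming webhook request."""
    # The HTTP body arrives as a JSON string; treat a missing body as empty.
    body = json.loads(event.get("body") or "{}")
    comment_id = body.get("comment_id")  # hypothetical field name
    if comment_id is None:
        return {
            "statusCode": 400,
            "body": json.dumps({"error": "missing comment_id"}),
        }
    # A real bot would do its work here (fetch the comment, post a reply).
    return {"statusCode": 200, "body": json.dumps({"handled": comment_id})}
```

The function only runs, and is only billed, while a request like this is being handled.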

      Yeah, they make a new outgoing TLS connection for every HTTPS webhook. That doesn’t necessarily mean every db event, though: there’s quite powerful filtering available and everyone should use it, since sending a ping for db events you don’t need seems quite wasteful.

      • @solrize
        English
        2
        edit-2
        10 months ago

        If we maintain the fantasy (and we may as well) of Lemmy someday overtaking Reddit, that can mean 100s of new posts per second that bots might want to inspect. So that’s quite a lot of vm restarts as well as load on the side sending out the webhook queries. I guess this stuff will have been redesigned a few more times by then though, so it is ok. Lemmy at the moment isn’t ready for such volume for many other reasons too.

        • Rikudou_SageOP
          English
          3
          10 months ago

          Well, it stays warmed up some 15 seconds or so, but the important part is you don’t pay for that uptime. And if my bots ever get to 100s of requests per second, I’m gonna have to shut them down, I’m not that rich.

          • @solrize
            English
            1
            edit-2
            10 months ago

            It shouldn’t be hard to handle 100s of requests per sec on a small vm. Where does your server side (the part listening to postgres events) run anyway?

            I’m thinking of e.g. that stupid reddit bot that responds if all the words in someone else’s post are in alphabetical order. That isn’t the type of filter you’d normally offer in a webhook API, so the bot has to listen to the “fire hose”. But its outbound traffic won’t be too large.

            From a privacy standpoint I’d also consider a firehose feed preferable to a filtered one. Like if I want to count how many posts per day mention Taylor Swift, I might not want to reveal that interest. So I have to take in an unfiltered feed and do the counting in my own client.
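Both firehose examples boil down to a small check run in the client on every incoming post. A minimal sketch, with function names and the three-word minimum chosen purely for illustration:

```python
import re


def words_in_alphabetical_order(text: str, min_words: int = 3) -> bool:
    """True if the post has at least min_words words and they are sorted."""
    words = [w.lower() for w in re.findall(r"[A-Za-z']+", text)]
    return len(words) >= min_words and words == sorted(words)


def count_mentions(posts, phrase: str) -> int:
    """Count posts containing the phrase, ignoring case."""
    needle = phrase.lower()
    return sum(1 for post in posts if needle in post.lower())
```

Because the matching happens locally, the sender never learns which phrase or pattern the client cares about.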

            There is a whole CS topic called Private Information Retrieval (PIR) that revolves around this idea, fwiw. The Wikipedia article about it is ok.

            • Rikudou_SageOP
              English
              2
              10 months ago

              It needs direct access to the db, so in my case it’s on the same vm as my instance.

              It theoretically could be done in the webhook filter, it’s a full (but limited) language, I’d just need to add support for some functions.

              Those are not really webhooks for public use but for the instance admins, so filtering by posts mentioning Taylor Swift should be more than enough. But yeah, you can just send everything to your bot if it can handle that.

              • @solrize
                English
                1
                10 months ago

                Oh I see, thanks. I was imagining this being used by bot authors who don’t want to run actual lemmy instances. This makes sense now, given that you want to run your bots on Lambda. I’d just run them on a VM but that’s just me. Cloudflare Workers seems like another possibility.

                • Rikudou_SageOP
                  English
                  2
                  10 months ago

                  It’s not only for the Lambda use case, my main motivation was AutoMods - they are currently very resource intensive and need to run very often. Until this package, you had to traverse all new posts and all the comments in them to check whether they’re newer than the last post / comment you’ve moderated. That’s a lot of API requests every minute or two; you’re essentially DDoSing yourself. With this, your AutoMod is told that a new comment was created and can fetch it in a single (relatively inexpensive) API request, instead of a plethora of requests which are all fairly expensive.
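A sketch of that single fetch, assuming the webhook delivers the new comment's id and that the instance exposes Lemmy's v3 HTTP API with a `GET /api/v3/comment?id=...` endpoint. The function names and the host used below are placeholders.

```python
import json
import urllib.parse
import urllib.request


def comment_url(instance: str, comment_id: int) -> str:
    """Build the URL for fetching one comment by id (assumed v3 endpoint)."""
    query = urllib.parse.urlencode({"id": comment_id})
    return f"https://{instance}/api/v3/comment?{query}"


def fetch_comment(instance: str, comment_id: int) -> dict:
    """Perform the one API request an AutoMod needs per webhook call."""
    url = comment_url(instance, comment_id)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)
```

For example, `comment_url("lemmy.example", 123)` gives `https://lemmy.example/api/v3/comment?id=123`, which replaces the whole walk over recent posts and their comments.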

                  Whether the webhooks feature is then exposed to other users is really up to each instance’s admins. I’m thinking of exposing the functionality to my instance’s users once I finish implementing everything I have envisioned.

                  Of course bot authors can add support for webhook triggering which means the admins can then use it more effectively.

  • Nerd02
    English
    1
    10 months ago

    Hey this is pretty cool! I wanted something similar for my instance, with a webhook notifying me of any application request, so I can get a notification and react as soon as possible. Well, I ended up having to implement that from scratch within my @[email protected] bot. A solution like this would probably be WAY more efficient than my current setup (with a client continuously polling for new applications). Good stuff!

    • Rikudou_SageOP
      English
      2
      10 months ago

      Should be simple to rewrite the bot to accept input from this! If you plan on doing so, let me know, I’ll add support for the applications table.

      • Nerd02
        English
        1
        10 months ago

        I have a bit too much stuff going on in my life right now to focus on changing my Lemmy stack, I’ll have to stick to my current setup for the time being.

        But I am very much interested in the package. Gonna leave a star on its repo and hopefully I’ll remember to come back to this once my hands are a bit less full than they are now.

  • originalucifer
    1
    10 months ago

    what are the resource use implications of something like this? will it scale well with a large instance?

    • Rikudou_SageOP
      English
      3
      10 months ago

      Yep, Postgres pushes events to the webhook processor instead of an app polling for data periodically. So after every insert, an event is pushed using the native Postgres LISTEN/NOTIFY mechanism, and the webhook processor then doesn’t interact with the database at all.

      • originalucifer
        1
        10 months ago

        yeah, but im seeing a reference to the doubling of my standing-processing as now i have an insert-after-event that didnt exist before… is that right?

        i mean i get that youre pushing the processing to a different, functional mechanism, but its still additive processing on the server that needs accounting… and seems expanded.

        • Rikudou_SageOP
          English
          1
          10 months ago

          Yeah, everything you do takes processing power. This is done in a way that minimises the impact. There’s no insert-after-event that I’m aware of. Also I’m not sure what you mean by expanded.

          • originalucifer
            0
            10 months ago

            “Yeah, everything you do takes processing power.” “So after every insert, an event is pushed using the native postgres listen/notify mechanism”

            right, i was just curious how much processing this is. it gets expensive quick on a large instance, efficiency matters. might be negligible, but i watch my services like a hawk.

            • Rikudou_SageOP
              English
              3
              10 months ago

              This feels like a moot point. I promise you this is much more efficient than the rest of Lemmy is.