I’m looking into setting up some monitoring combined with simple automation for my self-hosted setup. I’m currently thinking about using Zabbix.
I want to:
Track bandwidth usage on a router/firewall and on a managed switch, and track CPU/RAM/disk usage on my VMs.
Do simple monitoring (up/down/maintenance) of the router, switch, and my VMs, as well as Linux services (Jellyfin, Forgejo, etc.) and Windows services (a lab for studying work-related tools).
I’m also interested in doing simple HTTPS checks on my web UIs (I’ve had cases where a service was running but its website returned a 403 or a 404) and testing nslookup against my internal DNS (if the service is up but the lookups time out, I still want to try restarting the service).
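To make the last point concrete, here is a rough sketch of the two checks in plain Python, using only the standard library. It is an illustration of the idea, not how Zabbix or any other tool does it. The function names, the `example.lan` hostname, and the `192.0.2.53` server address mentioned below are placeholders I made up.

```python
import socket
import struct
import urllib.error
import urllib.request
from typing import Optional


def http_status(url: str, timeout: float = 5.0) -> Optional[int]:
    """Return the HTTP status code for url, or None if no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # 403, 404, 500, ... still carry a status code
    except (urllib.error.URLError, OSError):
        return None  # refused connection, TLS failure, timeout, ...


def web_ui_healthy(status: Optional[int]) -> bool:
    """Treat 2xx/3xx as healthy; 403, 404, or no answer count as failures."""
    return status is not None and 200 <= status < 400


def build_dns_query(hostname: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query for an A record (RFC 1035 wire format)."""
    # Header: id, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = hostname.rstrip(".").split(".")
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def dns_lookup_ok(server: str, hostname: str, timeout: float = 2.0) -> bool:
    """Return True if server answers the query without error before timeout."""
    query = build_dns_query(hostname)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:  # includes socket timeouts
            return False
    if len(reply) < 12 or reply[:2] != query[:2]:
        return False  # too short, or the reply ID does not match our query
    rcode = reply[3] & 0x0F
    answer_count = struct.unpack("!H", reply[6:8])[0]
    return rcode == 0 and answer_count > 0
```

Something like `dns_lookup_ok("192.0.2.53", "example.lan")` returning False while the service itself reports as running would be the trigger for a restart attempt. How the restart happens (a systemd unit, a container restart, a monitoring tool's action hook) is left out on purpose.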

Are there any FOSS/FLOSS alternatives that I should look into before diving into Zabbix?

  • Max-P
    6 months ago

    Prometheus/VictoriaMetrics/Grafana are pretty good. I’ve had no issues with them, and there’s an exporter for damn near anything. Custom exporters are pretty easy to write too.
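To give a feel for how small a custom exporter can be, here is a dependency-free sketch that serves one gauge in the Prometheus text exposition format. In practice most people would probably reach for the official `prometheus_client` library instead. The metric name `disk_used_ratio`, the helper names, and port 9200 are all made up for this example.

```python
import shutil
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Dict, Iterable


def format_gauge(name: str, help_text: str, label: str,
                 samples: Dict[str, float]) -> str:
    """Render one gauge with a single label in the text exposition format."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for label_value, value in samples.items():
        # Assumes label values need no escaping (no quotes or backslashes).
        lines.append(f'{name}{{{label}="{label_value}"}} {value}')
    return "\n".join(lines) + "\n"


def render_metrics(paths: Iterable[str]) -> str:
    """Collect the used fraction of each filesystem and render it."""
    samples = {}
    for path in paths:
        usage = shutil.disk_usage(path)
        samples[path] = round(usage.used / usage.total, 4)
    return format_gauge("disk_used_ratio",
                        "Fraction of the filesystem in use.", "path", samples)


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(["/"]).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 9200) -> None:
    """Block forever, answering scrapes on /metrics."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling `serve()` and pointing a scrape job at `http://<host>:9200/metrics` would be the rough usage. The collection step runs on every scrape, which is fine for something as cheap as `shutil.disk_usage` but would want caching for slower checks.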

    • @[email protected]
      6 months ago

      But these 3 are all about metrics, right? While they’re great to monitor and analyse numbers (ping times, disk space, memory, etc.), they aren’t that great with e.g. plaintext error messages in log files. That’s how I remember it from a few years ago, at least.

      • @sociableporcupine
        6 months ago

        Grafana/Loki does logs. Still early days for me but it’s solid so far.

    • @anamethatisnt (OP)
      6 months ago

      Cheers! I’ve heard of Prometheus/Grafana but VictoriaMetrics was a new one. Gonna look into it!

      • @[email protected]
        6 months ago

        Yeah, VictoriaMetrics is the new favorite since Influx keeps reinventing the wheel and trying to move everyone to the cloud.

        • @keyez
          6 months ago

          May have to explore this. I still run InfluxDB and Telegraf for push-based metrics instead of pull like Prometheus. Things have been smooth for a while, but a couple of months ago disk temps and metrics stopped working, with no errors logged and no missing plugins.