Trying to understand the different selfhosted monitoring solutions

dr_robot · 2 years ago

@vegetaaaaaaa · 2 years ago

I am more interested in being able to observe metrics for each node individually rather than in aggregate.

This requirement makes me think netdata would be a good solution. In my current setup, each host has its own netdata dashboard and manages its own health checks/alarms. I have also enabled streaming, which sends metrics from all hosts to a “parent/master” netdata instance, so I can see the metrics of every host in one place without checking each dashboard individually.
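To make the per-node part concrete, here is a rough sketch of building a query URL for one streamed host on the parent. It assumes the parent listens on the default port 19999, that it exposes each child under a /host/<hostname>/ prefix, and that the v1 data endpoint is available. The host names below are made up, and the paths may differ between netdata versions, so check them against your own instance.

```python
from urllib.parse import quote, urlencode


def node_data_url(parent, hostname, chart, seconds=60):
    """Build a URL asking the parent for one chart of one streamed host.

    Assumptions: default port 19999, children reachable under
    /host/<hostname>/, and the /api/v1/data endpoint.
    """
    # A negative "after" means "this many seconds before now".
    query = urlencode({"chart": chart, "after": -seconds, "format": "json"})
    return f"http://{parent}:19999/host/{quote(hostname)}/api/v1/data?{query}"


# Hypothetical parent and child names, for illustration only.
url = node_data_url("parent.example", "web1", "system.cpu")
print(url)
```

Fetching that URL (with urllib.request or curl) should return the last minute of CPU data for that single host rather than an aggregate, if the assumptions above hold.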

However, it looks like it does not store the metrics for very long.

I still have to look into this. In the past it was certainly true, and you had to set up a Prometheus instance to store (and downsample, since nobody needs few-second resolution for year-old metrics) metrics for long-term archival. But looking at the documentation right now, it looks possible to store long-term metrics in the netdata DB itself, by moving old metrics to a lower-resolution storage tier: https://learn.netdata.cloud/docs/configuring/optimizing-metrics-database/change-how-long-netdata-stores-metrics
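To illustrate why downsampling matters for long retention, here is a minimal sketch of the idea. The tier intervals (1 s, 60 s, 3600 s) and the plain averaging are assumptions picked for the example, not necessarily what netdata does internally.

```python
from statistics import mean


def downsample(samples, factor):
    """Average each consecutive group of `factor` samples into one point."""
    return [mean(samples[i:i + factor]) for i in range(0, len(samples), factor)]


def points_per_year(seconds_per_point):
    """Number of points one metric produces in a 365-day year."""
    return 365 * 24 * 3600 // seconds_per_point


# Two minutes of made-up per-second samples, reduced to per-minute points.
per_second = [float(i % 10) for i in range(120)]
per_minute = downsample(per_second, 60)

# Hypothetical tiers: one point per second, per minute, per hour.
for interval in (1, 60, 3600):
    print(f"{interval:>5} s/point: {points_per_year(interval):>10} points per metric per year")
```

At one point per second, a single metric produces about 31.5 million points per year. At one point per hour it is 8760, which is why a lower-resolution tier makes year-long retention cheap.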

An important additional advantage is that it comes packaged on Debian (all my machines run Debian).

Same. However, I install and update it from their third-party APT repository - it’s one of the rare cases where I prefer upstream releases to Debian stable packages. The last few upstream releases have been really nice (for example, I’m not sure the new tiered retention system is available in the v1.37.1 Debian stable package).

My automated installation procedure (an Ansible role) is here if you’re interested (start at tasks/main.yml and follow the import_tasks).

dr_robot · 2 years ago

Thanks a lot for these tips! Especially about using the upstream deb.