Tags: monitoring, grafana, telegraf

How to monitor a systemd service using telegraf?


I created a systemd service that runs on one of our instances, and I want to monitor it with the Telegraf agent already installed there. The agent currently collects basic infrastructure metrics, and I need to add monitoring for the new service.

I couldn't find any example of how to do this, which is strange: I would expect Telegraf to have a plugin for something this basic.

My service runs a Python script that doesn't expose any port, so I can't do a normal HTTP health check.

Any help would be appreciated.


Solution

  • It turns out Telegraf does have an input plugin for monitoring systemd services, called systemd_units.

    This is the configuration I've implemented:

    # Gather systemd units state
    [[inputs.systemd_units]]
      ## Set timeout for systemctl execution
      timeout = "1s"
    
      # Filter for a specific unit type, default is "service", other possible
      # values are "socket", "target", "device", "mount", "automount", "swap",
      # "timer", "path", "slice" and "scope":
      unittype = "service"
    
      # Filter for a specific pattern, default is "" (i.e. all), other possible
      # values are valid patterns for systemctl, e.g. "a*" for all units with
      # names starting with "a"
      pattern = ""
      ## pattern = "telegraf* influxdb*"
      ## pattern = "a*"
    

    Once the metrics were in InfluxDB, this is the Flux query I used to extract the data I needed:

    from(bucket: "veeva")
      |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
      |> filter(fn: (r) => r["_field"] == "active_code")
      |> filter(fn: (r) => r["_measurement"] == "systemd_units")
      |> filter(fn: (r) => r["active"] == "active")
      |> filter(fn: (r) => r["host"] == "10.192.21.66")
      |> filter(fn: (r) => r["name"] == "myservice.service")
      |> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: false)
      |> yield(name: "mean")
    

    And this is how it looks in Grafana:

    [Screenshot: Grafana panel showing the query above; image not available]

    Plugin documentation: https://docs.influxdata.com/telegraf/v1.22/plugins/#systemd_units
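A rough Python sketch of how to interpret the active_code field that the query above graphs. Assumptions, not taken from the answer: the ACTIVE_CODES table reflects the systemd_units README (check it against your Telegraf version); parse_list_units_line and downtime_fraction are hypothetical helpers that only mimic the UNIT/LOAD/ACTIVE/SUB columns of `systemctl list-units`, not Telegraf's own Go parser; and the sample values are invented. One consequence worth noting: because "active" is a tag on each point, the query's filter r["active"] == "active" keeps only rows where the unit was in the active state, so their active_code is always 0. To see outages in Grafana you would probably drop that filter and treat any non-zero value as "down".

```python
"""Sketch: interpret systemd_units samples for a single service.

Assumption: ACTIVE_CODES mirrors the systemd_units plugin README;
the helpers below are hypothetical and the sample data is made up.
"""
from dataclasses import dataclass

# active_code values reported by Telegraf's systemd_units plugin (per its README).
ACTIVE_CODES = {
    "active": 0,
    "reloading": 1,
    "inactive": 2,
    "failed": 3,
    "activating": 4,
    "deactivating": 5,
}


@dataclass
class UnitSample:
    name: str    # e.g. "myservice.service" (the "name" tag)
    load: str    # LOAD column (the "load" tag)
    active: str  # ACTIVE column (the "active" tag)
    sub: str     # SUB column (the "sub" tag)

    @property
    def active_code(self) -> int:
        # Numeric form of the ACTIVE state, as stored in the active_code field.
        return ACTIVE_CODES[self.active]


def parse_list_units_line(line: str) -> UnitSample:
    """Parse one line of `systemctl list-units --all --plain --no-legend`.

    Columns are UNIT LOAD ACTIVE SUB DESCRIPTION; the description may
    contain spaces, so only the first four fields are split off.
    """
    name, load, active, sub, *_description = line.split(maxsplit=4)
    return UnitSample(name, load, active, sub)


def downtime_fraction(codes: list[int]) -> float:
    """Fraction of samples in which the unit was not active (code != 0)."""
    if not codes:
        raise ValueError("no samples")
    return sum(1 for code in codes if code != 0) / len(codes)


if __name__ == "__main__":
    sample = parse_list_units_line(
        "myservice.service loaded failed failed My Python service"
    )
    print(sample.name, sample.active, sample.active_code)
    print(downtime_fraction([0, 0, 3, 3]))
```

Run as a script, it prints `myservice.service failed 3` and then `0.5`. Counting every non-zero code as downtime is a choice made for this sketch; a more lenient check might tolerate the transient reloading (1) or activating (4) states.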