database, redis, diff, network-monitoring

Redis database snapshot diffs or other suggested DB for network/resource monitoring


I have a monitoring service that polls a REST API for information about the latest resources (list of hosts/list of licenses). The monitoring service caches all this data in a Redis database. Everything works great for discovering new resources. The problem arises when a host drops off the network: I have no way of knowing that the host has disappeared from the list of hosts, because the REST API only gives me a way of querying the current list. One approach I can come up with (theoretically) is taking a diff of the RDB at different time intervals. However, this does not seem efficient to me, and honestly I am not sure how I would do it with Redis.

I am looking for suggestions: either frameworks that are well suited to this kind of operation or, if need be, a different database that is as efficient as Redis yet gives me the functionality I need to take diffs. Time series databases spring to mind, but I have no experience with them and am not sure how they could be used to solve this particular problem.


Solution

  • There's no need to look anywhere besides Redis itself - it is robust enough to continue serving your requirements as long as you tell it what to do (like any other software ;)).

    The following is only an example. As you didn't specify how you're caching your data, I'll assume for simplicity's sake that you have one key per host/license in your list, where you store some string/binary value, like:

    SET acme.org "some cached value"
    

    You have a lot of such keys because the monitoring REST API returns a list, so a common way to keep everything in order is to use another key that stores the list returned by each API request. You can achieve that with a Set:

    SADD request:<timestamp> acme.org foo.bar ...
    

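    Sets are particularly useful here because you can perform Set operations on them (in your case SDIFF, SINTER and their STORE variants) to keep track of the hosts that are currently online and the ones that have dropped. Note that SDIFF returns the members of the first set that are missing from the following sets, so the previous request must come first when computing the dropped hosts. For example:

    MULTI
    SINTERSTORE online:<timestamp> request:<timestamp> request:<previous-timestamp>
    SDIFFSTORE dropped:<timestamp> request:<previous-timestamp> request:<timestamp>
    EXEC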
    

    Note: as you're caching things, it is good practice to set expiry values (TTL) on all relevant keys and to use an appropriate eviction policy.
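
To make the direction of the set operations concrete, here is a minimal Python sketch of the same arithmetic using plain in-memory sets rather than a Redis connection. The function name `diff_snapshots`, the `SnapshotDiff` type and the hosts `old.example` and `new.example` are hypothetical and only there for illustration; the `added` field is an extra that the commands above do not compute.

```python
from typing import Iterable, NamedTuple, Set


class SnapshotDiff(NamedTuple):
    online: Set[str]   # in both polls, like SINTER current previous
    dropped: Set[str]  # in the previous poll only, like SDIFF previous current
    added: Set[str]    # in the current poll only, like SDIFF current previous


def diff_snapshots(previous: Iterable[str], current: Iterable[str]) -> SnapshotDiff:
    """Compare two host lists returned by consecutive polls of the API."""
    prev, curr = set(previous), set(current)
    return SnapshotDiff(
        online=curr & prev,
        dropped=prev - curr,  # argument order matters: previous minus current
        added=curr - prev,
    )


# Hypothetical poll results; real values would come from the REST API.
previous_poll = ["acme.org", "foo.bar", "old.example"]
current_poll = ["acme.org", "foo.bar", "new.example"]

result = diff_snapshots(previous_poll, current_poll)
print("online: ", sorted(result.online))
print("dropped:", sorted(result.dropped))
print("added:  ", sorted(result.added))
```

Swapping the operands of the difference is what distinguishes dropped hosts from newly discovered ones, which is why the order of the keys passed to SDIFFSTORE matters.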