python metrics rrdtool graphite rrd

RRD library for handling time series data in a Python application


I am working on a simulation engine written in Python that collects a lot of metrics. The simulation runs at high speed and generates around 100K events/second (I can reduce the volume by consolidating these events on a per-second basis). I am looking for a mechanism to record these metrics as a time series.

My requirements are:

  1. I would like to have this logging mechanism in the same process as the simulation as opposed to an external process such as Graphite

  2. The mechanism must be able to handle 100K events/second without slowing down the simulation.

  3. I would like to store data as follows: each metric should be kept at 1-second granularity for 60 minutes, 1-minute granularity for 1 day, 5-minute granularity for 2 days, 1-hour granularity for 6 months, and 1-day granularity for 3 years. The mechanism should handle the consolidation of data across these ranges itself.

  4. Ideally, I want to maintain one file that holds the metrics information for one simulation run. For another run of the simulation a separate file would have to be created.

  5. It would be nice to have a well-tested library/module that is readily available :)
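To illustrate the per-second consolidation mentioned above, here is a minimal sketch of an in-process aggregator. The class name `PerSecondAggregator`, its methods, and the metric names are hypothetical, and the sketch assumes events arrive with non-decreasing timestamps (as they would in simulation time).

```python
from collections import defaultdict


class PerSecondAggregator:
    """Hypothetical helper: sums raw events into one total per metric per second."""

    def __init__(self):
        self._current_second = None
        self._totals = defaultdict(float)
        self.flushed = []  # list of (second, {metric: total}) tuples

    def record(self, timestamp, metric, value=1.0):
        second = int(timestamp)
        if self._current_second is None:
            self._current_second = second
        elif second != self._current_second:
            # A new second has started: emit the totals of the previous one.
            self.flush()
            self._current_second = second
        self._totals[metric] += value

    def flush(self):
        # Emit whatever has been accumulated for the current second, if anything.
        if self._totals:
            self.flushed.append((self._current_second, dict(self._totals)))
            self._totals.clear()


agg = PerSecondAggregator()
for ts, name in [(10.1, "arrivals"), (10.7, "arrivals"), (10.9, "drops"), (11.2, "arrivals")]:
    agg.record(ts, name)
agg.flush()  # emit the last, partially filled second
print(agg.flushed)
```

With this kind of pre-aggregation, the time-series store only has to accept one update per second rather than 100K writes per second.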

BTW, I took a cursory look at RRDTool but from what I understand it seems like the Python library is a thin wrapper around the RRDTool binary. I'm looking for a tighter integration if possible.

TIA


Solution

  • The functionality provided by RRDTool fits my requirements. Initially I found the Python library python-rrdtool (https://pypi.python.org/pypi/python-rrdtool/) and misunderstood how it integrates. I thought it executed the RRDTool binary as a separate process, but the documentation says it is a proper Python-accessible wrapper that invokes the functionality in the same process space.

    Later on I found PyRRD (https://pypi.python.org/pypi/PyRRD), a Python library that wraps RRDTool functionality in a more Pythonic, object-oriented style that I found comfortable to work with. The documentation on the linked page was good, so I hit no roadblocks in using it.

    This tutorial (http://www.vandenbogaerdt.nl/rrdtool/tutorial/rrdcreate.php) was helpful in figuring out how to configure the RRD database at creation time.
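As a rough sketch of that configuration step, the retention plan from the question can be turned into RRA definitions of the form `RRA:CF:xff:steps:rows`. The helper name `rra_definitions`, the choice of `AVERAGE` with an xff of 0.5, and the approximations of 6 months as 183 days and 3 years as 1095 days are assumptions made for illustration, not something taken from the original setup.

```python
STEP = 1      # base step in seconds: one primary data point per second
DAY = 86400   # seconds in a day

# (resolution in seconds, retention period in seconds)
RETENTION = [
    (1, 60 * 60),           # 1-second data for 60 minutes
    (60, DAY),              # 1-minute data for 1 day
    (300, 2 * DAY),         # 5-minute data for 2 days
    (3600, 183 * DAY),      # 1-hour data for 6 months (assumed 183 days)
    (DAY, 3 * 365 * DAY),   # 1-day data for 3 years (assumed 1095 days)
]


def rra_definitions(step, retention, cf="AVERAGE", xff=0.5):
    """Hypothetical helper: build RRA strings for an rrdtool create call."""
    defs = []
    for resolution, duration in retention:
        steps = resolution // step    # primary data points per consolidated row
        rows = duration // resolution # rows needed to cover the retention period
        defs.append(f"RRA:{cf}:{xff}:{steps}:{rows}")
    return defs


rras = rra_definitions(STEP, RETENTION)
for rra in rras:
    print(rra)
```

With the python-rrdtool binding, strings like these would typically be passed to `rrdtool.create` along with the file name, a `--step` option, and one `DS:` definition per metric; creating one file per simulation run then covers the one-file-per-run requirement.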