Search code examples
pythonpandasdatetimeinfluxdb

Pandas datetime64 to epoch for influxdb line protocol


I'm trying to convert some data from CSV to InfluxDB line protocol.

To get started, the dataset I'm working with looks like this:

date time latency
2021-02-02 16:11:15,486 78
2021-02-02 16:16:15,592 94

I'd like to end up with something like:

some_category latency=78 1612369423.956387

So far I've got the following code but I'm strugging with how to export the datetime object as epoch time as required by the line protocol. I'm a bit confused by pandas documentation here.

import pandas as pd

df = pd.read_csv("processed.csv",
                 delim_whitespace=True,
                 # Treat columns 0 and 1 as a single datetime object.
                 parse_dates=[[0, 1]])

lines = ['some_category'
        + ' '
        + 'latency=' + df['latency'].apply(str)
        + ' '
        + df['date_time'].apply(str)]

Solution

  • To my understanding the final timestamp format with the decimal point specified in the question is in fact not what you're aiming for. If you want to follow influxdb line protocol (refer to influxdb docs) the epoch time should be an integer indicating time in ns.

    Assuming that the default date_parser is being used when the csv file is read, to convert the datetime object into the epoch time you should perform the object conversion to int as follows:

    lines = ['some_category'
            + ' '
            + 'latency=' + df['latency'].apply(str)
            + ' '
            + df['date_time'].astype(int).apply(str)]
    

    after which the contents of the list "lines" should look like this:

    some_category latency=78 1612282275486000000
    some_category latency=94 1612282575592000000
    

    Now if we want to see if this has worked, we can form influxdb line protocol export file test_db.db which is digestible by Influx CLI with the following content:

    # INFLUXDB EXPORT: 1677-09-21T01:06:11+00:53 - 2262-04-12T00:47:16+01:00
    # DDL
    CREATE DATABASE test_db WITH NAME autogen
    # DML
    # CONTEXT-DATABASE:test_db
    # CONTEXT-RETENTION-POLICY:autogen
    # writing wal data
    some_category latency=78 1612282275486000000
    some_category latency=94 1612282575592000000
    

    ...then let's use the import command:

    influx -import -path=test_db.db
    

    & check the influx instance if the data points were written in correctly:

    use test_db
    Using database test_db
    precision rfc3339
    select * from /.*/
    name: some_category
    time                     latency
    ----                     -------
    2021-02-02T16:11:15.486Z 78
    2021-02-02T16:16:15.592Z 94