Search code examples
databasetime-seriesquestdb

Migrating data from Influx to QuestDB


I have a measurement in influx with ~300m rows of data. How can I best get this data from Influx into Quest?

I know QuestDB is compatible with ILP when streaming data, but what I need is to get the data I already have in influx and send into QuestDB. If I want to do this as a batch operation rather than streaming, QuestDB supports CSV for importing data.

I thought of doing an export using the flux command line or the API, but as seen at other questions, that is not very efficient, and I am afraid it will not work fine with bigger tables. Using the API is another option, but then I need to paginate over results, worry about error control, be careful if new data is flowing into the table while reading...

I was wondering if there is a better approach to migrate data from influx into QuestDB.


Solution

  • Since QuestDB is compatible with the Influx protocol, a possible way of migrating using tools from the influx ecosystem is directly following the instructions for migrating data across influxDB OSS instances, but with a twist.

    The general process is exporting data from the source instance using the influxd cli, then using the influx cli to load data into the target instance. That would work, but the data would be sparsely populated into QuestDB, as values exported will have a line per timestamp and measure, while in QuestDB you want to have a timestamp and all the measures for the same table on a single insert. There is another post that deals with a possible way to convert this into a dense format using telegraf.

    We can combine both the influx command line tools and telegraf to do the migration. First we export the data, as in:

    influxd inspect export-lp --bucket-id javier  --engine-path ~/.influxdbv2/engine   --output-path output.lp start 2022-01-01T00:00:00Z --end 2022-01-31T23:59:59Z --compress
    

    Now we can create a telegraf configuration in which we will use the influxdb_v2_listener as the input and the influxdb_v2 output, with an aggregator.merge in-between to convert sparse measurements to dense rows.

    I create a telegraf config file named sparse_to_dense.conf

    [agent]
      omit_hostname = true
    
    # Accept metrics over InfluxDB 2.x HTTP API
    [[inputs.influxdb_v2_listener]]
      ## Address and port to host InfluxDB listener on
      ## (Double check the port. Needs to be available)
      service_address = ":8099"
    
     # Merge metrics into multifield metrics by series key.
    [[aggregators.merge]]
     period = "876000h"  # Set the period to 100 years (876000 hours)
      grace = "876000h"  # Set the grace period to 100 years (876000 hours)
      drop_original = true
    
    # this is the questdb destination
     [[outputs.influxdb_v2]]
      urls = ["http://127.0.0.1:9000"]
      content_encoding = "identity"  # Important to ensuring no gzip encoding
    

    Note in that file I am using the period and grace params with very high values, so it will cover the whole time range.

    With this configuration we can now start telegraf:

    telegraf --config sparse_to_dense.conf --debug
    

    So at this moment we have a telegraf process that will accept ILP rows in port 8099, will merge by timestamp and will output as ILP on port 9000, which is the default for QuestDB.

    With this, we can use the write cli to send the data from the ilp file into telegraf.

    influx write --bucket javier --file output.lp  --host http://localhost:8099
    

    You should see the debug output from telegraf merging the rows, and on QuestDB logs you should see all the write activity.