
How to retrieve more than 10k lines from InfluxDB using Pandas?


I am trying to use InfluxDB's Python client to retrieve data stored in InfluxDB, but I can't get more than 10k lines. The examples I am (unsuccessfully) following are here. In summary:

import influxdb
dfclient = influxdb.DataFrameClient('localhost', 8086, 'root', 'root', 'mydb')
q = "select * from some_measurement"
df = dfclient.query(q, chunked=True)  # Returns only 10k points

The issue seems to be related to InfluxDB's internal limitations documented here (namely, the max-row-limit configuration option). I am going through the sources to try to find out how to get a DataFrame larger than 10k lines, but any help in solving this issue would be highly appreciated.


Solution

  • The problem is caused by the DataFrameClient's query simply ignoring the chunked argument [code].

    The workaround I found is to use the standard InfluxDBClient instead and build the DataFrame from its points. The code shown in the question becomes:

    import influxdb
    import pandas as pd

    client = influxdb.InfluxDBClient('localhost', 8086, 'root', 'root', 'mydb')
    q = "select * from some_measurement"
    # Ask the server to stream the result in chunks of 10k points each
    result = client.query(q, chunked=True, chunk_size=10000)
    df = pd.DataFrame(result.get_points())  # Returns all points
    

    It is also worth highlighting that from v1.2.2 the max-row-limit setting (i.e. the default value for chunk_size in the above code) has been changed from 10k to unlimited.
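
To illustrate what chunked=True asks the server for: InfluxDB 1.x streams a chunked query response as a sequence of JSON objects, one per chunk, each carrying its own "series" with "columns" and "values". The following is only a minimal sketch of how such chunks can be flattened into rows, using the standard library alone. The helper name rows_from_chunks and the sample payload are made up for illustration and are not part of influxdb-python.

```python
import json


def rows_from_chunks(lines):
    """Flatten newline-delimited JSON chunks into a list of row dicts.

    Hypothetical helper: each non-empty line is assumed to be one chunk
    shaped like {"results": [{"series": [{"columns": [...], "values": [...]}]}]}.
    """
    rows = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines between chunks
        chunk = json.loads(line)
        for result in chunk.get("results", []):
            for series in result.get("series", []):
                columns = series["columns"]
                for values in series.get("values", []):
                    # Pair each column name with the value in the same position
                    rows.append(dict(zip(columns, values)))
    return rows


# Made-up payload: the same measurement split across two chunks.
sample_lines = [
    '{"results":[{"series":[{"name":"some_measurement",'
    '"columns":["time","value"],'
    '"values":[["2017-01-01T00:00:00Z",1.0],["2017-01-01T00:00:10Z",2.0]]}]}]}',
    '{"results":[{"series":[{"name":"some_measurement",'
    '"columns":["time","value"],'
    '"values":[["2017-01-01T00:00:20Z",3.0]]}]}]}',
]

rows = rows_from_chunks(sample_lines)
print(len(rows))
```

A list of dicts like this can be passed straight to pd.DataFrame(rows), which is the same kind of input the workaround above feeds it via get_points().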