Search code examples
pythonclickhousememory-efficientpython-db-api

How to fetch data from Clickhouse in dicitionary/name-tuple using clickhouse-driver (python)?


When we fetch data using the DB API 2.0 cur.execute("select * from db.table") we get a cursor which seems like a generator object of list of tuples.

Whereas in pymongo, when we fetch we get it as list of dictionaries. I wanted to achieve something like this.

Instead of fetching list of tuples, I wanted list of dictionaries or a named tuple.

I believe from an efficiency point of view it makes sense, since the schema is already defined so no need to send it for every record.

Currently the workaround I am using is:

cur.execute("select * from db.table")
columns = cur.columns_with_types
data = cur.fetchall()
df = pd.DataFrame(data,columns=[tuple[0] for tuple in columns])
data_reqd = df.to_dict('records')

This methods fairs poorly when query returns a lot of data.

Workaround 1: Use fetchmany(size=block_size) but it doesn't seem like an elegant way to do things.

Workaround 2: This seems like a much better way to handle things.

cur.execute("select * from db.table")
columns = cur.columns_with_types
for tup in cur:
     row = dict(zip(columns, tup))
    # use row

Any good way to handle this? Any improvements to the question are appreciated.


Solution

  • You can alternatively create a Client and call its query_dataframe method.

    import clickhouse_driver as ch
    
    ch_client = ch.Client(host='localhost')
    df = ch_client.query_dataframe('select * from db.table')
    records = df.to_dict('records')