python pandas tqdm

Can tqdm be used with Database Reads?


While reading large relations from a SQL database into a pandas DataFrame, it would be nice to have a progress bar, because the number of tuples can be known in advance and the I/O rate could be estimated. The tqdm module has a function, tqdm_pandas, that reports progress when mapping functions over columns, but calling it does not add progress reporting for I/O like this. Is it possible to use tqdm to show a progress bar for a call to pd.read_sql?
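For reference, the column-mapping progress the question mentions looks like this in recent tqdm versions, where `tqdm.pandas()` registers `progress_apply` / `progress_map` on pandas objects (the older `tqdm_pandas` helper is deprecated in favor of it). This is a minimal sketch with a made-up DataFrame; note that it reports progress for the apply step only, not for reading data from a database.

```python
import pandas as pd
from tqdm import tqdm

# Register progress_apply / progress_map on pandas objects.
tqdm.pandas(desc="squaring")

# Made-up example data; nothing here touches a database.
df = pd.DataFrame({"x": range(1000)})

# Behaves like df["x"].apply(...), but draws a progress bar while mapping.
df["x_squared"] = df["x"].progress_apply(lambda v: v * v)
print(df["x_squared"].iloc[-1])  # 998001
```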


Solution

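  • Edit: This answer is misleading: chunksize has no effect on the database side of the operation. With many drivers the whole result set is still fetched when the query executes, and chunksize only controls how pandas splits those rows into DataFrames, so the bar tracks DataFrame construction rather than I/O. See the comments below.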

    You could use the chunksize parameter to do something like this:

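    import pandas as pd
    from tqdm import tqdm

    # conn: an existing database connection or SQLAlchemy engine
    chunks = pd.read_sql('SELECT * FROM table', con=conn, chunksize=100)

    # tqdm counts chunks (no total is known). Concatenate once at the end;
    # pd.concat inside the loop re-copies every earlier row each iteration.
    df = pd.concat(list(tqdm(chunks)), ignore_index=True)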
    

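    This only saves memory if each chunk is processed and discarded as it arrives; concatenating every chunk into one DataFrame still holds the whole result in memory.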
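If a percentage bar is wanted, one option is to ask the database for the row count first and advance the bar by rows rather than by chunks. Below is a minimal sketch under these assumptions: the function `read_sql_with_progress` and the `items` table are hypothetical, the count comes from a separate `COUNT(*)` query (so it can be stale if the table changes in between), and an in-memory `sqlite3` database is used only to keep the example self-contained. As the edit above notes, with drivers that buffer the full result on execute, the bar measures how fast pandas builds DataFrames from rows already fetched, not network I/O.

```python
import sqlite3

import pandas as pd
from tqdm import tqdm

# Hypothetical demo table in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, value REAL)")
conn.executemany(
    "INSERT INTO items (id, value) VALUES (?, ?)",
    ((i, i * 0.5) for i in range(2500)),
)
conn.commit()


def read_sql_with_progress(query, count_query, con, chunksize=1000):
    """Read `query` into one DataFrame, showing a tqdm bar measured in rows.

    `count_query` must return the number of rows `query` will produce.
    """
    cur = con.cursor()
    cur.execute(count_query)
    total = cur.fetchone()[0]
    cur.close()

    chunks = []
    with tqdm(total=total, unit="rows") as bar:
        for chunk in pd.read_sql(query, con=con, chunksize=chunksize):
            chunks.append(chunk)
            bar.update(len(chunk))  # advance by rows, not by chunks

    if not chunks:  # some pandas versions yield nothing for an empty result
        return pd.DataFrame()
    # Concatenate once at the end rather than inside the loop.
    return pd.concat(chunks, ignore_index=True)


df = read_sql_with_progress(
    "SELECT * FROM items ORDER BY id",
    "SELECT COUNT(*) FROM items",
    conn,
    chunksize=1000,
)
print(len(df))  # 2500
```

With a different database, the same pattern should work by passing a SQLAlchemy engine or connection instead; whether the bar then reflects real transfer progress depends on whether the driver streams rows (for example through a server-side cursor) or buffers them all up front.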