python pandas multithreading multiprocessing concurrent.futures

How to use multiprocessing/multithreading to read CSV files and store them in newly generated variables?


  • I have a list of filenames, and from each one I generate a string to use as the name of a new variable that will hold the corresponding DataFrame.
  • The code below is not working.
def filename(name):
    filename = f'{name}.csv'
    return pd.read_csv(filename(name))

with concurrent.futures.ProcessPoolExecutor() as executor:
    files = [
                '20190702',
                '20190703',
                '20190708',
    ]

    # list of strings which will be new variable names
    name_list = ['df_' + i.split('2019')[1] for i in files]

    # list to store new variables
    executor_list = []

    for i in range(len(files)):
        name = name_list[i]
        dataframe = executor.submit(filename, files[i])
        exec(f"{name} = {dataframe}") # Some error here!
        exec(f"executor_list.append({name})")

    for i in executor_list:
        exec(f"{i} = {i.result()}")

I ran this in Colab and got this error:

  File "<string>", line 1
    df_0702 = <Future at 0x7f0e5b8cc3c8 state=running>
              ^
SyntaxError: invalid syntax

Solution

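  • executor.submit returns a Future object, not the DataFrame itself. Interpolating the Future into the exec string inserts its repr (<Future at 0x... state=running>), which is not valid Python, hence the SyntaxError. Retrieve the DataFrame by calling result() on the Future instead: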

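    import concurrent.futures

    import pandas as pd

    files = ['20190702', '20190703', '20190708']

    # Map each generated name (e.g. 'df_0702') to the Future for its file
    futures = {}

    # On platforms that spawn worker processes (e.g. Windows), run this
    # under an `if __name__ == '__main__':` guard
    with concurrent.futures.ProcessPoolExecutor() as executor:
        for name in files:
            vname = 'df_' + name.split('2019')[1]
            # submit() returns immediately with a Future, not a DataFrame
            futures[vname] = executor.submit(pd.read_csv, name + '.csv')

    # result() blocks until the file has been read, then returns the DataFrame
    for vname, f in futures.items():
        dataframe = f.result()
        # do something with vname and dataframe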
    

    Also, never use exec or eval for anything other than debugging or testing. They make your code insecure and hard to debug.
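
As a follow-up to the advice on exec, here is a minimal sketch of keeping the loaded data in a dict keyed by the generated names instead of creating variables dynamically. The helper `load_frames` and its `reader` and `max_workers` parameters are hypothetical names introduced for illustration. It also swaps in `ThreadPoolExecutor` with `executor.map` in place of the process pool above, on the assumption that thread-based reading is good enough for your files; `ProcessPoolExecutor` can be substituted as long as `reader` is a picklable top-level function such as `pd.read_csv`.

```python
from concurrent.futures import ThreadPoolExecutor


def load_frames(names, reader, max_workers=4):
    """Call reader('<name>.csv') for every name concurrently.

    Returns a dict mapping a generated key to each result.
    `reader` is any callable taking a path, e.g. pandas.read_csv.
    """
    # Key scheme taken from the question: '20190702' -> 'df_0702'
    keys = ['df_' + name.split('2019')[1] for name in names]
    paths = [f'{name}.csv' for name in names]

    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map yields results in the same order as `paths`
        results = list(executor.map(reader, paths))

    # A plain dict replaces the exec-generated variables
    return dict(zip(keys, results))
```

With pandas installed, `frames = load_frames(files, pd.read_csv)` should return a dict in which `frames['df_0702']` is the DataFrame read from `20190702.csv`, so no variable names need to be generated at runtime.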