python pandas multithreading multiprocessing concurrent.futures

How to use multiprocessing/multithreading to read CSV files and store them in dynamically named variables?

I have a list of filenames, and from each one I generate a string that should become the name of a new variable holding the corresponding DataFrame.
The code below is not working.

def filename(name):
    filename = f'{name}.csv'
    return pd.read_csv(filename(name))

with concurrent.futures.ProcessPoolExecutor() as executor:
    files = [
                '20190702',
                '20190703',
                '20190708',
    ]

    # list of strings which will be the new variable names
    name_list = ['df_' + i.split('2019')[1] for i in files]

    # list to store new variables
    executor_list = []

    for i in range(len(files)):
        name = name_list[i]
        dataframe = executor.submit(filename, files[i])
        exec(f"{name} = {dataframe}") # Some error here!
        exec(f"executor_list.append({name})")

    for i in executor_list:
        exec(f"{i} = {i.result()}")

I ran this in Colab and got this error:

  File "<string>", line 1
    df_0702 = <Future at 0x7f0e5b8cc3c8 state=running>
              ^
SyntaxError: invalid syntax

Solution

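executor.submit returns a Future object, not the DataFrame itself. The f-string in exec(f"{name} = {dataframe}") inserts the Future's repr (<Future at 0x... state=running>) into the source string, which is not valid Python, hence the SyntaxError. Retrieve the result from the Future with .result() instead, and keep the futures in a dictionary keyed by the name you wanted rather than creating variables. Passing pd.read_csv directly to submit also avoids the helper function in the question, which would fail anyway: its local variable filename shadows the function, so filename(name) ends up calling a string.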
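
To see why the exec call fails, here is a minimal sketch that reproduces it without pandas. The square function is a hypothetical stand-in for the slow task, and ThreadPoolExecutor is used only to keep the demo small; it exposes the same submit/result interface as ProcessPoolExecutor.

```python
import concurrent.futures

def square(x):
    # Hypothetical stand-in for a slow task such as pd.read_csv
    return x * x

with concurrent.futures.ThreadPoolExecutor() as executor:
    future = executor.submit(square, 7)
    # This is the string exec() would receive: it contains the Future's
    # repr ("<Future at 0x... ...>"), not the computed value
    source = f"df_0702 = {future}"
    value = future.result()  # blocks until the task has finished

try:
    compile(source, "<string>", "exec")
    is_valid_python = True
except SyntaxError:
    # The repr starts with "<", which cannot begin a Python expression
    is_valid_python = False

print(value, is_valid_python)
```

The value only becomes available through future.result(); formatting the Future into a string never yields it.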

import concurrent.futures
import pandas as pd

files = ['20190702', '20190703', '20190708']

# Maps each generated name (e.g. 'df_0702') to its Future
futures = {}

with concurrent.futures.ProcessPoolExecutor() as executor:
    for name in files:
        vname = 'df_' + name.split('2019')[1]
        # Schedule the read in a worker process; submit returns immediately
        futures[vname] = executor.submit(pd.read_csv, name + '.csv')

for vname, f in futures.items():
    dataframe = f.result()  # blocks until that file has been read
    # do something with vname and dataframe

Also, never use exec or eval except for debugging or testing. They make your code insecure and hard to debug.
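
A dictionary gives the same by-name access that the generated variables were meant to provide, without exec. The following is a minimal sketch, not a definitive implementation: load_all is a hypothetical helper, and fake_read is a stand-in for pd.read_csv so that the block runs without any CSV files on disk. It uses ThreadPoolExecutor, which has the same submit/result interface as ProcessPoolExecutor and does not need to pickle results back from worker processes.

```python
import concurrent.futures

def load_all(files, loader):
    """Run loader on every file concurrently; return {name: result}."""
    with concurrent.futures.ThreadPoolExecutor() as executor:
        # Map each Future back to the name its result should be stored under
        future_to_name = {
            executor.submit(loader, f + '.csv'): 'df_' + f.split('2019')[1]
            for f in files
        }
        results = {}
        # as_completed yields futures as they finish, in no fixed order
        for future in concurrent.futures.as_completed(future_to_name):
            results[future_to_name[future]] = future.result()
    return results

def fake_read(path):
    # Stand-in for pd.read_csv (assumption: any callable taking a path works)
    return f'contents of {path}'

frames = load_all(['20190702', '20190703', '20190708'], fake_read)
print(frames['df_0702'])
```

With pandas installed and the files present, load_all(files, pd.read_csv) should return the DataFrames under the same keys, so frames['df_0702'] takes the place of a df_0702 variable.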