Tags: python, server, jupyter-notebook, pickle, concatenation

Open multiple pickle files from Jupyter notebook folder doesn't work


I'm using Jupyter Notebook on a server (the folders are not on my computer). I have a folder with 30 pickled dataframes that all have exactly the same columns. They are all saved under the following path:

Reut/folder_no_one/here_the_files_located

I want to open them all and concatenate them. I know I could do something like this:

df1=pd.read_pickle('table1')
df2=pd.read_pickle('table2')
df3=pd.read_pickle('table3')
...
#and then concat

but I'm sure there is a better and smarter way to do so. I have tried to open all the files and save them separately as follows:

num=list(range(1, 33)) #number of tables I have in the folder
path_to_files=r'Reut/here_the_files_located'
Path=r'Reut/folder_no_one/here_the_files_located'

{f"df{num}" : pd.read_pickle(file) for num, file in enumerate(Path(path_to_files).glob('*.pickle'))}

but I get this error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
----> 1 {f"df{num}" : pd.read_pickle(file) for num, file in enumerate(Path(path_to_files).glob('*.pickle'))}

TypeError: 'str' object is not callable

I have tried different versions of the path, and also leaving the path out (because my notebook is where those files are), but I keep getting the same error.

*It's important to mention that I can open those files without specifying the path when the notebook is inside that folder as well.

My end goal is to open and concatenate all those tables into one big table automatically.

edit: I have also tried this:

path = r'file_name/file_location_with_all_pickles'
all_files = glob.glob(path + "/*.pkl")

li = []

for filename in all_files:
    df = pd.read_pickle(filename)
    li.append(df)

frame = pd.concat(li, axis=0, ignore_index=True)

and also

path_to_files = r'file_name/file_location_with_all_pickles'
tables = []
for table in pathlib.Path(path_to_files).glob("*.pkl"):
    print(table)
    tables.append(pd.read_pickle(table))

but in both cases I get the error

ValueError: No objects to concatenate

when I try to concat. Also, when I tell it to print the filename/table, it prints nothing, and if I print an ordinary string inside the loop (like print('hello')), nothing happens either. It seems like there is a problem with the path, but when I open one specific pickle like this:

pd.read_pickle(r'file_name/file_location_with_all_pickles/specific_table.pkl')

it opens.

Update:

This worked for me in the end:

import pandas as pd
import glob

path = r'folder' # use your path
all_files = glob.glob(path + "/*.pkl")

li = []

for filename in all_files:
    df = pd.read_pickle(filename)
    li.append(df)

frame = pd.concat(li, axis=0, ignore_index=True)



Solution

  • How about:

    path_to_files = r'Reut/here_the_files_located'
    df = pd.concat([pd.read_pickle(f'{path_to_files}/table{num}.pickle') for num in range(1, 33)])
    

    This is equivalent to:

    path_to_files = r'Reut/here_the_files_located'
    tables = []
    for num in range(1, 33):
        filename = f'{path_to_files}/table{num}.pickle'
        print(filename)
        tables.append(pd.read_pickle(filename))
    
    df = pd.concat(tables)
    

    Output:

    Reut/here_the_files_located/table1.pickle
    Reut/here_the_files_located/table2.pickle
    Reut/here_the_files_located/table3.pickle
    Reut/here_the_files_located/table4.pickle
    Reut/here_the_files_located/table5.pickle
    Reut/here_the_files_located/table6.pickle
    Reut/here_the_files_located/table7.pickle
    Reut/here_the_files_located/table8.pickle
    Reut/here_the_files_located/table9.pickle
    Reut/here_the_files_located/table10.pickle
    Reut/here_the_files_located/table11.pickle
    Reut/here_the_files_located/table12.pickle
    Reut/here_the_files_located/table13.pickle
    Reut/here_the_files_located/table14.pickle
    Reut/here_the_files_located/table15.pickle
    Reut/here_the_files_located/table16.pickle
    Reut/here_the_files_located/table17.pickle
    Reut/here_the_files_located/table18.pickle
    Reut/here_the_files_located/table19.pickle
    Reut/here_the_files_located/table20.pickle
    Reut/here_the_files_located/table21.pickle
    Reut/here_the_files_located/table22.pickle
    Reut/here_the_files_located/table23.pickle
    Reut/here_the_files_located/table24.pickle
    Reut/here_the_files_located/table25.pickle
    Reut/here_the_files_located/table26.pickle
    Reut/here_the_files_located/table27.pickle
    Reut/here_the_files_located/table28.pickle
    Reut/here_the_files_located/table29.pickle
    Reut/here_the_files_located/table30.pickle
    Reut/here_the_files_located/table31.pickle
    Reut/here_the_files_located/table32.pickle
    

    A few comments about your code:

    num=list(range(1, 33)) #number of tables I have in the folder
    path_to_files=r'Reut/here_the_files_located'
    Path=r'Reut/folder_no_one/here_the_files_located'
    
    {f"df{num}" : pd.read_pickle(file) for num, file in enumerate(Path(path_to_files).glob('*.pickle'))}
    
    num=list(range(1, 33)) #number of tables I have in the folder
    

    There is no need to create a list with range. Using range directly in a for loop or list/dictionary comprehension works perfectly.

    Path=r'Reut/folder_no_one/here_the_files_located'
    

    I'm guessing that you have previously imported the Path class from pathlib. You need to choose another name for that variable if you want to call Path like normal. This is why you got the error TypeError: 'str' object is not callable.
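
The name clash described above can be reproduced in isolation. This is a minimal sketch that assumes Path was imported from pathlib before the string assignment (the question does not show the import); the variable names error_message and folder are only illustrative.

```python
from pathlib import Path

# Rebinding the name Path to a string hides the pathlib class.
Path = r'Reut/folder_no_one/here_the_files_located'

try:
    Path('Reut/here_the_files_located')  # this now "calls" a str
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)  # 'str' object is not callable

# One possible fix: restore the class and keep the folder under another name.
from pathlib import Path
folder = Path(r'Reut/folder_no_one/here_the_files_located')
```

In a notebook the shadowed name survives across cells, so re-running the import (or restarting the kernel) is needed before Path behaves like the class again.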


    Is there any way to use it if the table names are not the same? E.g. if one was table1 and another was dataframe3, just to read them regardless of their names?

    Sure. Assuming the filenames for all your saved tables end with .pickle, you can use the glob method like you first tried. Don't forget to import pathlib.

    import pathlib
    path_to_files = r'Reut/here_the_files_located'
    tables = []
    for table in pathlib.Path(path_to_files).glob("*.pickle"):
        tables.append(pd.read_pickle(table))
    
    df = pd.concat(tables)
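
If the glob matches nothing, the loop body never runs and pd.concat raises "ValueError: No objects to concatenate", which is consistent with what the question's edit reports. A likely cause (an assumption, since the real folder layout is not shown) is a relative path that does not resolve from the notebook's working directory, or an extension mismatch between .pkl and .pickle. The sketch below uses a hypothetical helper, load_pickled_tables, that checks both before concatenating; the temporary folder and file names exist only for the demonstration.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_pickled_tables(folder, patterns=("*.pkl", "*.pickle")):
    """Hypothetical helper: read every matching pickle in `folder` and concatenate."""
    folder = Path(folder).resolve()  # show the absolute path in error messages
    if not folder.is_dir():
        raise FileNotFoundError(f"Not a directory: {folder} (cwd is {Path.cwd()})")
    # Collect files for every pattern; sort for a reproducible row order.
    files = sorted(f for pattern in patterns for f in folder.glob(pattern))
    if not files:
        raise FileNotFoundError(f"No files matching {patterns} in {folder}")
    return pd.concat([pd.read_pickle(f) for f in files], ignore_index=True)


# Demonstration with throwaway files whose names do not follow one scheme.
with tempfile.TemporaryDirectory() as tmp:
    pd.DataFrame({"a": [1, 2]}).to_pickle(Path(tmp) / "table1.pkl")
    pd.DataFrame({"a": [3]}).to_pickle(Path(tmp) / "dataframe3.pickle")
    combined = load_pickled_tables(tmp)

print(combined.shape)  # (3, 1)
```

Raising early with the resolved path makes it easier to see whether the notebook is looking in the folder you expect, instead of failing later inside pd.concat.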