python excel pandas feather

Converting Excel to Feather format with Python


I have a (daily growing) list of around 100 big Excel files, which I analyse in Python. Because I have to run several loops over all the files, my analyses are getting slower and slower. I would therefore like to convert all Excel files into Feather format (about once a week). Is there a clever way to do that? What I have tried so far:

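import glob
import pandas as pd

path = r"filepath\*_name*.xlsx"
file_list = glob.glob(path)
for f in file_list:
    # read_excel in current pandas has no `encoding` argument, so it is omitted here
    df = pd.read_excel(f)
    # store the boolean columns as integers
    df[['boola', 'boolb']] = df[['boola', 'boolb']].astype(int)
    pathname = f[:-5] + ".ftr"  # swap the ".xlsx" suffix for ".ftr"
    df.to_feather(pathname)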

But I'm getting the following error message:

ArrowInvalid: ('Could not convert stringa with type str: tried to convert to boolean', "Conversion failed for column stringb with type object")
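
The message suggests that Arrow inferred a boolean type for the column from its first values and then met a string it could not convert. A minimal sketch for locating such columns before writing, assuming the cause is mixed Python types within a single column; the helper name `mixed_type_columns` and the sample data are hypothetical:

```python
import pandas as pd


def mixed_type_columns(df: pd.DataFrame) -> list:
    """Return the names of columns whose non-missing values have more than one Python type."""
    mixed = []
    for col in df.columns:
        # Map every non-missing value to its Python type and count the distinct types.
        kinds = df[col].dropna().map(type).nunique()
        if kinds > 1:
            mixed.append(col)
    return mixed


# Hypothetical data that mimics the reported problem: booleans mixed with a string.
example = pd.DataFrame({
    "stringb": [True, False, "stringa"],
    "boola": [True, False, True],
})
print(mixed_type_columns(example))  # ['stringb']
```

Columns reported this way can then be cast to one type (for example with `astype(str)`) before calling `to_feather`.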

Solution

  • Here is what solved my problem:

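    import glob
    import pandas as pd

    path = r"pathname\*_somename*.xlsx"
    file_list = glob.glob(path)
    for f in file_list:
        # current pandas versions no longer accept an `encoding` argument here
        df = pd.read_excel(f, decimal=',', thousands='.')
        for col in df.columns:
            # flag rows whose value type differs from the type of the first row
            w = df[col].map(type) != type(df[col].iloc[0])
            if w.any():
                # mixed types cannot be written to Feather, so store the column as text
                df[col] = df[col].astype(str)
            # the original compared the dtype with `list`, which NumPy treats as the
            # object dtype, so any remaining object column (e.g. lists) becomes text
            if df[col].dtype == object:
                df[col] = df[col].astype(str)
        pathname = f[:-4] + "ftr"  # "name.xlsx" -> "name.ftr"
        df.to_feather(pathname)
    df.head()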
    

    The decimal=',', thousands='.' arguments were necessary because my input files use European number formatting, i.e. a comma as the decimal separator and a dot as the thousands separator.
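
To illustrate what these two arguments do, here is a small sketch with made-up values. It uses `pd.read_csv` on an in-memory string only so that no Excel file is needed; `read_csv` accepts the same `decimal` and `thousands` parameters. For `read_excel`, the pandas documentation notes that they only matter for columns stored as text in the workbook, since true numeric cells are parsed regardless of their display format.

```python
import io

import pandas as pd

# Hypothetical sample in European notation: '.' groups thousands, ',' marks decimals.
raw = "amount;count\n1.234,56;1.000\n7,5;2.500\n"

# Without decimal/thousands these cells would be read as text (or misread as floats).
df = pd.read_csv(io.StringIO(raw), sep=";", decimal=",", thousands=".")

# Both columns are now numeric rather than text.
print(df)
print(df.dtypes)
```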