Search code examples
pythontypesimportglob

Python / glob glob - change datatype during import


I'm looping through all excel files in a folder and appending them to a dataframe. One column (column C) has an ID number. In some of the sheets, the ID is formatted as text and in others it's formatted as a number. What's the best way to change the data type during or after the import so that the datatype is consistent? I could always change them in each excel file before importing but there are 40+ sheets.

for f in glob.glob(path):
    dftemp = pd.read_excel(f,sheetname=0,skiprows=13)
    dftemp['file_name'] = os.path.basename(f)
    df = df.append(dftemp,ignore_index=True)

Solution

  • Don't append to a dataframe in a loop, every append relocates the whole dataframe to a new location in memory, very slow. Do one single concat after reading all your dataframes:

    dfs = []
    for f in glob.glob(path):
        df = pd.read_excel(f,sheetname=0,skiprows=13)
        df['file_name'] = os.path.basename(f)
        df['c'] = df['c'].astype(str)
        dfs.append(df)
    
    df = pd.concat(dfs, ignore_index=True)
    

    It sounds like your ID, that's the c column, is a string, but sometimes lacks alphabets. Ideally this should be used as a string.