Search code examples
pythonpandasxlsx

How to iterate over another xlsx when usecols is missing


Here is where in need your help.

I have multiple xlsx files and I am looking for the same columns information inside each one. Until now all was working fine but some *.xlsx files are not containing the data and my python script just stop but looking over the others.

import glob
import pandas as pd

# Setup variables
xlsx_input = 'D:\\script\\bdd\\xlsx\\*.xlsx'
csv_output = 'D:\\script\\bdd\\csv\\'

# Save all file matches: xlsx_files
xlsx_files = glob.glob(xlsx_input, recursive=True)
# Create an empty list: frames
frames = []
#  Iterate over xlsx_files
for file in xlsx_files:
    #  Read xlsx into a DataFrame
    df = pd.read_excel(file , usecols=['ref_01','ref_02','ref_03'])
    # Append df to frames
    frames.append(df)

# Concatenate frames into dataframe
excel_output = pd.concat(frames)
# Write CSV file
excel_output.to_csv ((csv_output +"bdd_export.csv"), encoding='utf-8-sig', index=None)

Any help would be greatly appreciated.

Cheers !


Solution

  • Ok, I have found how to do it.
    Just by adding this:
    df = pd.read_excel(file , usecols=lambda c: c in ['ref_01','ref_02','ref_03'])