Search code examples
pythonpandasexceldataframetypeerror

How to analyse a column with pandas that contains '-' [TypeError]


I'm trying to analyse a spreadsheet using pandas. I have multiple sheets in a folder. For the purpose of making this work I am testing it on one spreadsheet for now, which I have named test.

I am trying to find the maximum_change, but my issue is that some of the cells in excel contain -, and not an integer or float.

I have tried to write something to ignore the -, but to no luck, as I still get TypeError

Here is code below for reference:

import os
import pandas as pd

# Define the folder path containing the Excel files
folder_path = r"C:\Users\Documents"

maximum_change = float('-inf')

def update_maximum_values(df):
        global maximum_change
        if 'change' in df.columns:
            maximum_change = max(maximum_change, df['change'].max())

for filename in os.listdir(folder_path):
    if filename.endswith('test.xlsx'):
        file_path = os.path.join(folder_path, filename)

        change_df = pd.read_excel(file_path, sheet_name='Change')
        update_maximum_values(change_df)

        print(f"File: {filename}")
        if 'change' in change_df.columns:
            if 'change' in change_df.columns:
                print(f"Highest value in 'change': {round(change_df['change'].max(), 2)}")
        print()

This works fine when all the cells are filled with numbers

this is the data I'm working with

I did try chatGPT where it suggested a solution using NaN, and to then ignore, but again it didn't really work and I couldn't debug it from there.


Solution

  • Before doing any operation in the loaded dataset, you could try to eliminate the invalid values from the target column.

    change_df['change'] = pd.to_numeric(change_df['change'], errors='coerce')

    This will convert all invalid values in the column into NaN. At this point, calling change_df['change'].max() will fail due to these NaNs. We can just filter them out before calling max, like this:

    change_df['change'].dropna().max()