I'm trying to analyse a spreadsheet using pandas. I have multiple sheets in a folder. For the purpose of making this work I am testing it on one spreadsheet for now, which I have named test.
I am trying to find the maximum_change, but my issue is that some of the cells in excel contain -, and not an integer or float.
I have tried to write something to ignore the -, but to no luck, as I still get TypeError
Here is code below for reference:
import os
import pandas as pd
# Define the folder path containing the Excel files
folder_path = r"C:\Users\Documents"
maximum_change = float('-inf')
def update_maximum_values(df):
global maximum_change
if 'change' in df.columns:
maximum_change = max(maximum_change, df['change'].max())
for filename in os.listdir(folder_path):
if filename.endswith('test.xlsx'):
file_path = os.path.join(folder_path, filename)
change_df = pd.read_excel(file_path, sheet_name='Change')
update_maximum_values(change_df)
print(f"File: {filename}")
if 'change' in change_df.columns:
if 'change' in change_df.columns:
print(f"Highest value in 'change': {round(change_df['change'].max(), 2)}")
print()
This works fine when all the cells are filled with numbers
this is the data I'm working with
I did try chatGPT where it suggested a solution using NaN, and to then ignore, but again it didn't really work and I couldn't debug it from there.
Before doing any operation in the loaded dataset, you could try to eliminate the invalid values from the target column.
change_df['change'] = pd.to_numeric(change_df['change'], errors='coerce')
This will convert all invalid values in the column into NaN
. At this point, calling change_df['change'].max()
will fail due to these NaN
s. We can just filter them out before calling max
, like this:
change_df['change'].dropna().max()