I'm trying to convert some data so that I can analyse it, and as I'm not very experienced I keep running into problems. I've already received some great advice from the community, but once again I'm stumped.
I downloaded a data file from https://www.kaggle.com/datasets/majunbajun/himalayan-climbing-expeditions.
@LancelotduLac was kind enough to fix the first part of the problem for me by showing me how to convert the various reasons for termination into a binary variable:
from pandas import read_csv
RE = '^Success.*$'
NRE = '^((?!Success).)*$'
TR = 'termination_reason'
BD = 'basecamp_date'
SE = 'season'
data = read_csv('C:\\Users\\joepf\\OneDrive\\Desktop\\Data analytics course\\Programming1\\CA2\\data\\expeditions.csv')
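# Work on an explicit copy so the assignments below do not trigger SettingWithCopyWarning
exp_win_v_fail = data[[TR, BD, SE]].copy()
# Non-success reasons become 0 first, then the remaining success reasons become 1
for v, re_ in enumerate((NRE, RE)):
    exp_win_v_fail[TR] = exp_win_v_fail[TR].replace(to_replace=re_, value=v, regex=True)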
Then I tried to convert the seasons into a categorical variable in order to carry out an ANOVA, which has not been going so well:
# Turn the season column into a categorical
exp_win_v_fail['season'] = exp_win_v_fail['season'].astype('category')
exp_win_v_fail['season'].dtypes
from scipy.stats import f_oneway
# One-way ANOVA
f_value, p_value = f_oneway(exp_win_v_fail[SE], exp_win_v_fail[TR])
print("F-score: " + str(f_value))
print("p value: " + str(p_value))
I assumed that I would not need to convert the seasons from str if I made the column categorical, but the console then throws this error message, which is making me second-guess that assumption:
  File "C:\Users\joepf\anaconda3\lib\site-packages\numpy\core\_asarray.py", line 102, in asarray
    return array(a, dtype, copy=False, order=order)
ValueError: could not convert string to float: 'Spring'
Any suggestions would be much appreciated.
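For what it's worth, the traceback suggests the failure happens when the column is converted to a float array. A categorical column still holds its string labels, so that conversion fails in the same way. The sketch below uses a small made-up `seasons` Series (hypothetical toy data, not the Kaggle file) to show this, and that `.cat.codes` exposes the integer codes pandas stores for each category:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the 'season' column
seasons = pd.Series(['Spring', 'Autumn', 'Winter', 'Spring'], dtype='category')

# The category dtype only changes how the values are stored;
# converting to float still sees the string labels and fails.
try:
    np.asarray(seasons, dtype=float)
    conversion_failed = False
except ValueError as err:
    conversion_failed = True
    print(err)

# Integer code assigned to each category (categories are sorted alphabetically by default)
codes = seasons.cat.codes
print(codes.tolist())
```

Note that these codes are only labels; their numeric order carries no meaning for the seasons.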
I figured out how to make it run by converting the seasons to ints:
# Convert seasons from strings to ints
season_codes = {'Spring': 1, 'Summer': 2, 'Autumn': 3, 'Winter': 4}
exp_win_v_fail['season'] = exp_win_v_fail['season'].replace(season_codes)
# Drop rows where the season is unknown; .copy() avoids SettingWithCopyWarning later
exp_win_v_fail = exp_win_v_fail[exp_win_v_fail['season'] != 'Unknown'].copy()
# Turn the season column into a categorical
exp_win_v_fail['season'] = exp_win_v_fail['season'].astype('category')
exp_win_v_fail['season'].dtypes
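One caveat: getting `f_oneway` to run with integer seasons is not quite the same as testing whether success differs by season. `f_oneway` expects one array of measurements per group, so passing the season column and the outcome column treats them as two samples of the same quantity. A minimal sketch of the grouped call, assuming the 0/1 `termination_reason` column from above and using a hypothetical toy frame `toy` rather than the real data:

```python
import pandas as pd
from scipy.stats import f_oneway

# Hypothetical toy frame: season label and 0/1 outcome per expedition
toy = pd.DataFrame({
    'season': ['Spring', 'Spring', 'Spring',
               'Autumn', 'Autumn', 'Autumn',
               'Winter', 'Winter', 'Winter'],
    'termination_reason': [1, 1, 0, 1, 0, 0, 0, 0, 1],
})

# One array of outcomes per season; the season labels never need to be numeric
groups = [g['termination_reason'].to_numpy() for _, g in toy.groupby('season')]

# One-way ANOVA across the per-season groups
f_value, p_value = f_oneway(*groups)
print("F-score: " + str(f_value))
print("p value: " + str(p_value))
```

With this shape the seasons can stay as strings or categoricals, since they are only used to split the rows.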