I have the code:
df_mean_woman = df_mean_woman.rename(index = {"Less than 1 year":0}, inplace = True)
df_mean_woman
And when I run it I get the error
AttributeError Traceback (most recent call last)
<ipython-input-136-94a5cc6acf63> in <module>
----> 1 df_woman = df_woman.rename(index = {"Less than 1 year":0},
2 #"More than 50 years":int(51)},
3 inplace = True)
4 df_woman
AttributeError: 'NoneType' object has no attribute 'rename'
The error goes away when I simply type df_mean_woman.rename(index = {"Less than 1 year":0}, inplace = True)
But I cannot do that, because I need to use df again later. I have tried quite a few things to fix this, but nothing seems to work. I do not think the cause is a misspelling of "Less than 1 year". My main issue seems to be that when I print df_mean_woman (before the rename), I am told that df does not exist.
When I rerun Jupyter, I am able to print df, but all that gets printed is 'None'.
My full code is
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
df = pd.read_csv('data.csv')
%matplotlib inline
df_new = df.copy()
df_new = df_new.drop(['Age1stCode','CompTotal','Respondent', 'MainBranch', 'Hobbyist', 'Age', 'CompFreq', 'Country', 'CurrencyDesc', 'CurrencySymbol', 'DatabaseDesireNextYear', 'DatabaseWorkedWith', 'DevType', 'EdLevel', 'Employment', 'Ethnicity', 'JobFactors', 'JobSat', 'JobSeek', 'LanguageDesireNextYear', 'LanguageWorkedWith', 'MiscTechDesireNextYear', 'MiscTechWorkedWith', 'NEWCollabToolsDesireNextYear', 'NEWCollabToolsWorkedWith', 'NEWDevOps', 'NEWDevOpsImpt', 'NEWEdImpt', 'NEWJobHunt', 'NEWJobHuntResearch', 'NEWLearn', 'NEWOffTopic', 'NEWOnboardGood', 'NEWOtherComms', 'NEWOvertime', 'NEWPurchaseResearch', 'NEWPurpleLink', 'NEWSOSites', 'NEWStuck', 'OpSys', 'OrgSize', 'PlatformDesireNextYear', 'PlatformWorkedWith', 'PurchaseWhat', 'Sexuality', 'SOAccount', 'SOComm', 'SOPartFreq', 'SOVisitFreq', 'SurveyEase', 'SurveyLength', 'Trans', 'UndergradMajor', 'WebframeDesireNextYear', 'WebframeWorkedWith', 'WelcomeChange', 'WorkWeekHrs', 'YearsCodePro'], axis = 'columns')
df_new = df_new.dropna()
df_new
df_woman = df_new.drop(index=df_new[df_new['Gender'] != 'Woman'].index, inplace=True)
df_woman = df_new
df_woman = df_woman.drop(['Gender'], axis ='columns')
df_news = df_new.copy()
df_woman = df_woman.rename(index = {"Less than 1 year":int(0)},
#"More than 50 years":int(51)},
inplace = True)
df_woman['YearsCode'] = df_woman['YearsCode'].apply(lambda x: '{0:0>2}'.format(x))
df_mean_woman = df_woman.groupby('YearsCode')['ConvertedComp'].mean().sort_index()
df_mean_woman
It looks like you are excluding more columns than you are including, so it would be easier to make a list of the columns you want rather than a much longer list of the columns you want to drop.
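As a rough sketch of that idea, the selection could look like the following. The toy DataFrame stands in for the survey file, and the list `wanted` is an assumption based on the columns the question actually uses later (Gender, YearsCode, ConvertedComp); adjust it to whatever the analysis needs.

```python
import pandas as pd

# Toy stand-in for the survey data; the real file has many more columns.
df = pd.DataFrame({
    "Respondent": [1, 2, 3],
    "Gender": ["Woman", "Man", "Woman"],
    "YearsCode": ["Less than 1 year", "10", "5"],
    "ConvertedComp": [50000.0, 60000.0, 70000.0],
})

# Assumption: these are the only columns the analysis needs.
wanted = ["Gender", "YearsCode", "ConvertedComp"]

# Select by name instead of dropping everything else.
df_new = df.loc[:, wanted]
print(df_new.columns.tolist())
```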
Overall, I would not use drop and would instead use loc for most of these operations. It is also unclear why you are trying to manipulate the index rather than the column values.
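To illustrate the index versus column-values point, here is a small sketch on made-up data (the frame and the names `renamed` and `replaced` are only for illustration). rename(index=...) matches row labels, so it leaves the strings inside YearsCode untouched, whereas replace acts on the values themselves.

```python
import pandas as pd

# Made-up two-row frame; the row labels are the default 0 and 1.
df_demo = pd.DataFrame({"YearsCode": ["Less than 1 year", "10"]})

# rename(index=...) looks for a ROW LABEL called "Less than 1 year".
# There is none, so the frame comes back unchanged.
renamed = df_demo.rename(index={"Less than 1 year": 0})

# replace works on the values stored in the column.
replaced = df_demo["YearsCode"].replace({"Less than 1 year": "0"})

print(renamed["YearsCode"].tolist())
print(replaced.tolist())
```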
# looks like stackoverflow survey data
df = pd.read_csv('survey_results_public.csv')
unwanted = {'Age1stCode','CompTotal','Respondent', 'MainBranch', 'Hobbyist', 'Age', 'CompFreq', 'Country',
'CurrencyDesc', 'CurrencySymbol', 'DatabaseDesireNextYear', 'DatabaseWorkedWith', 'DevType',
'EdLevel', 'Employment', 'Ethnicity', 'JobFactors', 'JobSat', 'JobSeek', 'LanguageDesireNextYear',
'LanguageWorkedWith', 'MiscTechDesireNextYear', 'MiscTechWorkedWith', 'NEWCollabToolsDesireNextYear',
'NEWCollabToolsWorkedWith', 'NEWDevOps', 'NEWDevOpsImpt', 'NEWEdImpt', 'NEWJobHunt', 'NEWJobHuntResearch',
'NEWLearn', 'NEWOffTopic', 'NEWOnboardGood', 'NEWOtherComms', 'NEWOvertime', 'NEWPurchaseResearch',
'NEWPurpleLink', 'NEWSOSites', 'NEWStuck', 'OpSys', 'OrgSize', 'PlatformDesireNextYear',
'PlatformWorkedWith', 'PurchaseWhat', 'Sexuality', 'SOAccount', 'SOComm', 'SOPartFreq', 'SOVisitFreq',
'SurveyEase', 'SurveyLength', 'Trans', 'UndergradMajor', 'WebframeDesireNextYear', 'WebframeWorkedWith',
'WelcomeChange', 'WorkWeekHrs', 'YearsCodePro'}
# no need to copy dataframe before selecting columns
df_new = df.loc[:, list(set(df.columns) - unwanted)]
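# use .loc to make df_woman: keep rows where Gender is 'Woman', drop the Gender column
# .copy() avoids a SettingWithCopyWarning on the column assignments below
df_woman = df_new.loc[df_new['Gender'] == 'Woman', df_new.columns.drop('Gender')].copy()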
# convert strings to numeric values
df_woman['YearsCode'] = df_woman['YearsCode'].str.replace('Less than 1 year', '0')
df_woman['YearsCode'] = df_woman['YearsCode'].str.replace('More than 50 years', '51')
df_woman['YearsCode'] = pd.to_numeric(df_woman['YearsCode'], errors='coerce').fillna(0).astype(int)
# now groupby and analyze
df_woman.groupby('YearsCode')['ConvertedComp'].mean()
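As for the AttributeError in the question, it most likely comes from combining inplace = True with an assignment. With inplace = True, rename (like drop) modifies the DataFrame in place and returns None, so the variable ends up holding None and the next method call on it fails. A minimal sketch on a made-up frame (the data and the names `result` and `fixed` are illustrative only):

```python
import pandas as pd

def make_frame():
    # Made-up data with string row labels, only for this demonstration.
    return pd.DataFrame(
        {"ConvertedComp": [1.0, 2.0]},
        index=["Less than 1 year", "10"],
    )

# With inplace=True the frame is changed in place and the call returns None.
df_demo = make_frame()
result = df_demo.rename(index={"Less than 1 year": 0}, inplace=True)
print(result)                 # None
print(df_demo.index.tolist())

# Without inplace, rename returns a new frame that can be assigned safely.
fixed = make_frame().rename(index={"Less than 1 year": 0})
print(fixed.index.tolist())
```

So either keep inplace = True and drop the assignment, or keep the assignment and drop inplace = True, but not both.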