I'm aggregating some data before analysis. I've got a dataframe which looks like that :
ID Year Electricity consumption (kWh)
0 102520_1000010472 NaN NaN
1 102521_1000010472 2011 29.0
2 102521_1000010472 2012 29.0
3 102521_1000010472 2013 29.0
4 102521_1000010472 2014 109644556.0
... ... ... ...
The types are as follows :
ID object
Year object
Electricity consumption (kWh) float64
dtype: object
I then pivot the table to get the data in line. Up to there, no problem.
df2 = df.pivot_table(index='ID', columns='Year', values='Electricity consumption (kWh)').round(0) df2 = df2.reset_index()
Year ID 2011 2012 2013 ...
0 102521_1000010472 29.0 29.0 29.0 ...
...
Then I need to call a specific colum. For exemple :
df2['2012']
Python gives me here a KeyError.
I identify this as being a problem with my datatype. I if call the following :
df2[2012]
It does return the expected Series :
0 29.0
1 NaN
2 15.0
3 NaN
4 15.0
But working with this puts me in other difficulties for plotting for instance. I'd like to have the headers being read as strings and not integers using df['2012']
. I tried the following.
1.renaming the headers :
dict = {2010: '2010',
2011: '2011',
...: '...'}
df.rename(columns=dict,
inplace=True)
But of course it doesn't work, as the dict expects strings as a key
2.changing the header type:
I followed the following thread : Converting Pandas header to string type But applying this does raises another error which I don't really understand here.
TypeError: Cannot cast Index to dtype <U0
Any help or suggestion would be gladly appreciated ! And sorry if my explanations aren't cristal clear, my first time posting here and still a big newbie to coding.
I'm using Python 3.8.2 and Pandas 1.4.2
To change the header type, you can call:
df2.columns = df2.columns.astype(str)
If I set up a dataframe to match yours as best I can:
import numpy as np
import pandas as pd
df = pd.DataFrame({ 'ID': ['102520_1000010472', '102521_1000010472', '102521_1000010472', '102521_1000010472', '102521_1000010472'],
'Year': [np.nan, 2011, 2012, 2013, 2014],
'Electricity consumption (kWh)': [np.nan, 29.0, 29.0, 29.0, 109644556.0]}, dtype='object')
df['Electricity consumption (kWh)'] = df['Electricity consumption (kWh)'].astype('float')
print(df.dtypes)
ID object
Year object
Electricity consumption (kWh) float64
dtype: object
Then do the pivot operation as you did:
df2 = df.pivot_table(index='ID', columns='Year', values='Electricity consumption (kWh)').round(0)
df2 = df2.reset_index()
print(df2)
Year ID 2011 2012 2013 2014
0 102521_1000010472 29.0 29.0 29.0 109644556.0
If I check the types of the columns:
for i in df2.columns:
print(type(i))
<class 'str'>
<class 'int'>
<class 'int'>
<class 'int'>
<class 'int'>
Then do the conversion as suggested:
df2.columns = df2.columns.astype(str)
for i in df2.columns:
print(type(i))
<class 'str'>
<class 'str'>
<class 'str'>
<class 'str'>
<class 'str'>
Now you can call the dataframe as you wanted:
print(df2['2012'])
0 29.0
Name: 2012, dtype: float64