Dataframe colum header as Object returning "KeyError" using Pandas

I'm aggregating some data before analysis. I've got a dataframe which looks like that :

    ID                  Year    Electricity consumption (kWh)
0   102520_1000010472   NaN     NaN
1   102521_1000010472   2011    29.0
2   102521_1000010472   2012    29.0
3   102521_1000010472   2013    29.0
4   102521_1000010472   2014    109644556.0
...     ...                     ...     ...

The types are as follows :

ID                                object
Year                              object
Electricity consumption (kWh)    float64
dtype: object

I then pivot the table to get the data in line. Up to there, no problem.

df2 = df.pivot_table(index='ID', columns='Year', values='Electricity consumption (kWh)').round(0) df2 = df2.reset_index()

Year    ID                  2011    2012    2013    ...
0       102521_1000010472   29.0    29.0    29.0    ...
...

Then I need to call a specific colum. For exemple :

df2['2012']

Python gives me here a KeyError.

I identify this as being a problem with my datatype. I if call the following : df2[2012] It does return the expected Series :

0       29.0
1        NaN
2       15.0
3        NaN
4       15.0

But working with this puts me in other difficulties for plotting for instance. I'd like to have the headers being read as strings and not integers using df['2012']. I tried the following.

1.renaming the headers :

dict = {2010: '2010',
        2011: '2011',
        ...: '...'}
 
df.rename(columns=dict,
          inplace=True)

But of course it doesn't work, as the dict expects strings as a key

2.changing the header type:

I followed the following thread : Converting Pandas header to string type But applying this does raises another error which I don't really understand here.

TypeError: Cannot cast Index to dtype <U0

Any help or suggestion would be gladly appreciated ! And sorry if my explanations aren't cristal clear, my first time posting here and still a big newbie to coding.

I'm using Python 3.8.2 and Pandas 1.4.2

Solution

To change the header type, you can call:

df2.columns = df2.columns.astype(str)

For example,

If I set up a dataframe to match yours as best I can:

import numpy as np
import pandas as pd

df = pd.DataFrame({ 'ID': ['102520_1000010472', '102521_1000010472', '102521_1000010472', '102521_1000010472', '102521_1000010472'],
                    'Year': [np.nan, 2011, 2012, 2013, 2014],
                    'Electricity consumption (kWh)': [np.nan, 29.0, 29.0, 29.0, 109644556.0]}, dtype='object')

df['Electricity consumption (kWh)'] = df['Electricity consumption (kWh)'].astype('float')
print(df.dtypes)

Output:

ID                                object
Year                              object
Electricity consumption (kWh)    float64
dtype: object

Then do the pivot operation as you did:

df2 = df.pivot_table(index='ID', columns='Year', values='Electricity consumption (kWh)').round(0) 
df2 = df2.reset_index()
print(df2)

Output:

Year                 ID  2011  2012  2013         2014
0     102521_1000010472  29.0  29.0  29.0  109644556.0

If I check the types of the columns:

for i in df2.columns:
    print(type(i))

Output:

<class 'str'>
<class 'int'>
<class 'int'>
<class 'int'>
<class 'int'>

Then do the conversion as suggested:

df2.columns = df2.columns.astype(str)
for i in df2.columns:
    print(type(i))

Output:

<class 'str'>
<class 'str'>
<class 'str'>
<class 'str'>
<class 'str'>

Now you can call the dataframe as you wanted:

print(df2['2012'])

Output:

0    29.0
Name: 2012, dtype: float64