Search code examples
pandastypeskeyerror

Dataframe colum header as Object returning "KeyError" using Pandas


I'm aggregating some data before analysis. I've got a dataframe which looks like that :

    ID                  Year    Electricity consumption (kWh)
0   102520_1000010472   NaN     NaN
1   102521_1000010472   2011    29.0
2   102521_1000010472   2012    29.0
3   102521_1000010472   2013    29.0
4   102521_1000010472   2014    109644556.0
...     ...                     ...     ...

The types are as follows :

ID                                object
Year                              object
Electricity consumption (kWh)    float64
dtype: object

I then pivot the table to get the data in line. Up to there, no problem.

df2 = df.pivot_table(index='ID', columns='Year', values='Electricity consumption (kWh)').round(0) df2 = df2.reset_index()

Year    ID                  2011    2012    2013    ...
0       102521_1000010472   29.0    29.0    29.0    ...
...

Then I need to call a specific colum. For exemple :

df2['2012']

Python gives me here a KeyError.

I identify this as being a problem with my datatype. I if call the following : df2[2012] It does return the expected Series :

0       29.0
1        NaN
2       15.0
3        NaN
4       15.0

But working with this puts me in other difficulties for plotting for instance. I'd like to have the headers being read as strings and not integers using df['2012']. I tried the following.

1.renaming the headers :

dict = {2010: '2010',
        2011: '2011',
        ...: '...'}
 
df.rename(columns=dict,
          inplace=True)

But of course it doesn't work, as the dict expects strings as a key

2.changing the header type:

I followed the following thread : Converting Pandas header to string type But applying this does raises another error which I don't really understand here.

TypeError: Cannot cast Index to dtype <U0

Any help or suggestion would be gladly appreciated ! And sorry if my explanations aren't cristal clear, my first time posting here and still a big newbie to coding.

I'm using Python 3.8.2 and Pandas 1.4.2


Solution

  • To change the header type, you can call:

    df2.columns = df2.columns.astype(str)
    



    For example,

    If I set up a dataframe to match yours as best I can:

    import numpy as np
    import pandas as pd
    
    df = pd.DataFrame({ 'ID': ['102520_1000010472', '102521_1000010472', '102521_1000010472', '102521_1000010472', '102521_1000010472'],
                        'Year': [np.nan, 2011, 2012, 2013, 2014],
                        'Electricity consumption (kWh)': [np.nan, 29.0, 29.0, 29.0, 109644556.0]}, dtype='object')
    
    df['Electricity consumption (kWh)'] = df['Electricity consumption (kWh)'].astype('float')
    print(df.dtypes)
    

    Output:

    ID                                object
    Year                              object
    Electricity consumption (kWh)    float64
    dtype: object
    


    Then do the pivot operation as you did:

    df2 = df.pivot_table(index='ID', columns='Year', values='Electricity consumption (kWh)').round(0) 
    df2 = df2.reset_index()
    print(df2)
    

    Output:

    Year                 ID  2011  2012  2013         2014
    0     102521_1000010472  29.0  29.0  29.0  109644556.0
    


    If I check the types of the columns:

    for i in df2.columns:
        print(type(i))
    

    Output:

    <class 'str'>
    <class 'int'>
    <class 'int'>
    <class 'int'>
    <class 'int'>
    


    Then do the conversion as suggested:

    df2.columns = df2.columns.astype(str)
    for i in df2.columns:
        print(type(i))
    

    Output:

    <class 'str'>
    <class 'str'>
    <class 'str'>
    <class 'str'>
    <class 'str'>
    


    Now you can call the dataframe as you wanted:

    print(df2['2012'])
    

    Output:

    0    29.0
    Name: 2012, dtype: float64