Search code examples
pandasdataframedata-cleaning

Making Sense of Stack/Unstack/Pivot_Table/etc


I have a Pandas DataFrame that looks like the following:

    ID           Date      Element  Temperature (C)
0   USW00094889 2014-11-12  TMAX    2.2
1   USC00208972 2009-04-29  TMIN    5.6
2   USC00200032 2008-05-26  TMAX    27.8
3   USC00205563 2005-11-11  TMAX    13.9
4   USC00200230 2014-02-27  TMAX    -10.6

I've been staring at the documentation for the stack and unstack functions and related functions like melt and pivot_table but I can't quite seem to figure out the combination of these that I need to do what I want. I'm trying to get rid of the "Element" column and instead create two new columns, "TMAX" and "TMIN". This would make each row a unique observation. Each Date/ID combo has a TMAX and TMIN. So theoretically this new dataframe would be half the height of the current one.


Solution

  • Docs: https://pandas.pydata.org/pandas-docs/stable/reshaping.html

    Create the DF and create the desired multi-index.

    """
    ID,Date,Element,Temperature (C)
    USW00094889,2014-11-12,TMAX,2.2
    USC00208972,2009-04-29,TMIN,5.6
    USC00200032,2008-05-26,TMAX,27.8
    USC00205563,2005-11-11,TMAX,13.9
    USC00200230,2014-02-27,TMAX,-10.6
    """
    import pandas as pd
    df = pd.read_clipboard(sep=",").set_index(["ID", "Date"])
    print(df)
    

    Output:

                           Element  Temperature (C)
    ID          Date                               
    USW00094889 2014-11-12    TMAX              2.2
    USC00208972 2009-04-29    TMIN              5.6
    USC00200032 2008-05-26    TMAX             27.8
    USC00205563 2005-11-11    TMAX             13.9
    USC00200230 2014-02-27    TMAX            -10.6
    
    df.pivot(index=df.index, columns="Element")
    

    Output:

                           Temperature (C)     
    Element                           TMAX TMIN
    ID          Date                           
    USC00200032 2008-05-26            27.8  NaN
    USC00200230 2014-02-27           -10.6  NaN
    USC00205563 2005-11-11            13.9  NaN
    USC00208972 2009-04-29             NaN  5.6
    USW00094889 2014-11-12             2.2  NaN
    

    Note that I dropped the numerical ID from your sample df.