Search code examples
pythonpandaspivotunpivot

How can I unpivot a dataframe with variable date columns?


I have some input dataframe as the following example:

import pandas as pd

df = pd.DataFrame(
    {
        "Name":["Ale", "Dan", "Hel"],
        "Project": ["A", "A", "B"],
        "10/04/2023 Formation": [24, 40, 40],
        "17/04/2023 Formation": [12, 24, 24],
        "10/04/2023 Holidays": [40, 40, 40],
        "17/04/2023 Holidays": [12, 40, 24],
     }
)

As can be seen, some of the columns are dates belonging to a specific concept (Formation, Holidays, in this case).

What I want is to normalize these columns to convert them into rows (unpivot), but preserving (grouping) each project in its own column; that is, I would like to perform a dataframe transformation whose output would be something like:

         Date Name Project  Formation  Holidays
0  10/04/2023  Ale       A         24        40
1  17/04/2023  Ale       A         12        12
2  10/04/2023  Dan       A         40        40
3  17/04/2023  Dan       A         24        40
4  10/04/2023  Hel       B         40        40
5  17/04/2023  Hel       B         24        40

Solution

  • import pandas as pd
    
    df = pd.DataFrame(
        {
            "Name":["Ale", "Dan", "Hel"],
            "Project": ["A", "A", "B"],
            "10/04/2023 Formation": [24, 40, 40],
            "17/04/2023 Formation": [12, 24, 24],
            "10/04/2023 Holidays": [40, 40, 40],
            "17/04/2023 Holidays": [12, 40, 24],
         }
    )
    
    # Unpivot the data using pd.melt()
    df_melted = pd.melt(df, id_vars=["Name", "Project"], var_name="Date_Concept", value_name="Value")
    
    # Separate the Date and Concept columns
    df_melted[["Date", "Concept"]] = df_melted["Date_Concept"].str.split(" ", expand=True)
    df_melted = df_melted.drop("Date_Concept", axis=1)
    
    # Use pd.pivot_table() to reshape the dataframe
    df_final = pd.pivot_table(df_melted, index=["Date", "Name", "Project"], columns="Concept", values="Value").reset_index()
    
    # Reorder the columns
    df_final = df_final[["Date", "Name", "Project", "Formation", "Holidays"]].sort_values(["Name", "Date", "Project"])
    
    print(df_final)