Search code examples
pythonpandasdataframereshape

How can I melt a dataframe from wide to long twice on the same column?


I have a dataframe:

data = [
    [1, "2022-04-29", 123, "circle", 1, 3, 6, 7.3],
    [1, "2022-02-10", 456, "square", 4, np.nan, 3, 9],
]

df = pd.DataFrame(
    data,
    columns=[
        "ID",
        "date",
        "code",
        "shape",
        "circle_X_rating",
        "circle_Y_rating",
        "square_X_rating",
        "square_Y_rating",
    ],
)
df


ID date      code shape circle_X_rating circle_Y_rating square_X_rating square_Y_rating
1 2022-04-29  123 circle       1               3.0             6              7.3
1 2022-02-10  456 square       4               NaN             3              9.0

I would like to melt this dataframe such that there is a shape column and 2 columns for rating X_rating and Y_rating, and I am not sure how to do this. Currently I am melting it and this is what I get:

test = (
    pd.melt(
        df,
        id_vars=[
            "ID",
            "date",
            "bar_code",
            "shape",
        ],
        value_vars=[
            "circle_X_rating",
            "circle_Y_rating",
            "square_X_rating",
            "square_Y_rating",
        ],
        var_name="shape_for_rating",
        value_name="shape_rating",
    )
    .assign(
        shape_for_rating=lambda df: df["shape"].apply(lambda a_str: a_str.split("_")[0])
    )
    .query("shape == shape")
    .drop(columns=["shape_for_rating"])
)
test

    ID  date        code    shape   shape_rating
0   1   2022-04-29  123     circle      1.0
1   1   2022-02-10  456     square      4.0
2   1   2022-04-29  123     circle      3.0
3   1   2022-02-10  456     square      NaN
4   1   2022-04-29  123     circle      6.0
5   1   2022-02-10  456     square      3.0
6   1   2022-04-29  123     circle      7.3
7   1   2022-02-10  456     square      9.0

But what I'd really like is:

    ID  date        code    shape   X_rating   Y_rating
0   1   2022-04-29  123     circle      1.0       3
1   1   2022-04-29  123     square      6.0      7.3
2   1   2022-02-10  456     circle      4        NaN
3   1   2022-02-10  456     square      3         9
...

Does anyone know the best way to do this? I've been spinning my wheels at this.


Solution

  • Try wide_to_long

    df.columns = df.columns.str.split('_',n=1).map(lambda x : '_'.join(x[::-1]))
    
    df = pd.wide_to_long(df, 
                        stubnames = ['X_rating','Y_rating'], 
                        i = ['ID', 'date', 'code', 'shape'], 
                        j = 'shape1',
                        sep = '_',
                        suffix = r'\w+').reset_index()
    df
    Out[84]: 
       ID        date  code   shape   shape1  X_rating  Y_rating
    0   1  2022-04-29   123  circle   circle         1       3.0
    1   1  2022-04-29   123  circle   square         6       7.3
    2   1  2022-02-10   456  square   circle         4       NaN
    3   1  2022-02-10   456  square   square         3       9.0