I have a dataframe:
data = [
[1, "2022-04-29", 123, "circle", 1, 3, 6, 7.3],
[1, "2022-02-10", 456, "square", 4, np.nan, 3, 9],
]
df = pd.DataFrame(
data,
columns=[
"ID",
"date",
"code",
"shape",
"circle_X_rating",
"circle_Y_rating",
"square_X_rating",
"square_Y_rating",
],
)
df
ID date code shape circle_X_rating circle_Y_rating square_X_rating square_Y_rating
1 2022-04-29 123 circle 1 3.0 6 7.3
1 2022-02-10 456 square 4 NaN 3 9.0
I would like to melt this dataframe such that there is a shape column and 2 columns for rating X_rating
and Y_rating
, and I am not sure how to do this. Currently I am melting it and this is what I get:
test = (
pd.melt(
df,
id_vars=[
"ID",
"date",
"bar_code",
"shape",
],
value_vars=[
"circle_X_rating",
"circle_Y_rating",
"square_X_rating",
"square_Y_rating",
],
var_name="shape_for_rating",
value_name="shape_rating",
)
.assign(
shape_for_rating=lambda df: df["shape"].apply(lambda a_str: a_str.split("_")[0])
)
.query("shape == shape")
.drop(columns=["shape_for_rating"])
)
test
ID date code shape shape_rating
0 1 2022-04-29 123 circle 1.0
1 1 2022-02-10 456 square 4.0
2 1 2022-04-29 123 circle 3.0
3 1 2022-02-10 456 square NaN
4 1 2022-04-29 123 circle 6.0
5 1 2022-02-10 456 square 3.0
6 1 2022-04-29 123 circle 7.3
7 1 2022-02-10 456 square 9.0
But what I'd really like is:
ID date code shape X_rating Y_rating
0 1 2022-04-29 123 circle 1.0 3
1 1 2022-04-29 123 square 6.0 7.3
2 1 2022-02-10 456 circle 4 NaN
3 1 2022-02-10 456 square 3 9
...
Does anyone know the best way to do this? I've been spinning my wheels at this.
Try wide_to_long
df.columns = df.columns.str.split('_',n=1).map(lambda x : '_'.join(x[::-1]))
df = pd.wide_to_long(df,
stubnames = ['X_rating','Y_rating'],
i = ['ID', 'date', 'code', 'shape'],
j = 'shape1',
sep = '_',
suffix = r'\w+').reset_index()
df
Out[84]:
ID date code shape shape1 X_rating Y_rating
0 1 2022-04-29 123 circle circle 1 3.0
1 1 2022-04-29 123 circle square 6 7.3
2 1 2022-02-10 456 square circle 4 NaN
3 1 2022-02-10 456 square square 3 9.0