I have data containing a list of topics (topics 1-6; and 0 meaning no topic is assigned) and their value. I want to create a new column for each topic and each label, and fill the column with the value. Here's what the table looks like...
reviewId topic value label
01 2 -4 negative
02 2 9 positive
03 0 -7 negative
04 5 -1 negative
05 1 38 positive
What should I do to create a table looking like this?
reviewId | topic | value | label | t1p | t1n | t2p | t2n | t3p | t3n | t4p | t4n | t5p | t5n |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
01 | 2 | -4 | negative | 0 | 0 | 0 | -4 | 0 | 0 | 0 | 0 | 0 | 0 |
02 | 2 | 9 | positive | 0 | 0 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
03 | 0 | -7 | negative | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
04 | 5 | -1 | negative | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | -1 |
05 | 1 | 38 | positive | 38 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Here's reproducible data to work on.
raw_df = pd.DataFrame({
'reviewId': ['01', '02', '03', '04', '05'],
'topic': [2, 2, 0, 5, 1],
'value': ['-4', '9', '-7', '-1', '38']})
Here's a link similar topic on this problem..
from itertools import product
(df.join(pd.get_dummies(zip(df["topic"], df["label"]))
.reindex(columns=product(range(1, 5+1), ["positive", "negative"]),
fill_value=0)
.mul(df["value"], axis="rows")
.rename(columns=lambda c: f"t{c[0]}{c[1][0]}")))
and join it with the original frame to get
reviewId topic value label t1p t1n t2p t2n t3p t3n t4p t4n t5p t5n
0 1 2 -4 negative 0 0 0 -4 0 0 0 0 0 0
1 2 2 9 positive 0 0 9 0 0 0 0 0 0 0
2 3 0 -7 negative 0 0 0 0 0 0 0 0 0 0
3 4 5 -1 negative 0 0 0 0 0 0 0 0 0 -1
4 5 1 38 positive 38 0 0 0 0 0 0 0 0 0