Search code examples
pandasdataframegroup-by

Pandas: reshaping data pandas


I have the following dataset:

df =
id test score date
1  A    1     2000-01-01
1  B    3     2000-01-01
1  A    1     2000-01-30
1  B    5     2000-01-30
2  A    2     2000-01-01
2  B    4     2000-01-01
3  A    6     2000-01-01
4  A    2     2000-01-01

I want to group_by id and date in order to get columns containing the type of the test for the same id and date:

df_outcome =
id testA testB date
1  1     3     2000-01-01
1  1     5     2000-01-30
2  2     4     2000-01-01

note that individuals 3 and 4 do not have test A and B in the same date and therefore are excluded from the analysis.


Solution

  • (df.pivot(index=["id", "date"], columns="test", values="score")
       .dropna()
       .rename_axis(columns=None)
       .add_prefix("test")
       .reset_index()
       .convert_dtypes())
    
    • pivot over id & date with test variables in the column, scores as the content
    • NaN will arise if a id-date pair doesn't have A or B score: so drop them
    • move the "test" from the name of the columns to be the prefixes of the column names, i.e., testA, testB
    • move id & date back to the columns side
    • due to possible NaNs, things got floated, so convert datatypes for depromotion if possible, i.e., float to int here

    to get

       id        date  testA  testB
    0   1  2000-01-01      1      3
    1   1  2000-01-30      1      5
    2   2  2000-01-01      2      4