Search code examples
pythonpandasdata-cleaning

I am trying to combine similar columns into rows


I have a dataframe that is pretty messy. The df looks like this :

student_id  date         measure      level            measure.1       level.1....
804322      9/2/2022     some string   more string     some string     more string 
3424234     9/3/2023     some string   more string     some string     more string
2132121     4/5/2022     some string   more string     some string     more string

This goes on to measure and level 2 and 3. What I am trying to do is to take measure.1, measure.2, and measure.3 into one column and repeat the values of date and student id for each row that is created. Plus do the same for level, level.1, etc. Any help would be greatly appreciated!


Solution

  • IIUC, you can use :

    out = (
            df
             .set_index(["student_id", "date"])
             .set_axis(df.columns[2:].str.split(".", expand=True), axis=1)
             .stack(1)
             .droplevel(-1)
             .reset_index()
          )
    

    Output :

    print(out)
    
       student_id      date        level      measure
    0      804322  9/2/2022  more string  some string
    1      804322  9/2/2022  more string  some string
    2     3424234  9/3/2023  more string  some string
    3     3424234  9/3/2023  more string  some string
    4     2132121  4/5/2022  more string  some string
    5     2132121  4/5/2022  more string  some string