I have a dataframe that is pretty messy. The df looks like this :
student_id date measure level measure.1 level.1....
804322 9/2/2022 some string more string some string more string
3424234 9/3/2023 some string more string some string more string
2132121 4/5/2022 some string more string some string more string
This goes on to measure and level 2 and 3. What I am trying to do is to take measure.1, measure.2, and measure.3 into one column and repeat the values of date and student id for each row that is created. Plus do the same for level, level.1, etc. Any help would be greatly appreciated!
IIUC, you can use :
out = (
df
.set_index(["student_id", "date"])
.set_axis(df.columns[2:].str.split(".", expand=True), axis=1)
.stack(1)
.droplevel(-1)
.reset_index()
)
Output :
print(out)
student_id date level measure
0 804322 9/2/2022 more string some string
1 804322 9/2/2022 more string some string
2 3424234 9/3/2023 more string some string
3 3424234 9/3/2023 more string some string
4 2132121 4/5/2022 more string some string
5 2132121 4/5/2022 more string some string