Thanks for reading this and trying to help me. I want to analyze what a student does after finishing the last question in a test. I am analyzing this in a pandas dataframe.
Currently I have among others the first two columns of the table below, however, I want to create a new column and assign values according to the first time the max value (question sequence #) of an individual student is hit. I want to create the third column but I do not know how.
Could someone help me with this?
I hope I gave all the needed details. Otherwise feel free to contact me. I know I am able to find the index of the max value with .idxmax(), but not per student and how to assign values before and after this.
Student ID | Question sequence # | before or after last item? |
---|---|---|
1 | 1 | Before or last item |
1 | 2 | Before or last item |
1 | 3 | Before or last item |
1 | 1 | after last item |
1 | 2 | after last item |
2 | 1 | Before or last item |
2 | 2 | Before or last item |
2 | 3 | Before or last item |
2 | 4 | Before or last item |
2 | 1 | after last item |
2 | 2 | after last item |
We can use groupby
with idxmax
here. Then compare the index to the index with the highest value:
m = df.index > df.groupby('Student ID')['Question sequence #'].transform("idxmax")
df['before or after last item?'] = np.where(
m,
"after last item",
"Before or last item"
)
Student ID Question sequence # before or after last item?
0 1 1 Before or last item
1 1 2 Before or last item
2 1 3 Before or last item
3 1 1 after last item
4 1 2 after last item
5 2 1 Before or last item
6 2 2 Before or last item
7 2 3 Before or last item
8 2 4 Before or last item
9 2 1 after last item
10 2 2 after last item