I have a large pandas DataFrame (160k records, 60 columns of mostly text) but for this example, I have the following DataFrame:
df1 = pd.DataFrame([{"GROUP": "A", "COL2": "1", "COL3": "P"},
{"GROUP": "A", "COL2": "2", "COL3": "Q"},
{"GROUP": "A", "COL2": "3", "COL3": "R"},
{"GROUP": "B", "COL2": "4", "COL3": "S"},
{"GROUP": "B", "COL2": "5", "COL3": "T"},
{"GROUP": "B", "COL2": "6", "COL3": "U"},
{"GROUP": "B", "COL2": "7", "COL3": "V"}])
I am trying to create another column that reverses the order of COL2 but only within groups of A and B...so I would expect from top to bottom, the values to be 3,2,1,7,6,5,4. This is achieved with this line of code:
df1['REVERSED_COL'] = df1.groupby("GROUP")["COL2"].transform(lambda x: x[::-1])
This line of code worked when I was using Python 3.11.7. However, I recently upgraded to 3.12.3 (and all other modules including pandas to 2.2.1) and removed 3.11.7 interpreter from my machine so I can't go back and test it again. I also have another machine using Python 3.7.4 where the same line of code still works as expected.
I tried using the x[::-1] on a list, and that reversed them as expected: This code:
x = [1,2,3,4,5,6,7,8,9]
print(x)
y = x[::-1]
print(y)
Results in:
[1, 2, 3, 4, 5, 6, 7, 8, 9]
[9, 8, 7, 6, 5, 4, 3, 2, 1]
My question is, am I doing something wrong? Is there a different way I can do this? I checked the pandas and python documentation on Python changes and pandas 2.2.1 changes (respectively) but I couldn't find anything relevant enough.
Because the output of transform
is aligned back to the original index when it is a Series, which in your case reverts it back to its original order.
To avoid this you must convert to array/list (with values
/array
/to_numpy
):
df1['REVERSED_COL'] = (df1.groupby("GROUP")["COL2"]
.transform(lambda x: x[::-1].array)
)
Output:
GROUP COL2 COL3 REVERSED_COL
0 A 1 P 3
1 A 2 Q 2
2 A 3 R 1
3 B 4 S 7
4 B 5 T 6
5 B 6 U 5
6 B 7 V 4