I'm trying to reindex the columns of a set of dataframes inside a loop, but this only seems to work outside the loop. See the sample code below.
import pandas as pd
data1 = [[1,2,3],[4,5,6],[7,8,9]]
data2 = [[10,11,12],[13,14,15],[16,17,18]]
data3 = [[19,20,21],[22,23,24],[25,26,27]]
index = ['a','b','c']
columns = ['d','e','f']
df1 = pd.DataFrame(data=data1,index=index,columns=columns)
df2 = pd.DataFrame(data=data2,index=index,columns=columns)
df3 = pd.DataFrame(data=data3,index=index,columns=columns)
columns2 = ['f','e','d']
for i in [df1,df2,df3]:
    i = i.reindex(columns=columns2)
print(df1)
df2 = df2.reindex(columns=columns2)
print(df2)
df1 is not reindexed as desired; however, if I reindex df2 outside of the loop, it works. Why is that?
Thanks, Andrew
That happens for the same reason this happens:
a = 5
b = 6
for i in [a, b]:
    i = 4
>>> a
5
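Why? Assigning to the loop variable i only rebinds the name i to a new object. It does not change a, b, or the list they were taken from. See this accepted answer.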
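The same thing can be seen with dataframes directly. The following is a minimal sketch, assuming a small made-up frame (the names df and frames are only for illustration): reindex returns a new DataFrame, and assigning it to i leaves both the list and the outer variable pointing at the original object.

```python
import pandas as pd

# Small illustrative frame; the data is made up for this sketch.
df = pd.DataFrame([[1, 2], [3, 4]], columns=["d", "e"])
frames = [df]

for i in frames:
    # reindex builds a new DataFrame; this line only rebinds the name i.
    i = i.reindex(columns=["e", "d"])

# The list entry and the outer name still refer to the untouched original.
print(frames[0] is df)    # True
print(list(df.columns))   # ['d', 'e']
print(list(i.columns))    # ['e', 'd']
```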
Concerning your problem, one way to go about it is to create a list of reindexed dataframes like so:
reindexed_dfs = [df.reindex(columns=columns2) for df in [df1, df2, df3]]
and then reassign df1, df2 and df3. But it's better to just keep using your newly created list anyway.
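For completeness, here is a sketch of both options using the data from the question. The reassignment uses plain tuple unpacking. The dict named frames is a hypothetical alternative to the list, not something from the question, for when you want to look the frames up by name.

```python
import pandas as pd

index = ["a", "b", "c"]
columns = ["d", "e", "f"]
columns2 = ["f", "e", "d"]

df1 = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], index=index, columns=columns)
df2 = pd.DataFrame([[10, 11, 12], [13, 14, 15], [16, 17, 18]], index=index, columns=columns)
df3 = pd.DataFrame([[19, 20, 21], [22, 23, 24], [25, 26, 27]], index=index, columns=columns)

# Build the reindexed frames once...
reindexed_dfs = [df.reindex(columns=columns2) for df in [df1, df2, df3]]

# ...then either rebind the original names by unpacking the list,
df1, df2, df3 = reindexed_dfs

# or keep them in a container (hypothetical dict keyed by name).
frames = dict(zip(["df1", "df2", "df3"], reindexed_dfs))

print(list(df1.columns))            # ['f', 'e', 'd']
print(list(frames["df3"].columns))  # ['f', 'e', 'd']
```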