I have a DataFrame:
| | name | age |
|---|---|---|
| 0 | Paul | 25 |
| 1 | John | 27 |
| 2 | Bill | 23 |
I know that if I enter:
df[['name']] = df[['age']]
I'll get the following:
| | name | age |
|---|---|---|
| 0 | 25 | 25 |
| 1 | 27 | 27 |
| 2 | 23 | 23 |
I expected the same result from this command:
df.loc[:, ['name']] = df.loc[:, ['age']]
But instead, I get this:
| | name | age |
|---|---|---|
| 0 | NaN | 25 |
| 1 | NaN | 27 |
| 2 | NaN | 23 |
For some reason, if I omit the square brackets `[]` around the column names, I get exactly what I expected. That is, the command:
df.loc[:, 'name'] = df.loc[:, 'age']
gives the right result:
| | name | age |
|---|---|---|
| 0 | 25 | 25 |
| 1 | 27 | 27 |
| 2 | 23 | 23 |
Why do two pairs of brackets with `.loc` result in NaN? Is it a bug or intended behaviour? I can't figure out the reason for it.
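That's because a `.loc` assignment aligns the right-hand side with the target on all index axes, including the columns. Since the labels `age` and `name` do not match, there is no data to assign, hence the NaNs. Without the brackets, the right-hand side is a Series, which has no column labels, so only the row index is aligned and the values are assigned as expected.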
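The alignment can be made visible with a small sketch. It rebuilds the DataFrame from the question and uses `reindex` only as an illustration of what matching on column labels produces, not as a claim about how pandas implements `.loc` internally. The explicit `dtype=object` on `name` and the variable names are assumptions made so the example behaves the same across pandas versions.

```python
import pandas as pd

# Object dtype is set explicitly (an assumption for this sketch) so the
# "name" column can hold non-string values on any pandas version.
df = pd.DataFrame({
    "name": pd.Series(["Paul", "John", "Bill"], dtype=object),
    "age": [25, 27, 23],
})

# The right-hand side is a DataFrame whose only column label is "age".
rhs = df.loc[:, ["age"]]

# Matching it against the target label "name" finds no such column,
# so every value in the aligned result is missing.
aligned = rhs.reindex(columns=["name"])
print(aligned)

# Assigning the DataFrame through .loc gives the same all-NaN column.
frame_case = df.copy()
frame_case.loc[:, ["name"]] = rhs
print(frame_case)

# A Series carries no column labels, so only the row index is aligned
# and the values are copied over.
series_case = df.copy()
series_case.loc[:, "name"] = df.loc[:, "age"]
print(series_case)
```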
You can make it work by renaming the columns:
df.loc[:, ["name"]] = df.loc[:, ["age"]].rename(columns={"age": "name"})
or by passing the underlying NumPy array, which carries no labels to align:
df.loc[:, ["name"]] = df.loc[:, ["age"]].values
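Both workarounds can be checked side by side with a short sketch. `make_df` is a hypothetical helper that rebuilds the DataFrame from the question, again with `dtype=object` on `name` as an assumption for version stability. `to_numpy()` is used in place of `.values`, as the pandas documentation recommends it for new code, and should behave the same here.

```python
import pandas as pd


def make_df():
    # Hypothetical helper: rebuilds the example DataFrame from the question.
    return pd.DataFrame({
        "name": pd.Series(["Paul", "John", "Bill"], dtype=object),
        "age": [25, 27, 23],
    })


# Workaround 1: rename the source column so its label matches the target.
by_rename = make_df()
by_rename.loc[:, ["name"]] = by_rename.loc[:, ["age"]].rename(columns={"age": "name"})
print(by_rename)

# Workaround 2: pass a plain array, which has no labels to align on.
by_array = make_df()
by_array.loc[:, ["name"]] = by_array.loc[:, ["age"]].to_numpy()
print(by_array)
```

Renaming keeps the label-based alignment, so rows are still matched by index. The array version assigns purely by position, so it is only safe when the row order on both sides is known to be the same.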