I read in an excel file into a pandas dataframe. The first row in the excel file is a row of headers. I read the row of headers into a numpy array. I then use the replace function to modify the text in the headers in the array. For some reason, this also changes the actual dataframe it seems like.
I actually wrote the initial code in Python 3.7, where it works just fine. But the exact same code does not work in Python 3.13.1 or Python 3.12.4.
Below is an example of what I do (I tested the script and it runs in all three python versions but it only produces the expected result in 3.7):
import pandas as pd
import numpy as np
df = pd.read_excel("path to excelfile")
# Based on header, I read in columns from the dataframe and do stuff with them.
header = df.keys()[1:] # Yes, I want to skip the first cell in the header.
# I use header_legend to make a nice legend in my plots.
header_legend = np.array(df.keys()[1:])
for i in range(0, len(header_legend)):
header_legend[i] = header_legend[i].replace("word1_word2", "word1 word2")
print(header)
print(header_legend)
Example CSV file (I guess just convert it to an excel file using excel, I use xlsx-format):
word_skip,word1_word2,word3_word4,word5_word6
1,10,100,1000
2,20,200,2000
3,30,300,3000
4,40,400,4000
5,50,500,5000
Now, in Python 3.7, header
remained unmodified and df.keys()[1:]
remained unmodified. Only header_legend
is modified by the replace function. But in Python 3.12 and 3.13, header
and df.keys()[1:]
are also modified, although I only apply the replace function to header_legend
.
Why does applying replace to header_legend
also change header
and df.keys()[1:]
in Python 3.12 and 3.13?
You seem to be relying on a view of the Index, this is generally a bad practice and prone to hidden issues like yours.
You can be explicit, modify your names and reassign them to the DataFrame:
header = list(df)
header[1:] = [c.replace('word1_word2', 'word1 word2') for c in header[1:]]
df.columns = header
Or use rename
with a dictionary (if the column names are unique):
df.rename(columns={c: c.replace('word1_word2', 'word1 word2')
for c in df.columns[1:]}, inplace=True)
Output:
word_skip word1 word2 word3_word4 word5_word6
0 1 10 100 1000
1 2 20 200 2000
2 3 30 300 3000
3 4 40 400 4000
4 5 50 500 5000