As the title states, df.reset_index() is currently dropping leading zeroes in several of my columns because the columns are converted to int64. The Excel files consist of product and business codes, showing which codes belong in different aggregates. Different codes may have a varying number of leading zeroes, and the codes themselves have a varying number of digits, so .zfill() doesn't quite cut it. I have a quick fix at hand, modifying the data, but I would like to learn how to circumvent such issues in the future.
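To illustrate why a fixed-width .zfill() can't recover the zeros once they are dropped:

# .zfill() pads to a single fixed width, so after the zeros are lost
# there is no way to know how many to put back when code lengths differ:
print(str(1).zfill(4))   # '0001' -- correct only if the code was 4 digits
print(str(10).zfill(4))  # '0010' -- wrong if the original code was '010'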
This is how I currently read the files within a loop:
df = pd.read_excel("mappingOfAggregates.xlsx", index_col=[0,1,2,3], dtype=str)
I have also used df = df.astype('object') after each file has been read, but it does not seem to affect the final result when I use df.reset_index().
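For reference, here is a minimal sketch that reproduces the symptom (the sample data are made up; writing the file requires openpyxl):

import pandas as pd

# Hypothetical sample file with leading-zero codes, stored as text in Excel.
sample = pd.DataFrame(
    {
        "A": ["001", "002"],
        "B": ["010", "0200"],
        "C": ["01", "02"],
        "D": ["0001", "0002"],
        "aggregate": ["X", "Y"],
    }
)
sample.to_excel("mappingOfAggregates.xlsx", index=False)

# Reading back without a per-column dtype: the numeric-looking codes are
# inferred as integers, and reset_index() surfaces them as int64 columns.
df = pd.read_excel("mappingOfAggregates.xlsx", index_col=[0, 1, 2, 3])
print(df.reset_index().dtypes)  # A..D come back as int64, leading zeros gone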
ChatGPT suggests that I reset the index and then use .astype('object') on each column - which works about as well as one would expect. Any pointers in the right direction are received with thanks.
You can retain the index with the index_col argument, but to keep the leading zeros you need to specify the type of each column as string rather than a numeric value. Assuming you have columns A, B, C and D as the index, you can achieve that by passing dtype a dictionary mapping each column name to str, 'str' or pd.StringDtype(), as follows:
pd.read_excel(
    "mappingOfAggregates.xlsx",
    index_col=[0, 1, 2, 3],
    dtype={
        "A": pd.StringDtype(),
        "B": pd.StringDtype(),
        "C": pd.StringDtype(),
        "D": pd.StringDtype(),
    },
)
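With the per-column dtype in place, the index levels should survive df.reset_index() as strings. A quick check along these lines (column names assumed as above, using str as the shorthand alternative):

df = pd.read_excel(
    "mappingOfAggregates.xlsx",
    index_col=[0, 1, 2, 3],
    dtype={"A": str, "B": str, "C": str, "D": str},
)
out = df.reset_index()
print(out.dtypes)        # A..D are now object/string instead of int64
print(out["A"].iloc[0])  # leading zeros preserved, e.g. '001'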