I read csv with more than 300 columns. Many columns have similar names.
Shortened example from the csv:
index d c a b b a a a
0 data data data data data data data data
Python automatically adds numbers at the end of columns names (if they are similar) as a suffix to ensure each column has a unique name.
Example:
index d c a b b.1 a.1 a.2 a.3
0 data data data data data data data data
My assignment is to sort the columns alphabetically and to add zeros (leading zeros) to the suffix.
Desired output:
index a a.01 a.02 a.03 b b.01 c d
0 data data data data data data data data
My code below (taken from stackoverflow) can sort the column. But I have no idea how to make suffix with leading zeros?
import pandas as pd
df= pd.read_csv(r"C:\Users\.....\file.csv", skipinitialspace=False, sep=';', header= 0, index_col = 'DateTime', low_memory=False)
df = df.sort_index(axis=1) # sort the column
pd.set_option('display.max_columns', None) # view all columns without truncated
Any ideas?
Edited question
After sorting the columns, I want,
a.01 a02 a12
rather than
a.01 a.012 a.02
With df.columns.str
There are answers posted, but would like to add one more easy solution:
Since df.columns
is an Index object, we can use the .str accessor.
Here is he Doc refence Doc
Just a short simulated example:
>>> df
b.1 a.1 a.2
0 data data data
1 data data data
Replace the desired columns directly to DataFrame as follows:
>>> df.columns = df.columns.str.replace('.','.0')
>>> df
b.01 a.01 a.02
0 data data data
1 data data data
OR below already given but better to use inplace=True
rather assigning to df.
df.rename(columns=lambda x: x.replace('.','.0'), inplace=True)
Another way around using axis:
df.rename(lambda x: x.replace('.','.0'), axis='columns', inplace=True)
print(df)
b.01 a.01 a.02
0 data data data
1 data data data