Search code examples
pythonpandasdynamic-columnscolumnsorting

Automatic leading zero for same column names in python


I read csv with more than 300 columns. Many columns have similar names.
Shortened example from the csv:

index   d     c     a     b     b     a     a     a
0       data  data  data  data  data  data  data  data

Python automatically adds numbers at the end of columns names (if they are similar) as a suffix to ensure each column has a unique name.
Example:

index   d     c     a     b     b.1   a.1   a.2   a.3
0       data  data  data  data  data  data  data  data

My assignment is to sort the columns alphabetically and to add zeros (leading zeros) to the suffix.
Desired output:

index   a     a.01  a.02  a.03  b     b.01  c     d
0       data  data  data  data  data  data  data  data

My code below (taken from stackoverflow) can sort the column. But I have no idea how to make suffix with leading zeros?

import pandas as pd
df= pd.read_csv(r"C:\Users\.....\file.csv", skipinitialspace=False, sep=';', header= 0, index_col = 'DateTime', low_memory=False)

df = df.sort_index(axis=1) # sort the column
pd.set_option('display.max_columns', None) # view all columns without truncated

Any ideas?

Edited question
After sorting the columns, I want,

a.01 a02 a12

rather than

a.01 a.012 a.02

Solution

  • With df.columns.str

    There are answers posted, but would like to add one more easy solution:

    Since df.columns is an Index object, we can use the .str accessor.

    Here is he Doc refence Doc

    Just a short simulated example:

    >>> df
        b.1   a.1   a.2
    0  data  data  data
    1  data  data  data
    

    Replace the desired columns directly to DataFrame as follows:

    >>> df.columns = df.columns.str.replace('.','.0')
    >>> df
       b.01  a.01  a.02
    0  data  data  data
    1  data  data  data
    

    OR below already given but better to use inplace=True rather assigning to df.

    df.rename(columns=lambda x: x.replace('.','.0'), inplace=True)
    

    Another way around using axis:

    df.rename(lambda x: x.replace('.','.0'), axis='columns', inplace=True)
    print(df)
       b.01  a.01  a.02
    0  data  data  data
    1  data  data  data