Search code examples
pythonpandasiterationrenaming

How to remove numbers from all column names / headers in a dataframe


Hi So I have a data frame with column names that end in '2018'

I need to remove the years from these column names and am having some trouble. I also need to strip leading and trailing spaces from these column names as well.

I've already tried the following:

df.columns.str.replace('\d+',"") #to try and remove the numbers from the column names

df.columns = df.columns.str.strip('') #to try and get rid of the spaces

These do nothing to the dataframe.

I expect the column names to go from " Stock 2018" to "Stock"

but this isn't happening. Thank you for the help!


Solution

  • You can try using regex as well..

    Example DataFrame:

    >>> df = pd.DataFrame.from_dict({'Name04': ['Chris', 'Joe', 'Karn', 'Alina'], 'Age04': [14, 16, 18, 21], 'Weight04': [15, 21, 37, 45]})                                 
    
    >>> df
       Age04 Name04  Weight04
    0     14  Chris        15
    1     16    Joe        21
    2     18   Karn        37
    3     21  Alina        45
    

    Result using regex:

    >>> df.columns = df.columns.str.replace(r'\d+', '')
    >>> df
       Age   Name  Weight
    0   14  Chris      15
    1   16    Joe      21
    2   18   Karn      37
    3   21  Alina      45