python pandas replace rename method-chaining

Alternative to df.rename(columns=str.replace(" ", "_"))


I noticed that it's possible to use df.rename(columns=str.lower), but not df.rename(columns=str.replace(" ", "_")).

  1. Is this because it is allowed to pass the variable that stores the method (str.lower), but not to actually call the method (str.lower())? There is a similar question about why the error message of df.rename(columns=str.replace(" ", "_")) is so confusing, but it has no answer.

  2. Is it possible to use methods of the .str accessor (of pd.DataFrame().columns) inside of df.rename(columns=...)? The only solution I have come up with so far is

    df = df.rename(columns=dict(zip(df.columns, df.columns.str.replace(" ", "_"))))
    

    but maybe there is something more consistent with the style of df.rename(columns=str.lower)? I know df.rename(columns=lambda x: x.replace(" ", "_")) works, but it doesn't use the .str accessor of pandas columns; it uses Python's built-in str.replace().
    The purpose of the question is to explore the possibilities for using pandas str methods when renaming columns in method chaining, which is why df.columns = df.columns.str.replace(' ', '_') is not suitable for me.

As an example df, assume:

    import pandas as pd
    df = pd.DataFrame([[0, 1, 2]], columns=["a pie", "an egg", "a nut"])
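
The behaviour described above can be reproduced with a short script. This is only a sketch, assuming a reasonably recent pandas; str.upper stands in for str.lower so that the effect is visible on these lowercase labels, and the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame([[0, 1, 2]], columns=["a pie", "an egg", "a nut"])

# Passing a function object works: rename calls it once per column label.
# (str.upper is used instead of str.lower so the change is visible.)
upper = df.rename(columns=str.upper)
print(list(upper.columns))

# Writing str.replace(" ", "_") calls the function immediately, before
# rename ever runs, and that call fails with a TypeError.
try:
    df.rename(columns=str.replace(" ", "_"))
    error = None
except TypeError as exc:
    error = exc
print(type(error).__name__)

# The dict-based workaround from the question: old label -> new label.
mapping = dict(zip(df.columns, df.columns.str.replace(" ", "_")))
renamed = df.rename(columns=mapping)
print(list(renamed.columns))
```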

Solution

  • df.rename accepts a function object (or other callable).

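    In the first case, str.lower is a function. However, str.replace(" ", "_") calls the function and evaluates to its result. In this case the call itself is invalid: called on the str class, it treats " " as the string to operate on and "_" as the old substring, so the new argument is missing and Python raises a TypeError before df.rename is even reached. In any case, you don't want to pass the result of calling the function; you want to pass the function itself.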

    So something like

    def space_to_underscore(col):
        # Plain Python str.replace, applied to one column label at a time
        return col.replace(" ", "_")

    df.rename(columns=space_to_underscore)
    

    Or, use a lambda expression:

    df.rename(columns=lambda col: col.replace(" ", "_"))
    

    Note that df.rename(columns=str.lower) doesn't use the .str accessor either; it uses the built-in str method. I think that is where the confusion comes from.

    Now, you can use the .str accessor on the column index object, so:

    df.columns.str.replace(" ", "_")
    

    But then you would need to do what you already said you didn't want to do:

    df.columns = df.columns.str.replace(" ", "_")
    

    It is important to point out that this mutates the original dataframe object in place, as opposed to df.rename, which returns a new dataframe object. It isn't clear why you want to use the .str accessor; is that the reason?
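
If the goal is to keep the .str accessor and still stay inside a method chain, one possible route (not part of the answer above, so treat it as a sketch) is DataFrame.set_axis, which returns a new frame instead of assigning to df.columns. When the frame is an unnamed intermediate result in the middle of a chain, DataFrame.pipe can hand it to a helper. The helper name underscore_columns is hypothetical.

```python
import pandas as pd

df = pd.DataFrame([[0, 1, 2]], columns=["a pie", "an egg", "a nut"])

# set_axis returns a new DataFrame with the given labels, so the
# vectorised .str accessor can be used without assigning to df.columns.
out = df.set_axis(df.columns.str.replace(" ", "_", regex=False), axis=1)


def underscore_columns(frame):
    """Hypothetical helper: replace spaces in column labels via .str."""
    new_labels = frame.columns.str.replace(" ", "_", regex=False)
    return frame.set_axis(new_labels, axis=1)


# Mid-chain, the intermediate frame has no name; pipe passes it along.
chained = (
    df.rename(columns=str.upper)
      .pipe(underscore_columns)
)

print(list(out.columns))
print(list(chained.columns))
print(list(df.columns))  # the original frame keeps its labels
```

Whether this is preferable to the lambda passed to df.rename is a matter of taste: both apply the same replacement, and for a handful of column labels the difference is unlikely to matter in practice.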