Tags: python, string, pandas, missing-data

How to lowercase a pandas dataframe string column if it has missing values?


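The following code does not work: it raises an AttributeError because the missing value np.nan is a float, which has no .lower() method.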

import pandas as pd
import numpy as np
df = pd.DataFrame(['ONE', 'Two', np.nan], columns=['x'])
xLower = df["x"].map(lambda x: x.lower())

How should I tweak it to get xLower = ['one', 'two', np.nan]? Efficiency is important because the real data frame is huge.


Solution

  • Use pandas' vectorized string methods. As the documentation says:

    these methods exclude missing/NA values automatically

    .str.lower() is the very first example there:

    >>> df['x'].str.lower()
    0    one
    1    two
    2    NaN
    Name: x, dtype: object
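
A minimal end-to-end sketch of the approach above, using the data frame from the question. The variable names x_lower and x_lower_map are illustrative. The second variant is not part of the original answer: it assumes you would rather keep the map() call, and relies on the na_action="ignore" option of Series.map to skip missing values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(["ONE", "Two", np.nan], columns=["x"])

# Vectorized string method: missing values pass through untouched.
x_lower = df["x"].str.lower()

# Alternative (an assumption, not from the answer): keep map(), but ask it
# to propagate NaN instead of handing it to the lambda.
x_lower_map = df["x"].map(lambda s: s.lower(), na_action="ignore")

print(x_lower.tolist())
print(x_lower_map.tolist())
```

Both variants leave the missing entry in place. Since efficiency matters for the real data, it may be worth timing each of them on a representative sample rather than assuming one is faster.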