Tags: python, string, pandas, series

How to convert this Series of strings to a Series of numbers


I have a pandas Series as follows:

series = pd.Series(['', '', '$ 200,000', '$ -320,000','$ 700,000','','$ -10,000']) 

I need to convert all string values in the series to integers or floats by removing '$ ' and commas. I have tried the following:

for i in series:
    if '$ ' in i:
        i=i.replace('$ ',"")
series = pd.to_numeric(series, errors='coerce')

However, the replace method doesn't change the series. I'm also not sure whether pd.to_numeric will convert the series to numbers.


Solution

  • replace and to_numeric

    • Remove anything that isn't a digit, period/decimal point, or minus sign.
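    • pandas.Series.str.replace treats the pattern as a regex only when regex=True is passed; since pandas 2.0 the default is regex=False, so pass it explicitly

    pd.to_numeric(series.str.replace(r'[^0-9.-]', '', regex=True))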
    
    0         NaN
    1         NaN
    2    200000.0
    3   -320000.0
    4    700000.0
    5         NaN
    6    -10000.0
    dtype: float64
    

    Equivalently, use pandas.Series.replace, which must be given regex=True to treat the pattern as a regex:

    pd.to_numeric(series.replace(r'[^0-9.-]', '', regex=True))
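
For completeness, here is a self-contained sketch of the approach above. It also hints at why the loop in the question had no effect: Python strings are immutable, and reassigning the loop variable `i` only rebinds a local name, so the Series is never modified. The names `cleaned`, `numbers` and `as_int` are illustrative, and the final cast to the nullable `Int64` dtype is an optional extra that assumes every non-missing value is a whole number.

```python
import pandas as pd

series = pd.Series(['', '', '$ 200,000', '$ -320,000', '$ 700,000', '', '$ -10,000'])

# Vectorized replace returns a new Series; the original is left untouched.
# Keep only digits, the decimal point and the minus sign.
cleaned = series.str.replace(r'[^0-9.-]', '', regex=True)

# Empty strings cannot be parsed and become NaN, so the result is float64.
numbers = pd.to_numeric(cleaned, errors='coerce')

# Optional: a nullable integer dtype stores missing values as <NA>.
as_int = numbers.astype('Int64')

print(numbers)
print(as_int)
```

If integers are not required, `numbers` is already the float64 result shown above.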