Tags: python, pandas, string, locale, series

Split string and apply locale to every row of Pandas Series


I want to make two transformations to the amount column of the following df:

                                      Address     type       amount
0  0x88aDa02f6fCE2F1A833cd9B4999D62a7ebb70367  outflow  250,000 VSO
1  0x88aDa02f6fCE2F1A833cd9B4999D62a7ebb70367  outflow  250,000 VSO
2  0x88aDa02f6fCE2F1A833cd9B4999D62a7ebb70367  outflow  250,000 VSO

  1. I want to cut the ' VSO' substring from all rows.
  2. After calling locale.setlocale(locale.LC_ALL, 'en_us'), I want to parse every row with that locale's number format, turning every string into a float.

The current code I have is:

locale.setlocale(locale.LC_ALL, 'en_us')
df_test['amount'].str.split(' VSO')[0]
locale.atof((str(df_test['amount'].values)))

This raises the error:

ValueError: could not convert string to float: "['250000 VSO' '250000 VSO' '250000 VSO' '33333 VSO' '33333 VSO'\n '10400000 VSO' '170833 VSO' '170833 VSO' '170833 VSO' '170833 VSO'\n

Solution

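  • Remove the trailing " VSO" suffix, then convert each value with apply and locale.atof:

    import locale

    # Locale names are platform-dependent; if 'en_us' is rejected, a name such as 'en_US.UTF-8' may be needed.
    locale.setlocale(locale.LC_ALL, 'en_us')
    # str.rstrip(" VSO") strips any trailing ' ', 'V', 'S' or 'O' characters, not the exact suffix,
    # so removesuffix (pandas >= 1.4) is the safer choice.
    df["amount"] = df["amount"].str.removesuffix(" VSO").apply(locale.atof)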
    
    >>> df
                                          Address     type    amount
    0  0x88aDa02f6fCE2F1A833cd9B4999D62a7ebb70367  outflow  250000.0
    1  0x88aDa02f6fCE2F1A833cd9B4999D62a7ebb70367  outflow  250000.0
    2  0x88aDa02f6fCE2F1A833cd9B4999D62a7ebb70367  outflow  250000.0
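
If the required locale is not installed on your machine, a locale-free variant may be easier to run. The following is only a sketch under some assumptions: the thousands separator is always a comma, the unit is always the literal " VSO", and the helper name parse_amount and the sample amounts are made up for illustration (they are not from the question).

```python
import pandas as pd

# Illustrative frame shaped like the one in the question (amounts are sample values).
df = pd.DataFrame({
    "Address": ["0x88aDa02f6fCE2F1A833cd9B4999D62a7ebb70367"] * 3,
    "type": ["outflow"] * 3,
    "amount": ["250,000 VSO", "33,333 VSO", "10,400,000 VSO"],
})


def parse_amount(series, unit=" VSO", thousands=","):
    """Hypothetical helper: drop the unit and thousands separator, then cast to float."""
    cleaned = (
        series.str.replace(unit, "", regex=False)       # remove the literal unit text
              .str.replace(thousands, "", regex=False)  # remove thousands separators
              .str.strip()                              # drop any leftover whitespace
    )
    return cleaned.astype(float)


df["amount"] = parse_amount(df["amount"])
print(df["amount"])
```

Because the replacements are literal (regex=False) and vectorized, this avoids both the global side effect of locale.setlocale and the per-row apply call. It will not handle locales that use "." as the thousands separator, so locale.atof remains the better fit when the number format varies.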