Search code examples
pythonpandasdataframeformatcurrency

How to sum price of human readable currency format?


I retrieve data from a website, the price column is 809.989K, 1.792M and I want to combine the numbers from the price column into a new column.

My dataframe

import pandas as pd

data = {
  "price": ["809.989K", "1.792M"]
}

df = pd.DataFrame(data)
df

Desired output

enter image description here


Solution

  • Convert values to floats numbers with sum, last add prefix by si_format library:

    #pip install si_format
    from si_prefix import si_format
    
    #https://stackoverflow.com/a/39684629/2901002
    out =  ((df.price.replace(r'[KM]+$', '', regex=True).astype(float) * 
                  df.price.str.extract(r'[\d\.]+([KM]+)', expand=False)
                    .fillna(1)
                    .replace(['K','M'], [10**3, 10**6]).astype(int)).sum())
    
    df['sum'] = si_format(out, precision=1).upper()
    print (df)
          price    sum
    0  809.989K  2.6 M
    1    1.792M  2.6 M