I have a dataframe with a column that has various unit types
The rawNo column is how the data comes in. I would like to change it to look like the ConvNo column
datasample = pd.DataFrame(columns=['rawNo','ConvNo'])
datasample = datasample.append({'rawNo': '-4.35%','ConvNo': -.0435},ignore_index = True)
datasample = datasample.append({'rawNo': '246.6K','ConvNo': 246600},ignore_index = True)
datasample = datasample.append({'rawNo': np.nan,'ConvNo': np.nan},ignore_index = True)
datasample = datasample.append({'rawNo': '$12.76B','ConvNo': 12760000000},ignore_index = True)
datasample = datasample.append({'rawNo': '4.68%','ConvNo': .0468},ignore_index = True)
datasample = datasample.append({'rawNo': '¥-459.5B','ConvNo': -459500000000},ignore_index = True)
datasample = datasample.append({'rawNo': '€-6.8B','ConvNo': -6800000000},ignore_index = True)
datasample = datasample.append({'rawNo': '£-15.623B','ConvNo': -15623000000},ignore_index = True)
datasample = datasample.append({'rawNo': '$-1,400B','ConvNo': -15623000000},ignore_index = True)
I figure I will have to use some type of conditional apply. This apply to remove the percent is failing
def rPercent(value):
value = str(value)
count = value.count('%')
print(count)
if (count != 0):
return value.str.rstrip('% ').astype('float') / 100.0
else:
return value
datasample["ConvNo"] = datasample['actual'].apply(rPercent)
Error I get:
> AttributeError: 'str' object has no attribute 'str'
Data File. you can download file from this link https://projectcodesamples.s3.amazonaws.com/ForEx.csv
The columns I am trying to convert is "actual" the result is in the "CNactual" column
This is posted as a separate answer, since I want to retain the original reproducible example (in case the linked csv is no longer available).
A careful analysis of the data will show that the non-numeric characters that cause trouble are: commas used as thousand separators, single dash symbols (presumably indicating nan
). After incorporating these into the character_mapping
the conversion works without raising any errors:
from numpy import nan
from pandas import read_csv
df = read_csv("https://projectcodesamples.s3.amazonaws.com/ForEx.csv")
character_mapping = {
"%": "e-2",
"K": "e3",
"M": "e6",
"B": "e9",
"T": "e12",
# note that the currency symbols are mapped to blank string
# to facilitate the conversion to float
"£": "",
"\$": "",
"¥": "",
"€": "",
# if an entry consists of a dash, replace with nan
"^-$": nan,
# some numbers have a comma as a thousands separator
",": "",
}
df["Converted"] = df["consensus"].replace(character_mapping, regex=True).astype(float)
mask = (df["Converted"] != df["consensus"]) & (df["consensus"].notna())
print(df.loc[mask, ["consensus", "Converted"]])