I would like to extract, as an integer, the total cost, i.e. the number that appears between "total" and "USD" in the message.
Example DataFrame:
id name lastname message
0 1 John Doe John have 100 USD, so he buy 5 eggs which total cost 10 USD
1 2 Mar Aye Mar have 10 USD, he just buy a banana from another shop for 16 USD
So the final result should be:
id name lastname message total
0 1 John Doe John have 100 USD, so he buy 5 eggs which total cost 10 USD 10
1 2 Mar Aye Mar have 10 USD, he just buy a banana from another shop for 16 USD 0
You can use a regex to capture any number that appears between "total" and "USD".
The code below captures the first number found between "total" and "USD" and converts it to int; rows without a match get 0. The pattern matches whole numbers only, so it would need adjusting to accept floats, but since the result should be an int that should not be necessary.
df['total'] = df['message'].str.extract(r'total.*?(\d+).*?USD', expand=False).fillna(0).astype(int)
Result:
id name lastname message total
0 1 John Doe John have 100 USD, so he buy 5 eggs which total cost 10 USD 10
1 2 Mar Aye Mar have 10 USD, he just buy a banana from another shop for 16 USD 0
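For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch that rebuilds the example DataFrame and applies the same regex. It differs from the one-liner above in one respect, which is my own assumption rather than part of the original answer: it goes through `pd.to_numeric` before `fillna(0)`, so the missing values are filled on a numeric column instead of a string column. The last part shows one possible adjustment for decimal amounts; the `10.5` message is a made-up example, not data from the question.

```python
import pandas as pd

# Rebuild the example DataFrame from the question.
df = pd.DataFrame({
    "id": [1, 2],
    "name": ["John", "Mar"],
    "lastname": ["Doe", "Aye"],
    "message": [
        "John have 100 USD, so he buy 5 eggs which total cost 10 USD",
        "Mar have 10 USD, he just buy a banana from another shop for 16 USD",
    ],
})

# Capture the first run of digits between "total" and "USD".
# expand=False returns a Series (one capture group) instead of a DataFrame.
matched = df["message"].str.extract(r"total.*?(\d+).*?USD", expand=False)

# Assumption: convert to numbers first, then fill non-matching rows with 0.
df["total"] = pd.to_numeric(matched).fillna(0).astype(int)

# Hypothetical variant for decimal amounts: allow an optional fractional part.
float_messages = pd.Series(["eggs which total cost 10.5 USD"])
float_total = pd.to_numeric(
    float_messages.str.extract(r"total.*?(\d+(?:\.\d+)?).*?USD", expand=False)
).fillna(0)
```

The second row has no "total" in its message, so the pattern does not match there and the row falls back to 0, even though "16 USD" appears in the text.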