I have a pandas DataFrame in which one column contains text with ratings in the format X/10. I want to extract the numerators (which can be decimals). So far I have been using:
my_df.text_column.str.extract('(\d*?\.?\d+(?=/10))')
I thought I was doing fine until I saw that I had some numerators like .10. What is actually happening is that some rows have text like: "Nice job.10/10".
How can I specify that, when extracting a number from this column, a "." only counts as part of the number if it comes after a digit?
Thanks.
Do:
df.text.str.extract(r'(\d+\.?\d*?(?=/10))')
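You want to first look for a number (\d+), followed by an optional dot (\.?) and optional decimal digits (\d*?). Because the match now has to start with a digit, a "." left over from the end of the previous sentence can no longer begin the number.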
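To see the difference in isolation, here is a minimal sketch using plain `re` instead of pandas. The variable names are just for illustration; the two patterns are the one from the question and the one suggested above.

```python
import re

old_pattern = r'(\d*?\.?\d+(?=/10))'  # pattern from the question: "." may start the match
new_pattern = r'(\d+\.?\d*?(?=/10))'  # pattern suggested above: match must start with a digit

text = "Nice job.10/10"

old_match = re.search(old_pattern, text).group(1)
new_match = re.search(new_pattern, text).group(1)

print(old_match)  # .10
print(new_match)  # 10
```

The old pattern starts matching at the "." because everything before the final digits is optional there, while the new one has to wait for the first digit.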
Example:
import pandas as pd
df = pd.DataFrame({'text':["Nice Job.10/10", "Score 9.5/10", "And now 5./10"]})
df.text.str.extract(r'(\d+\.?\d*?(?=/10))')
     0
0   10
1  9.5
2   5.
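Note that the pattern above keeps a trailing dot ("5."). If that is not wanted, one possible variant (an assumption about what you need, not something from the question) is to allow the dot only when at least one digit follows it, and then convert the result to numbers. Here `strict` and `ratings` are just illustrative names:

```python
import pandas as pd

df = pd.DataFrame({'text': ["Nice Job.10/10", "Score 9.5/10", "And now 5./10"]})

# Digits, optionally followed by "." plus at least one more digit,
# and only when immediately followed by "/10".
strict = df.text.str.extract(r'(\d+(?:\.\d+)?)(?=/10)', expand=False)

# Convert the extracted strings to floats; rows without a match stay NaN.
ratings = pd.to_numeric(strict)
print(ratings)
```

With this variant "5./10" does not match at all, so that row ends up as NaN. If you would rather get 5 from such rows, something like `(\d+(?:\.\d+)?)\.?(?=/10)` should work, since the stray dot is then consumed outside the capture group.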