python regex pandas regex-group regex-greedy

RegEx for extracting a decimal number

I have a pandas DataFrame where one column contains text with ratings in the format X/10. I want to extract the numerators (which can be decimals). So far I have been using:

my_df.text_column.str.extract(r'(\d*?\.?\d+(?=/10))')

I thought I was doing fine until I saw that I had some numerators like .10. What is actually happening is that some rows have text like: "Nice job.10/10".
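
A minimal way to reproduce this on a single string, sketched here with Python's standard re module rather than pandas (the variable names are only for illustration):

```python
import re

# Pattern from the question: \d*? and \.? are both optional,
# so a match is allowed to begin at the dot itself.
pattern = r'(\d*?\.?\d+(?=/10))'

match = re.search(pattern, "Nice job.10/10")
print(match.group(1))  # .10
```

No match is possible before the dot, so the first successful match starts at "." and captures ".10".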

How can I specify that, when extracting a number from this column, a "." is only accepted if it comes after a digit?

Thanks.

Solution

Do:

df.text.str.extract(r'(\d+\.?\d*?(?=/10))')

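The pattern first requires at least one digit (\d+), followed by an optional decimal point (\.?) and optional decimal digits (\d*?). Because the match has to start with a digit, a dot that is not preceded by a digit, as in "Job.10/10", can no longer be captured. The lookahead (?=/10) requires "/10" to follow without including it in the result.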
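
To make the pieces easier to see, here is the same pattern written out with re.VERBOSE and checked with the standard re module. This is only a sketch: the names RATING, samples and matches are made up for the illustration.

```python
import re

# Same pattern as above, spelled out so each part can carry a comment.
RATING = re.compile(r"""
    (            # capture group, the part that str.extract returns
      \d+        # one or more digits: the match must start with a digit
      \.?        # an optional decimal point
      \d*?       # optional digits after the point (lazy)
      (?=/10)    # lookahead: /10 must follow, but is not captured
    )
""", re.VERBOSE)

samples = ["Nice Job.10/10", "Score 9.5/10", "And now 5./10"]
matches = [RATING.search(s).group(1) for s in samples]
print(matches)  # ['10', '9.5', '5.']
```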

Example:

import pandas as pd

df = pd.DataFrame({'text':["Nice Job.10/10", "Score 9.5/10", "And now 5./10"]})
df.text.str.extract(r'(\d+\.?\d*?(?=/10))')

Output:

    0
0   10
1   9.5
2   5.
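
Note that str.extract returns strings. If numeric values are needed, one possible follow-up (an assumption about what comes next, not something the question asked for) is to extract into a Series with expand=False and convert it with pd.to_numeric:

```python
import pandas as pd

df = pd.DataFrame({'text': ["Nice Job.10/10", "Score 9.5/10", "And now 5./10"]})

# expand=False returns a Series instead of a one-column DataFrame
extracted = df['text'].str.extract(r'(\d+\.?\d*?(?=/10))', expand=False)

# Convert the extracted strings ("10", "9.5", "5.") to floats
ratings = pd.to_numeric(extracted)
print(ratings.tolist())  # [10.0, 9.5, 5.0]
```

Rows without a match come back as NaN from str.extract and stay NaN after the conversion.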