RegEx for extracting a decimal number


I have a pandas DataFrame in which one column holds text containing ratings in the form X/10. I want to extract the numerators, which can be decimals. So far I have been using:

my_df.text_column.str.extract(r'(\d*?\.?\d+(?=/10))')

This seemed to work until I noticed some extracted numerators like .10. The cause is that some rows contain text such as "Nice job.10/10", where the "." ends a sentence and is not a decimal point.

How can I specify that a "." is only treated as part of the number when it comes after a digit?

Thanks.


Solution

  • Do:

    df.text.str.extract(r'(\d+\.?\d*?(?=/10))')
    

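    The pattern first requires at least one digit (\d+), followed by an optional decimal point (\.?) and optional fractional digits (\d*?). Because the match must start with a digit, a "." that comes before the number is never captured.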

    Example:

    import pandas as pd

    df = pd.DataFrame({'text':["Nice Job.10/10", "Score 9.5/10", "And now 5./10"]})
    df.text.str.extract(r'(\d+\.?\d*?(?=/10))')
    
    
    
        0
    0   10
    1   9.5
    2   5.
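
To see the difference between the two patterns outside pandas, here is a minimal sketch using only the standard `re` module. The helper `first_match` and the names `QUESTION_PATTERN` and `ANSWER_PATTERN` are made up for this illustration; the final `float` conversion is an assumed follow-up step that the question does not ask for.

```python
import re

# Pattern from the question: the leading digits are optional,
# so a match is allowed to start at the "." itself.
QUESTION_PATTERN = re.compile(r'(\d*?\.?\d+(?=/10))')

# Pattern from the answer: a match must start with at least one digit.
ANSWER_PATTERN = re.compile(r'(\d+\.?\d*?(?=/10))')


def first_match(pattern, text):
    """Return the first captured numerator, or None when nothing matches."""
    m = pattern.search(text)
    return m.group(1) if m else None


samples = ["Nice Job.10/10", "Score 9.5/10", "And now 5./10"]

before = [first_match(QUESTION_PATTERN, s) for s in samples]
after = [first_match(ANSWER_PATTERN, s) for s in samples]

# float() accepts a trailing decimal point, so "5." converts cleanly.
ratings = [float(x) for x in after]

print(before)
print(after)
print(ratings)
```

For the first sample, the question's pattern returns ".10" while the answer's pattern returns "10". In pandas, the same pattern string is what gets passed to `str.extract`, as in the answer above.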