How to extract only unique values from string using regex in Python?


I have the string "Desirable: < 200 Borderline HIgh: 200 - 240 High: > 240", from which I want to extract only the unique integer or decimal values.

To keep only digits, decimal points, and hyphens, I was using re.sub with the regex r'[^0-9.-]+', but the result still contains the duplicate values:

import re

check = "Desirable: < 200 Borderline HIgh: 200 - 240 High: > 240"
re.sub(r'[^0-9.-]+', '',check)

output: 200200-240240

Desired output: 200-240

Please note: it's important to be able to extract numbers, decimals, and "-" from the string.


Solution

  • You can extract all the numbers, including decimals, using:

    re.findall(r'-?\d+\.?\d*', check)
    

    Then you can get the unique ones using set() and finally join them using "-".join().

    Putting it together:

    "-".join(set(re.findall(r'-?\d+\.?\d*', check)))
    

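    One challenge with this code is that set() doesn't preserve the order of the numbers. numpy.unique() does not fix this either: it returns the unique values sorted (lexicographically, for strings) rather than in order of first appearance. If order matters, you can use dict.fromkeys() instead, which keeps insertion order in Python 3.7+:

    "-".join(dict.fromkeys(re.findall(r'-?\d+\.?\d*', check)))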