regex, regex-group, regex-negation

Regex for capturing specific number patterns and number patterns only


I have a regex that captures floating-point numbers and integers from a text and mostly avoids alphanumeric tokens.

Regex: /[+-]?\d*?[^a-zA-Z\n][^\s]/

But it fails on some of the test cases below.

Requirements:

1.) Capture all valid integers and decimal numbers, including those with a leading + or - sign. 1, 1.0, -1.0, -1, .6, 0.7, 0, +.6 and +.01 are all valid; 7. is not valid. The regex above does not capture .6 in the text below.

2.) Avoid tokens like 3E, 171A, etc. The regex does everything except this case: it captures text like 11A and 17E (but not 9E or 8B). In the extract below, 10E is captured but 9W is not; 10E is not wanted either. Any string of the form "number followed by letters" has to be skipped.

3.) Whitespace should not be captured. The dataset can be huge, so I want to avoid calling String.trim() on every match in Java.

Any suggestions?

Sample text below:

     la=    -0.8    -0.7    -1.3    -1.6    -0.2    -0.9    -0.6    -0.7    -0.4     0.0 
  9W t=  32.611  32.599  32.588  32.577  32.565  32.531  32.519  32.508  32.496  32.485
      a=    13.6    17.2    13.9    14.8    12.7    17.8    13.7    14.3    16.9    15.9 
      p=    16.2    17.9    17.7    16.5    14.8    20.3    16.7    17.1    21.1    17.8 
     la=     0.7     1.     0.7     0.8     0.6     0.9     1.0     2.0     1.8     0.9 
      t=  32.309  32.298  32.287  32.276  32.265  32.177  32.166  32.155  32.144  32.133
      a=    12.1    13.4    17.5    17.0     0.0    14.5    14.7    14.7    16.7    14.5 
      p=    15.2    14.6    18.4    18.5     0.0    15.1    15.9    17.1    17.5    17.0 
     la=     0.9     .6     1.3     0.5     0.0     0.3     0.9     0.9     0.9     0.6 

 10E t=  32.658  32.646  32.635  32.623  32.612  32.577  32.566  32.555  32.543  32.532
      a=    13.8    17.3    16.0    15.2    13.8    16.4    15.3    20.3    17.6    16.5 
      p=    15.2    18.0    17.4    17.1    15.6    17.7    18.0    23.2    19.1    18.8

Another attempt, /([^\s][\d])+(.\d+)?[^a-zA-Z][^\s]/, does everything else but fails on 1, 0.9, etc.: it does not capture the first digit and the last digit.

Any help is appreciated.


Solution

  • You can use this: (?<!\S)[+-]?(?:\d+|\d*\.\d+)(?!\S)

    Explanation:

    • (?<!\S) checks that the match is not preceded by a non-whitespace character, i.e. it either follows whitespace or starts the input. Equivalent to (?<=\s|^),
    • [+-]? optional sign,
    • (?:\d+|\d*\.\d+) either an integer, or a decimal number with an optional integer part,
    • (?!\S) (equivalent to (?=\s|$)) requires the matched number to be followed by a whitespace character (space, tab or newline) or by the end of the input. Note that this character is checked but not included in the actual match.

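Since the question mentions Java, here is a minimal sketch of how the pattern could be applied there. The class name NumberExtractor, the method extract and the shortened sample string are made up for illustration; only the pattern itself comes from the answer. Backslashes are doubled because the pattern is written as a Java string literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class NumberExtractor {
    // Pattern from the answer, escaped for a Java string literal.
    private static final Pattern NUMBER =
            Pattern.compile("(?<!\\S)[+-]?(?:\\d+|\\d*\\.\\d+)(?!\\S)");

    // Returns every whitespace-delimited integer or decimal found in the text.
    static List<String> extract(String text) {
        List<String> numbers = new ArrayList<>();
        Matcher m = NUMBER.matcher(text);
        while (m.find()) {
            // The lookarounds consume nothing, so group() has no surrounding whitespace.
            numbers.add(m.group());
        }
        return numbers;
    }

    public static void main(String[] args) {
        // Shortened, made-up line mixing valid numbers with tokens to skip.
        String sample = "  9W t=  32.611  -0.8  1.  .6  +.01  10E  0 ";
        System.out.println(extract(sample)); // prints [32.611, -0.8, .6, +.01, 0]
    }
}
```

Because the lookarounds only check the neighbouring characters, the matches come back without leading or trailing whitespace, so no trim() call should be needed. Tokens such as 9W, 10E and 1. are skipped because each is followed, or would have to be preceded, by a non-whitespace character.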