I have a regex that captures floating-point numbers and integers from a text and avoids alphanumeric tokens.
Regex : /[+-]?\d*?[^a-zA-Z\n][^\s]/
However, it fails on some of the cases below.
Requirements:
1.) Capture all valid integers and decimal numbers, including those with a positive (+) or negative (-) sign. 1, 1.0, -1.0, -1, .6, 0.7, 0, +.6 and +.01 are all valid; 7. is not valid. With the regex above, .6 is not captured in the text below.
2.) Avoid tokens like 3E, 171A, etc. This regex does everything except this case: it captures tokens like 11A and 17E (but NOT 9E or 8B). In the extract below, 10E is captured by this regex but 9W is not; 10E is not wanted either. Any string of the form "NUMBER followed by letters" has to be avoided.
3.) Whitespace should not be captured. I don't want to keep trimming in the code (the dataset can be huge; I could use string.trim() in Java, but I want to avoid it).
Any suggestions?
Sample text below
la= -0.8 -0.7 -1.3 -1.6 -0.2 -0.9 -0.6 -0.7 -0.4 0.0
9W t= 32.611 32.599 32.588 32.577 32.565 32.531 32.519 32.508 32.496 32.485
a= 13.6 17.2 13.9 14.8 12.7 17.8 13.7 14.3 16.9 15.9
p= 16.2 17.9 17.7 16.5 14.8 20.3 16.7 17.1 21.1 17.8
la= 0.7 1. 0.7 0.8 0.6 0.9 1.0 2.0 1.8 0.9
t= 32.309 32.298 32.287 32.276 32.265 32.177 32.166 32.155 32.144 32.133
a= 12.1 13.4 17.5 17.0 0.0 14.5 14.7 14.7 16.7 14.5
p= 15.2 14.6 18.4 18.5 0.0 15.1 15.9 17.1 17.5 17.0
la= 0.9 .6 1.3 0.5 0.0 0.3 0.9 0.9 0.9 0.6
10E t= 32.658 32.646 32.635 32.623 32.612 32.577 32.566 32.555 32.543 32.532
a= 13.8 17.3 16.0 15.2 13.8 16.4 15.3 20.3 17.6 16.5
p= 15.2 18.0 17.4 17.1 15.6 17.7 18.0 23.2 19.1 18.8
The regex /([^\s][\d])+(.\d+)?[^a-zA-Z][^\s]/ does everything else but fails on 1, 0.9, etc.: it does not capture the first and last digits.
Any help is appreciated.
You can use this: (?<!\S)[+-]?(?:\d+|\d*\.\d+)(?!\S)
Explanation:
(?<!\S) checks that the match is not preceded by anything other than whitespace. Equivalent to (?<=\s|^).
[+-]? is an optional sign.
(?:\d+|\d*\.\d+) matches either an integer or a decimal number with an optional integer part.
(?!\S), equivalent to (?=\s|$), requires the matched number to be followed by whitespace (space, tab or newline) or by the end of the input. This character is checked but not included in the actual match.
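Since the question mentions Java, here is a minimal sketch of how the pattern above might be applied there. The class name NumberExtractor and the method extract are made up for illustration, and the sample string is a shortened variant of the data in the question. Note that every backslash has to be doubled inside a Java string literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NumberExtractor {
    // The pattern from the answer, with backslashes escaped for a Java string literal.
    private static final Pattern NUMBER =
            Pattern.compile("(?<!\\S)[+-]?(?:\\d+|\\d*\\.\\d+)(?!\\S)");

    // Hypothetical helper: returns every standalone number token found in the text.
    static List<String> extract(String text) {
        List<String> numbers = new ArrayList<>();
        Matcher m = NUMBER.matcher(text);
        while (m.find()) {
            // The lookarounds only check for whitespace, so group() needs no trimming.
            numbers.add(m.group());
        }
        return numbers;
    }

    public static void main(String[] args) {
        String sample = "la= 0.9 .6 1.3 1. -0.8 +.01\n10E t= 32.658 7. 0";
        // Prints [0.9, .6, 1.3, -0.8, +.01, 32.658, 0]
        System.out.println(extract(sample));
    }
}
```

Tokens such as 1., 7., 10E, la= and t= are skipped because they are not surrounded by whitespace on both sides once the number part is consumed, and the matches contain no leading or trailing whitespace.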