I have this code (was already there, isn't mine):
SELECT
a.id_original_contrato AS contrato,
( CASE WHEN d.value~'^\\d+$' THEN d.value::integer ELSE 0 END ) AS monto,
EXTRACT(YEAR FROM b.value)::integer AS anoinicio,
EXTRACT(YEAR FROM c.value)::integer AS anofin
... etc (some JOIN's and WHERE's)
Let me explain: d.value
comes from a table where value is character varying (200)
. The code will insert later the d.value
(now called 'monto') in another table as a integer
. Someone coded that regex in order to extract some chars or in other case (ELSE), define it as 0. Those values works when they are integer
only. If I use a d.value like 76.44
it doesn't work due that regex, it always define it as 0.
Well, I have to change that code, because:
numeric
, not as integer
anymore (In my new table the data type is numeric
now)76.44
, or 66,56
(dot or coma).I'm not sure what that regex is doing. And how could I accomplish what need with a better or new regex?
The double backslash in \\d
suggests an old version with standard_conforming_strings = off
. The manual:
Beginning in PostgreSQL 9.1, the default is
on
(prior releases defaulted to off).
In modern versions with standard_conforming_strings = on
, this string makes little sense as regular expression: . To detect strings consisting of one or more digits use either '^\\d+$'
E'^\\d+$'
(prefixed with E
) or '^\d+$'
. Details:
Integer literals also allow an optional leading sign for negative / positive numbers, and leading / dangling white space in Postgres.
So, this is the complete regular expression for valid integer
literals:
CASE WHEN d.value ~ '^\s*[-+]?\d+\s*$' THEN d.value::int ELSE 0 END
The regular expression explained:
^
.. start of string
\s
.. class shorthand for [[:space:]]
(white space)
*
.. quantifier for 0 or more times
[+-]
.. character class consisting of +
and -
?
.. quantifier for 0 or 1 times
\d
.. class shorthand for [[:digit:]]
(digits)
+
.. quantifier for 1 or more times
\s*
.. same as above
$
.. end of string
Consider the syntax rules for numeric string literals. One essential quote:
There cannot be any spaces or other characters embedded in the constant
That's because a numeric constant is not quoted, hence white space is not possible. Not applicable for casting strings. White space is tolerated:
Leading, trailing and right after the exponent char.
So these are all legal strings for the cast to numeric
:
'^\s*[-+]?\d*\.?\d+(?:[eE]\s*[-+]?\d+)?\s*$'
The only new element are parentheses (()
) to denote the contained regular expression as atom. Since we are not interested in back references, use "non-capturing": (?:
...)
and append a question mark (?:[eE]\s*[-+]?\d+)?
to mean: the "exponential" part can be added or not, as a whole.
Assuming dot (.
) as decimal separator. You might use comma instead (,
) or [,\.]
to allow either. But only dot is legal for the cast.
Test:
SELECT '|' || lit || '|' AS text_with_delim
, lit ~ '^\s*[-+]?\d*\.?\d+([eE]\s*[-+]?\d+)?\s*$' AS valid
, lit::numeric AS number
FROM unnest ('{1
, 123
, 000
, " -1 "
, +2
, 1.2
, .34
, 5e6
, " .5e -6 "
}'::text[]) lit;
Result:
text_with_delim | valid | number |
---|---|---|
|1| | t | 1 |
|123| | t | 123 |
|000| | t | 0 |
| -1 | | t | -1 |
|+2| | t | 2 |
|1.2| | t | 1.2 |
|.34| | t | 0.34 |
|5e6| | t | 5000000 |
| .5e -6 | | t | 0.0000005 |
Or you might have use for to_number()
to convert strings of arbitrary given format.