Filter strings with regex before casting to numeric

I have this code (was already there, isn't mine):

SELECT
    a.id_original_contrato AS contrato,
    ( CASE WHEN d.value~'^\\d+$' THEN d.value::integer ELSE 0 END ) AS monto,
    EXTRACT(YEAR FROM b.value)::integer AS anoinicio,
    EXTRACT(YEAR FROM c.value)::integer AS anofin

... etc (some JOIN's and WHERE's)

Let me explain: d.value comes from a table where value is character varying (200). The code will insert later the d.value (now called 'monto') in another table as a integer. Someone coded that regex in order to extract some chars or in other case (ELSE), define it as 0. Those values works when they are integer only. If I use a d.value like 76.44 it doesn't work due that regex, it always define it as 0.

Well, I have to change that code, because:

I need to store the d.value in the new table as numeric, not as integer anymore (In my new table the data type is numeric now)
But first, I need to correct that regex, because is messing my numeric numbers like 76.44, or 66,56 (dot or coma).

I'm not sure what that regex is doing. And how could I accomplish what need with a better or new regex?

Solution

The double backslash in \\d suggests an old version with standard_conforming_strings = off. The manual:

Beginning in PostgreSQL 9.1, the default is on (prior releases defaulted to off).

In modern versions with standard_conforming_strings = on, this string makes little sense as regular expression: ~~'^\\d+$'~~. To detect strings consisting of one or more digits use either E'^\\d+$' (prefixed with E) or '^\d+$'. Details:

Insert text with single quotes in PostgreSQL

Integer literals also allow an optional leading sign for negative / positive numbers, and leading / dangling white space in Postgres.
So, this is the complete regular expression for valid integer literals:

CASE WHEN d.value ~ '^\s*[-+]?\d+\s*$' THEN d.value::int ELSE 0 END

The regular expression explained:

^ .. start of string
\s .. class shorthand for [[:space:]] (white space)
* .. quantifier for 0 or more times
[+-] .. character class consisting of + and -
? .. quantifier for 0 or 1 times
\d .. class shorthand for [[:digit:]] (digits)
+ .. quantifier for 1 or more times
\s* .. same as above
$ .. end of string

Consider the syntax rules for numeric string literals. One essential quote:

There cannot be any spaces or other characters embedded in the constant

That's because a numeric constant is not quoted, hence white space is not possible. Not applicable for casting strings. White space is tolerated:

Leading, trailing and right after the exponent char.

So these are all legal strings for the cast to numeric:

'^\s*[-+]?\d*\.?\d+(?:[eE]\s*[-+]?\d+)?\s*$'

The only new element are parentheses (()) to denote the contained regular expression as atom. Since we are not interested in back references, use "non-capturing": (?:...) and append a question mark (?:[eE]\s*[-+]?\d+)? to mean: the "exponential" part can be added or not, as a whole.

Assuming dot (.) as decimal separator. You might use comma instead (,) or [,\.] to allow either. But only dot is legal for the cast.

Test:

SELECT '|' || lit || '|' AS text_with_delim
     , lit ~ '^\s*[-+]?\d*\.?\d+([eE]\s*[-+]?\d+)?\s*$' AS valid
     , lit::numeric AS number
FROM   unnest ('{1
               , 123
               , 000
               , "  -1     "
               , +2
               , 1.2
               , .34
               , 5e6
               , " .5e   -6  "
                }'::text[]) lit;

Result:

text_with_delim	valid	number
\|1\|	t	1
\|123\|	t	123
\|000\|	t	0
\| -1 \|	t	-1
\|+2\|	t	2
\|1.2\|	t	1.2
\|.34\|	t	0.34
\|5e6\|	t	5000000
\| .5e -6 \|	t	0.0000005

Or you might have use for to_number() to convert strings of arbitrary given format.