This is what I have currently, trying to clean a messy string value to only have numbers. Everything else is fine, just the issue now is that the REGEX_REPLACE is moving all decimals. When there is more than one decimal, I want to remove the first occurrence only.
TRIM(REGEXP_REPLACE(CASE
WHEN column = "-" THEN "0"
WHEN column = "o" THEN "0"
WHEN column IS NULL THEN "0"
WHEN column LIKE '%o%' THEN REPLACE (column, 'o','0')
WHEN column LIKE '%,%' THEN REPLACE (column, ',', '.')
WHEN column LIKE '%,%' THEN REGEXP_REPLACE(column, r'\.', '')
WHEN column LIKE '%/%' THEN "0"
WHEN column LIKE '%kg%' THEN REPLACE (column, 'kg', '')
WHEN LENGTH(column) - LENGTH(REPLACE(column, '.', '')) = 1 THEN column
WHEN REGEXP_EXTRACT(column, 'n') IS NOT NULL THEN "0"
WHEN REGEXP_EXTRACT(column, 'tb') IS NOT NULL THEN "0"
WHEN REGEXP_EXTRACT(column, 'REF') IS NOT NULL THEN "0"
ELSE column
END,r's/.//', '')) AS Column
I tried a few other REGEX methods but due to lack of understanding it wasn't giving the desired outcome.
An example of this query running is below
Input is the column I'm querying, and output is the result of the query being run against the column. The desired output is what I want as a result of the query. This is being achieved for all rows apart from rows 2 and 5, this is where there are multiple occurrences of decimals, and I'm trying to remove only the first occurrence only
ID | Input |Output |Desired Output
1 | 45988,8 |45988.8 |45988.8
2 | 45.856,90 |45.856.90 |45856.90
3 | 457.88 |457.88 |457.88
4 | 456,788 |456.788 |456.788
5 | 123.45.67 |123.45.67 |12345.67
6 | 45 kg |45 |45
7 | tbc |0 |0
8 | na |0 |0
9 | REF |0 |0
Consider below approach (BigQuery Standard SQL)
SELECT id, input,
( SELECT REVERSE(STRING_AGG(IF(OFFSET = 1, '.' || x, x), ''))
FROM UNNEST(SPLIT(REPLACE(REGEXP_REPLACE(REVERSE(input), r'[^0-9.,]', r''), ',','.'), '.')) x WITH OFFSET
) output
FROM your_table
if applied to sample data in your question - output is