While ingesting some very poor quality data I'd like to use strptime
and nullify any entries that error on strptime.
I can manually specify exclusions with a case statement:
select * EXCLUDE (date_of_birth),
CASE
WHEN date_of_birth[5:6]::int <= 12 AND
date_of_birth[5:6]::int > 0 AND
date_of_birth[7:]::int <= 31 AND
NOT (date_of_birth[5:6]::int = 11 AND date_of_birth[7:]::int >= 31) AND
NOT (date_of_birth[5:6]::int = 2 AND date_of_birth[7:]::int >= 29)
THEN strptime(date_of_birth, '%Y%m%d')::DATE
ELSE NULL
END AS date_of_birth
from read_csv_auto(
'bad_data.csv',
types={'date_of_birth': 'VARCHAR'}
)
But this method is very unsatisfactory. For one I can't anticipate all possible bad dates - and in this instance I'm not able to change the source system. Can anyone point me in a better direction?
The newer try_strptime()
function introduced by @hawkfish in a comment last year is the most concise (and likely most robust) solution.
SELECT
try_strptime(date_of_birth, '%Y%m%d')::date as date_of_birth
FROM ...