I have a big set of data. I try to parse one column to extract a substring that's a date and cast it as such - date(<substring here>)
. I'm getting an error ERROR: Error converting text to date
but I don't know what the actual issue is. How do I find the values that are causing a problem? Something like try_cast, but that doesn't work in Redshift. I'm not sure I can use regex since I don't know the format of what I'm looking for.
Your question is broad and not very specific as the task which you're trying to solve, so it's hard to provide a correct answer
Say, you have the following data in TableA
id | dt |
---|---|
1 | 2020-08-20 |
2 | 2021-08-20 |
3 | 2021-08-21 |
4 | 2021-08-2000 |
5 | asdfghjkl |
6 | 08-01-2021 |
7 | 06/07/2021 |
with pattern matching you can find all the rows with correct dates
select id from TableA
where dt similar to '\\d{4}-\\d{2}-\\d{2}'
or dt similar to '\\d{2}-\\d{2}-\\d{4}'
or dt similar to '\\d{2}/\\d{2}/\\d{4}'
All you need to do now is to reverse this query to find opposites
select id, dt from TableA
where id not in (
select id from logs.sot
where dt similar to '\\d{4}-\\d{2}-\\d{2}'
or dt similar to '\\d{2}-\\d{2}-\\d{4}'
or dt similar to '\\d{2}/\\d{2}/\\d{4}'
)
This will give you result
id | dt |
---|---|
5 | asdfghjkl |
4 | 2021-08-2000 |
If this does not work, you can try to sort by date column and validate a head and tail - bad values usually live here
Use processing outside Redshift if possible. Usually it's good practice to take care about data cleaning before putting it into database. I believe a simple python (or any other language) script will make the job