Search code examples
sqldatecastingtry-catchamazon-redshift

How to find values that can't be cast as date in Redshift?


I have a big set of data. I try to parse one column to extract a substring that's a date and cast it as such - date(<substring here>). I'm getting an error ERROR: Error converting text to date but I don't know what the actual issue is. How do I find the values that are causing a problem? Something like try_cast, but that doesn't work in Redshift. I'm not sure I can use regex since I don't know the format of what I'm looking for.


Solution

  • Your question is broad and not very specific as the task which you're trying to solve, so it's hard to provide a correct answer

    1. You can use a pattern matching

    Say, you have the following data in TableA

    id dt
    1 2020-08-20
    2 2021-08-20
    3 2021-08-21
    4 2021-08-2000
    5 asdfghjkl
    6 08-01-2021
    7 06/07/2021

    with pattern matching you can find all the rows with correct dates

    select id from TableA 
    where dt similar to '\\d{4}-\\d{2}-\\d{2}'
      or dt similar to '\\d{2}-\\d{2}-\\d{4}'
      or dt similar to '\\d{2}/\\d{2}/\\d{4}'
    

    All you need to do now is to reverse this query to find opposites

    select id, dt from TableA
    where id not in (
    select id from logs.sot 
    where dt similar to '\\d{4}-\\d{2}-\\d{2}'
      or dt similar to '\\d{2}-\\d{2}-\\d{4}'
      or dt similar to '\\d{2}/\\d{2}/\\d{4}'
    )
    

    This will give you result

    id dt
    5 asdfghjkl
    4 2021-08-2000
    1. If this does not work, you can try to sort by date column and validate a head and tail - bad values usually live here

    2. Use processing outside Redshift if possible. Usually it's good practice to take care about data cleaning before putting it into database. I believe a simple python (or any other language) script will make the job