Search code examples
sqlsql-serverdateocrdata-cleaning

SQL: Cleansing dates from OCR'ed documents


I have a variety of dates from several .pdf documents that were OCR'ed, the dates have imported in the format DD/MM/YYYY. As the documents are photocopies they are not optimal quality resulting in issues where the dates struggled to distinguish between / and 1 as follows:

Issue: 0110212015 01/0212015 01102/2015

Desired Output: 01/02/2015

As the strings have retained their length of the original dates I was hoping to simply replace the character in Nth position with /. To my knowledge and that gained from searching, SQL does not have a function that simple. Are their substitute or replace functions that can target the exact 1's I need to change? Or can I reconstruct the string using my existing values?


Solution

  • SQL Server does have a function to do that... it's called STUFF.

    So your expression would look something like:

    STUFF(STUFF (DateString , 3, 1, '/'), 6, 1, '/')