Search code examples
regexpostgresqlmaterialized-viewsregexp-replaceto-date

Converting extracted text string to date where string varies in length in Postgres


I have a materialized view of a text column that extracts a string of numbers representing a date.

The materialized view is created using the following function:

(regexp_replace(left(substring(lower(replace(content,' ','_')) from 're-inspection_date:_(.*)_'),10),'\D','','g'))

And outputs a text string in the format of MMDDYYYY except it does not account for leading zeroes for single-digit months and days.

When I try to use the "to_date" function specifying the format MMDDYYYY using the following:

(to_date(regexp_replace(left(substring(lower(replace(content,' ','_')) from 're-inspection_date:_(.*)_'),10),'\D','','g'),'MMDDYYYY'))

I get the error "date/time field value out of range: '12122018'".

I believe the issue is due to one or both of the following reasons:

  • The resulting strings from my current regexp in the materialized view vary in length (e.g. 12212018 8222018 962018) due to my regexp removing all non-integer characters. The dates are 6, 7 or 8 digits long.

  • As a result, I haven't yet been able to come up with a way of inserting a delimiter between the month/day/year values.

Is there a way to make change these output strings to date format without changing my regexp?

If not, how could I change my regexp for extracting these values?

Bear in mind that the date I'm after in the source text is formatted as 12/1/2018 and also doesn't account for leading 0's in days or months. Also, there is another date preceding the target date in the text formatted the same way.

Here is a sample of the source text:

PLACEHOLDER TEXT FOR REDACTED STUFF BLAH BLAH BLAH **** Loremipsum 11/28/2018 4: 21: 37 PM ****1 of 2 Facility Information Permit Number: 12-34-56789 Name of Facility: Dolor sit amet-consectetur Address: 123 Fake Street City, Zip: adipiscing elit12345 RESULT: sed Do Eiusmod tempor: by 8: 00 AM Re-Inspection Date: 12/4/2018 Type: Blah-Type Stuff Etc: Dolor sit amet-consectetur...

Where the "Re-Inspection Date: 12/4/2018" is what I'm after.

I'm on Postgres 11.


Solution

  • Kaushik Nayak is correct I guess. I get the same thing with this regex using a positive lookbehind (?<= Re-Inspection Date:) and allowing for any number of integers [0-9]* seperated with one slash /{1}

    SELECT to_date(substring('string'
                from '(?<=Re-Inspection Date: )[0-9]*/{1}[0-9]*/{1}[0-9]*'), 'mm/dd/yyyy');