I'm looking for the REGEXP_SUBSTR code that gets dates like format '06-11-2014 - 05-12-2014' or format '01/11/2019 - 30/11/2019' from a string. The first date being the startdate and the second date being the enddate. It would be extremely helpful to understand how the REGEXP_SUBSTR works in this case and also why. I want to get the string with the two dates, but then I want both dates to be in their own column.
A record look likes this:
Medium - nl (06-11-2014 - 05-12-2014) ruimte: Standaard (5.000 MB).
Although text can be shorter or longer the two dates between brackets are always there.
The code below gets the first one, but only if it's with '-'. I want both '-' and '/' variants displayed.
REGEXP_SUBSTR(description, '[0-9][0-9][-[0-9][0-9]-[0-9][0-9][0-9][0-9]')
Thanks a lot for any and all help.
Since you are using MySQL 8+, it means you also have access to the REGEXP_REPLACE
function, which is suitable for isolating the portion of the string which contains the two dates. In the CTE below, I isolate the date string, then in a subquery on that CTE, I fish out the two dates in separate columns using SUBSTRING_INDEX
.
WITH cte AS (
SELECT
text,
REGEXP_REPLACE(text, '^.*\(([0-9]{2}-[0-9]{2}-[0-9]{4} - [0-9]{2}-[0-9]{2}-[0-9]{4})\).*$', '$1') AS dates
FROM yourTable
)
SELECT
text,
SUBSTRING_INDEX(dates, ' - ', 1) AS first_date,
SUBSTRING_INDEX(dates, ' - ', -1) AS second_date
FROM cte;
Here is an explanation of the regex pattern used:
^ from the start of the string
.* match any content, until hitting
\( '(' which is followed by
( (capture what follows)
[0-9]{2}-[0-9]{2}-[0-9]{4} a single date
- -
[0-9]{2}-[0-9]{2}-[0-9]{4} another single date
) (stop capture)
\) ')'
.* match the remainder of the content
$ end of the string
Note that we include a pattern which matches the entire input, which is a requirement since we want to use a capture group. Also, note that REGEXP_SUBSTR
might have been viable here, but it could run the risk that you get false positives, in the event that a date could appear elsewhere besides the terms in parentheses.