I need to find a pattern which will give me a substring which follows logic to substring from the right side, after the third _
.
What I want to achieve is to retrieve everything up until 3rd _
.
I tried this SQL, but it did not work
select
SUBSTRING('22D_XYZ_xy_100_xyz_123', 1, position('_' in REVERSE('22D_XYZ_xy_100_xyz_123')) - 1) as result;
It returns the 22D
whereas I expect 22D_XYZ_xy
The phrase, "substring which follows logic to substring from the right side, after the third _
," is difficult to understand. I infer that the intended behavior is to extract the subtring that includes all text before the third _
from the right and not all text before the third _
from the left. When searching from the left, there is also ambiguity as to expected behavior if there are less than three _
characters: should the entire string be returned or should the result be NULL
? Unfortunately, the example input, 22D_XYZ_xy_100_xyz_123
, and expected output, 22D_XYZ_xy
, are consistent with all of these cases. The following query demonstrates all three interpretations:
SELECT
string,
regexp_substr(string, '^.*(?=(_[^_]*){3}$)') as from_right,
regexp_substr(string, '^([^_]*_){0,2}[^_]*') as from_left,
regexp_substr(string, '^([^_]*_){2}[^_]*(?=_)') as from_left_requires_3
FROM (
VALUES ('22D_XYZ_xy_100_xyz_123'),
('1_2_3_4_5_6_7'),
('1_2_3_4_5_6'),
('1_2_3_4_5_'),
('_1_2_3_4_5'),
('1_2_3_4_5'),
('1_2_3_4'),
('1_2_3'),
('1_2_'),
('1_2'),
('1'),
('____'),
('___'),
('__'),
('_'),
(''),
(NULL)) t(string);
Executing the query gives the following:
string | from_right | from_left | from_left_requires_3 |
---|---|---|---|
'22D_XYZ_xy_100_xyz_123' | '22D_XYZ_xy' | '22D_XYZ_xy' | '22D_XYZ_xy' |
'1_2_3_4_5_6_7' | '1_2_3_4' | '1_2_3' | '1_2_3' |
'1_2_3_4_5_6' | '1_2_3' | '1_2_3' | '1_2_3' |
'1_2_3_4_5_' | '1_2_3' | '1_2_3' | '1_2_3' |
'_1_2_3_4_5' | '_1_2' | '_1_2' | '_1_2' |
'1_2_3_4_5' | '1_2' | '1_2_3' | '1_2_3' |
'1_2_3_4' | '1' | '1_2_3' | '1_2_3' |
'1_2_3' | NULL | '1_2_3' | NULL |
'1_2_' | NULL | '1_2_' | NULL |
'1_2' | NULL | '1_2' | NULL |
'1' | NULL | '1' | NULL |
'____' | '_' | '__' | '__' |
'___' | '' | '__' | '__' |
'__' | NULL | '__' | NULL |
'_' | NULL | '_' | NULL |
'' | NULL | '' | NULL |
NULL | NULL | NULL | NULL |
To search from the right, a positive lookahead zero-length assertion, (?=(_[^_]*){3}$)
, is used to check that the unmatched portion of the string begins with _
and includes exactly three _
characters. If the assertion is false, then NULL
is returned.
To find the left substring with no more than two _
characters, the regular expression matches up to two substings of zero or more non-_
characters followed by a single _
along with any non-_
characters prior to the next _
or the end of the string.
Finding the left substring preceding a required third _
is similar to finding the left substring with no more than two _
characters, except that the occurrence specifier for the group is {2}
instead of {0,2}
and a positive lookahead zero-lengh assertion, (?=_)
, ensures that _
is the first unmatched character. As with the search from the right, if the assertion is false, then NULL is returned.