Context is SQL on the AS/400 (IBM i)
My goal is to end up with two values: a string determined by a regex I already have, and then everything else in the source string with the result of the regex removed and the gap (if any) closed up.
Here's the SQL:
select HAD1,
regexp_substr(HAD1,'\b(GATE|LEVEL|DOOR|UNITS?)\s[\dA-Z]{1,}'),
regexp_substr(HAD1,'**eventual_regex_goes_here**')
from ECH
where regexp_like(HAD1,'\bGATE')
And the desired result:
Ship To REGEXP_SUBSTR REGEXP_SUBSTR
Address
D2 COMPOUND, GATE 11 GATE 11 D2 COMPOUND,
2/22 GATEWAY DRIVE - 2/22 GATEWAY DRIVE
ASHBURTON FITTINGS GATE 2 GATE 2 ASHBURTON FITTINGS
BRIERLY RD, GATE A, RIVER SIDE GATE A BRIERLY RD, , RIVER SIDE
GATE 16, 37 KENEPURU DRIVE GATE 16 , 37 KENEPURU DRIVE
If the second expression could take the commas out too that would be great, but it's not required. The remaining string will be put through other (non-regex) processing to remove extraneous elements (phone numbers, comments, punctuation, etc.)
The closest of the posts suggested by the board software was this one, which gave the following string:
^.+?(?=\d{2})|(?<=\d{2}).+$
So, first I tried putting my whole expression in place of both occurrences of \d{2}
and found that this (unsurprisingly) would not process. Then I went back to a more basic test and tried to build up from there.
Let's try just the word GATE as a constant, plus a couple of boundaries ( because deep down I'm still just a child, and you know what they say: "Children need boundaries").
select had1,
regexp_substr(HAD1,'\b(GATE|LEVEL|DOOR|UNITS?)\s[\dA-Z]{1,}'),
regexp_substr(HAD1,'^.+?(?=\bGATE\b)|(?<=\bGATE\b).+$')
from ech
where regexp_like(HAD1,'\bGATE')
Result:
Ship To REGEXP_SUBSTR REGEXP_SUBSTR
Address
GATE 3, CNR QUARRY ROAD GATE 3 3, CNR QUARRY ROAD
ASHBURTON FITTINGS GATE 2 GATE 2 ASHBURTON FITTINGS
GATE 6, HELLABYS ROAD GATE 6 6, HELLABYS ROAD
GATE 3, 548 PAKAKARIKI HILL GATE 3 3, 548 PAKAKARIKI HILL
GATE 5 - FLIGHTYS COMPOUND GATE 5 5 - FLIGHTYS COMPOUND
GATE 3 - 548 PAEKAKARIKI HILL ROAD GATE 3 3 - 548 PAEKAKARIKI HILL ROAD
GATE 14 - TAKAPU COMPOUND GATE 14 14 - TAKAPU COMPOUND
35 GATEWAY DRIVE - -
GATE 6 GATE 6 6
TAKAPU ROAD,GATE 20,SH1 GATE 20 TAKAPU ROAD,
This is looking promising, keeping in mind that I'm not using the full expression for the second result column. But there is already one thing wrong.
In the second and last row there should be more data, '2' and ',SH1', respectively. And the string '35 GATEWAY DRIVE' should be in the last column. I want everything except what the expression finds (which is just the whole word GATE at the moment, remember).
It seems it can return the remaining text from one side or other of the removed text, but not both, and not all the remaining text if nothing is found to remove. So there's no point in my pressing on with adding in more sophistication to include the gate number until I understand why I'm not getting all the text that is not GATE. Therefore I'll pause here and ask for help.
you could try this:
with data (s) as (values
('D2 COMPOUND, GATE 11'),
('2/22 GATEWAY DRIVE'),
('ASHBURTON FITTINGS GATE 2'),
('BRIERLY RD, GATE A, RIVER SIDE'),
('GATE 16, 37 KENEPURU DRIVE')
)
select s,
regexp_substr(s,' ?(GATE|LEVEL|DOOR|UNITS) '),
replace(regexp_replace(s,' ?(GATE|LEVEL|DOOR|UNITS) ',''),',',' ')
from data
Result:
D2 COMPOUND, GATE 11 GATE D2 COMPOUND 11
2/22 GATEWAY DRIVE - 2/22 GATEWAY DRIVE
ASHBURTON FITTINGS GATE 2 GATE ASHBURTON FITTINGS 2
BRIERLY RD, GATE A, RIVER SIDE GATE BRIERLY RD A RIVER SIDE
GATE 16, 37 KENEPURU DRIVE GATE 16 37 KENEPURU DRIVE