I have a table named
logs
and column named
url
whose value is like this
https://api.abc.com:443/xyz/live/dashboard_critical/v1?cust_id=-1111%7C-1111%7C-1111%7C-1111%7C-1111%7C-1111%7C-1111%7C-1111%7C-1111%7C-1111%7C-1111&model_id=&startdate=2021-06-29&enddate=2021-07-28&wcombo=&site_id=&instance=&unit=&combo=&source=&priority=&verification=&critical=Yes%7Ctrue&percentile=95&startevent=receiveddatetime&endevent=uploadtohostdatetime
I trying to extract all strings after '/' OR '?' OR '%7C' OR '&'
I am able to extract it individually but not all 4 together
SELECT regexp_extract_all(url, '\d+[/]*')
FROM logs.url
WHERE request_verb='GET'
AND REGEXP_LIKE(url, 'https://api.abc.com:443/xyz/live/');
I am looking for output like below
[https:,api.abc.com:443,xyz,live,dashboard_critical,v1,cust_id=-1111,-1111,-1111,-1111,-1111,-1111,-1111,-1111,-1111,-1111,-1111,model_id=,startdate=2021-06-29,enddate=2021-07-28,wcombo=,site_id=,instance=,unit=,combo=,source=,priority=,verification=,critical=Yes,true,percentile=95,startevent=receiveddatetime,endevent=uploadtohostdatetime]
You can use regexp_split(str, regexp)
function, as a regexp pattern concatenate all values by wich string should be splitted using |
(OR in regexp), it will produce array required. Note: some characters have special meaning in Presto CLI or regexp and need shielding.
select regexp_split(url,'/+|\?|%%7C|&')
Regexp meaning:
/+
- one or more slash
|
- OR
\?
- literally ?
, needs shielding by backslash because ?
has special meaning in regexp
|
- OR
%%7C
- literally %7C
. %
has special meaning in Presto CLI and presto is using double-character notation for shielding.
|
- OR
&
- literally &
After you got an ARRAY, you can concatenate it or explode, etc