I have been reading a lot of questions and info on REGEX but haven't found anything conclusive yet.
I am using REGEXP_SUBSTR
in amazon redshift which according to the documentation uses the regex flavour POSIX. My understanding is that it is fairly basic and cannot do look aheads.
Is there any way to match a pattern but exclude characters that come at the beginning or the end ?
I am trying to parse a user_agent
field in our database and this would really help.
Sample string 1 : 'Mozilla/5.0 (iPhone; CPU iPhone...'
Match hoped: iPhone
Sample string 2: 'Mozilla/5.0 (Windows NT 10.0; Win64; ...'
Match hoped: Windows NT 10.0
.
I am trying to match what is between the first parenthesis and the following semi-colon (iPhone
) but exclude the parenthesis and semi-colon from the match.
For the moment I am using '\\(.*;'
but it's fairly basic and feel like it could easily return a wrong match.
I understand I could couple the returning match with redshift's substring
but doing so results in fairly messy and unpredictable code. I am wondering if it would be possible to have a regex match that return just the string needed.
I know other flavours of regex that can use look aheads could solve this but they are not available in redshift as far as I know.
Any help or pointers would be super appreciated.
The nested SPLIT_PART does the job:
SPLIT_PART(SPLIT_PART(user_agent, '(', 2), ';', 1)
It does exactly what you're looking for, and an empty return can be treated as a NULL value if you want to:
NULLIF(SPLIT_PART(SPLIT_PART(user_agent, '(', 2), ';', 1), '')