I'm building out a Google Data Studio dashboard and I need to create a calculated field for the year a post was published. The year is in the URI path, but I'm not sure how to extract it using REGEXP_EXTRACT
. I've tried a number of solutions proposed on here but none of them seem to work on Data Studio.
In short, I have a URI like this: /theme/2019/jan/blog-post-2019/
How do I use the REGEXP_EXTRACT
function to get the first 2019 after theme/
and before /jan
?
Try this:
REGEXP_EXTRACT(Page, 'theme\/([0-9]{4})\/[a-z]{3}\/')
where:
theme\/
means literally "theme/";([0-9]{4})
is a capturing group containing 4 characters from 0 to 9 (i.e. four digits);\/[a-z]{3}\/
means a slash, followed by 3 lowercase letters (supposing that you want the regex to match all the months), followed by another slash. If you want something more restrictive, try with \/(?:jan|feb|mar|...)\/
for the last part.See demo.