
Regular expression syntax to match first segment only


I have a number of URLs where I need to match the first segment, without the "/", using a regex.

This segment can be either xx or xx-xx.

I've tried to do it with a lookahead and a lookbehind, but sometimes the URL contains another two-letter segment (/ts/, /ca/). I don't want those to match; I only want the first segment. Any suggestions? Thanks.

https://regex101.com/r/Qy3nyI/1

(?<=\/)\w{2}(-\w{2})?(?=\/)

Test urls:

/en/home.aspx
/en-gb/ts/tc/home.aspx
/en-gb/home.aspx
/en-de/home.aspx
/de-de/home.aspx
/en/home.aspx
/en-fb/afspfas.aspx
/en-gb/ts/ca/anotherPage.aspx

Solution

  • Try adding a starting ^ anchor to the initial lookbehind in your current regex pattern:

    (?<=^/)\w{2}(-\w{2})?(?=/)
        ^^ change is here
    

    This pattern says to:

    (?<=^/)         lookbehind and assert that what precedes is a leading /
    \w{2}(-\w{2})?  then match the xx or xx-xx code (e.g. en or en-gb)
    (?=/)           lookahead and assert that what follows is another /
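
    To check the anchored pattern against the test URLs from the question, here is a minimal sketch in Python. The question does not say which language or regex engine is in use, so Python is an assumption, and the helper name first_segment is made up for illustration. Each URL is tested as its own string, so ^ means the start of that string.

```python
import re

# Pattern from the answer: the lookbehind only succeeds when the "/" is
# the very first character of the string.
FIRST_SEGMENT = re.compile(r"(?<=^/)\w{2}(-\w{2})?(?=/)")


def first_segment(url_path):
    """Return the leading xx or xx-xx segment of url_path, or None."""
    match = FIRST_SEGMENT.search(url_path)
    return match.group(0) if match else None


# Test URLs taken from the question.
test_urls = [
    "/en/home.aspx",
    "/en-gb/ts/tc/home.aspx",
    "/en-gb/home.aspx",
    "/en-de/home.aspx",
    "/de-de/home.aspx",
    "/en-fb/afspfas.aspx",
    "/en-gb/ts/ca/anotherPage.aspx",
]

for url in test_urls:
    print(url, "->", first_segment(url))
```

    If all URLs sit in one multi-line string, as in the regex101 test box, the multiline flag (re.MULTILINE in Python) is needed so that ^ also matches at the start of each line.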