regex, url, blogger

Use regex to match Blogger (blogspot) permalinks


I have hacked together the regex below. It matches the words between hyphens and separates them, leaving out articles that are one character long. I need the words separated because Blogger cuts the URL off at 39 characters and does not break any words. This works so far:

^((([a-zA-Z0-9]{2,39})-)+)(?:([a-zA-Z0-9]{1})-)((([a-zA-Z0-9]{2,39})-)+){2,39}$

Tested against /wishing-you-a-very-merry-christmas-and-a-happy-new-year.html
Matches: wishing-you-a-very-merry-christmas-and-
Replacement string: $1 (not working!). It results in:

How do I keep the one-letter articles out of the regex results? And how do I test for and remove the trailing - in my results?


Solution

  • You cannot do this with a single regex.

    Limiting the match to at most 39 characters and making sure it does not end with - is no problem:

    ^\/?([\w-]{3,39})(?<!-).*
    

    See it on Regexr

    (?<!-) is a negative lookbehind assertion. It ensures that the captured part does not end with a hyphen.

    But you cannot remove the one-character substrings in the same pass.

    On its own, that is also no problem:

    (?<=[/-]|^)[^-]-|-[^-](?=[-./]|$)
    

    See it here on Regexr
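
Since the two patterns cannot be merged, one way to use them is as two passes: strip the one-character words first, then cut to length. The following is only a rough Python sketch of that idea. The function names are made up for illustration. The removal pattern is slightly adapted because Python's re module only accepts fixed-width lookbehinds, so the ^ alternative is moved outside the lookbehind. The third pattern, with an extra (?!\w), is an assumption added here and not part of the patterns above; it is one possible way to stop the cut from landing in the middle of a word.

```python
import re

# Pass 1: one-character words plus one adjacent hyphen.
# Adapted from (?<=[/-]|^)[^-]-|-[^-](?=[-./]|$): Python's re module rejects
# the variable-width lookbehind, so "^" is moved outside of it.
ONE_CHAR_WORD = re.compile(r"(?:(?<=[/-])|^)[^-]-|-[^-](?=[-./]|$)")

# Pass 2: the length-limiting pattern, copied unchanged.
TRUNCATE = re.compile(r"^\/?([\w-]{3,39})(?<!-).*")

# Assumption, not part of the original patterns: the extra (?!\w) forces the
# engine to back off until the capture ends at a word boundary.
TRUNCATE_WHOLE_WORDS = re.compile(r"^\/?([\w-]{3,39})(?<!-)(?!\w).*")


def strip_one_char_words(path):
    """Remove every one-character word together with one neighbouring hyphen."""
    return ONE_CHAR_WORD.sub("", path)


def truncate(path, pattern=TRUNCATE):
    """Return the captured slug, or the input unchanged if nothing matches."""
    match = pattern.match(path)
    return match.group(1) if match else path


path = "/wishing-you-a-very-merry-christmas-and-a-happy-new-year.html"
cleaned = strip_one_char_words(path)

print(cleaned)                                  # pass 1 only
print(truncate(cleaned))                        # pass 1, then the 39-character cut
print(truncate(cleaned, TRUNCATE_WHOLE_WORDS))  # same, backing off to a whole word
```

On the test path, the unchanged length pattern cuts the cleaned string to wishing-you-very-merry-christmas-and-ha, because (?<!-) only guards against a trailing hyphen. The variant with (?!\w) backs off to wishing-you-very-merry-christmas-and.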