Search code examples
regexpcre

Regex to match string with prime length?


Given an input string with an arbitrary number of 'x' characters (x, xx, xxxxx, xxxxxxxxxxxxx and so on), how can one write a regex that matches the input string only if it has a prime number of 'x' characters? A string of length 1 should not be matched.

For example:

Match these: xx xxx xxxxx xxxxxxx

But not these: x xxxx xxxxxxxxx

This is one solution I found - ^(?!(xx+)\1+$) (here, as an answer to this problem). However, I would like to know why it works. Please share any alternate solutions as well.

I'm using the PCRE engine.

I realize that one would typically not use regexes for this sort of thing. I am simply curious about how it could be done.


Solution

  • ^(?!(xx+)\1+$)
    

    works by performing a negative lookahead at the start of the line. It will reject any line that consists of

    • a number of 2 or more xes
    • followed by the same number of xes, possibly multiple times

    In other words, the regex works by matching any number of xes that can not be divided in smaller, equally sized groups of size >= 2.

    To exclude the case of just a single x, you could use ^(?!(xx+)\1+$|x$).