
Emacs Lisp: matching a repeated pattern in a compact manner?


Let's suppose I have an RGB string (format: #<2 hex digits><2 hex digits><2 hex digits>) like this:

"#00BBCC"

and I'd like to match and capture its <2 hex digits> elements in a more compact manner than by using the obvious:

"#\\([[:xdigit:]]\\{2\\}\\)\\([[:xdigit:]]\\{2\\}\\)\\([[:xdigit:]]\\{2\\}\\)"

I've tried:

"#\\([[:xdigit:]]\\{2\\}\\)\\{3\\}"

and:

"#\\(\\([[:xdigit:]]\\{2\\}\\)\\{3\\}\\)"

But the most I got out of either of them was a single <2 hex digits> element.

Any idea? Thank you.


Solution

  • If you want to capture R, G and B in separate subgroups, so that you can extract them with (match-string group), your regexp needs three distinct parenthesized groups at some point.

    \(...\)\(...\)\(...\)
    

    Otherwise, if you use a repeat pattern such as

    \(...\)\{3\}
    

    you have only one group, and after the match it holds only the text matched by the last repetition. So, say, if you have something along the lines of

    \([[:xdigit:]]\{2\}\)\{3\}
    

    it will match a string like "A0B1C2", but (match-string 1) will contain only what the last repetition matched, i.e. "C2", because the regexp defines only one group.

    So you basically have two options: use a compact regexp, such as your third one, and do some extra substring processing to extract the hex numbers, as Sean suggests; or use a longer regexp, such as your first one, which gives you direct access to the three sub-matches.

    If you're mostly worried about code readability, you could always do something like

    (let ((hex2 "\\([[:xdigit:]]\\{2\\}\\)"))
      (concat "#" hex2 hex2 hex2))
    

    to build the longer regexp with less repetition, as per tripleee's suggestion.
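For comparison, here is a rough Emacs Lisp sketch of both options, plus a quick check of the "only the last repetition is kept" behaviour. The `my-` functions are hypothetical helpers made up for illustration, not part of Emacs. Only built-ins are used (`string-match`, `match-string`, `substring`, `cl-loop`, `string-to-number`), and the sketch assumes you are matching against a string rather than a buffer, which is why `match-string` gets the string as its second argument.

```elisp
(require 'cl-lib)  ; for `cl-loop'

;; Hypothetical helper: shows that a repeated group keeps only its last repetition.
(defun my-last-repetition (string)
  "Match three hex pairs with one repeated group; return what group 1 holds."
  (when (string-match "\\([[:xdigit:]]\\{2\\}\\)\\{3\\}" string)
    (match-string 1 string)))

;; Hypothetical helper, option 1: three explicit groups, built with `concat'.
(defun my-rgb-components (color)
  "Return COLOR's (R G B) hex strings via three capture groups, or nil."
  (let ((hex2 "\\([[:xdigit:]]\\{2\\}\\)"))
    (when (string-match (concat "#" hex2 hex2 hex2) color)
      (list (match-string 1 color)
            (match-string 2 color)
            (match-string 3 color)))))

;; Hypothetical helper, option 2: compact regexp, then split with `substring'.
(defun my-rgb-components-compact (color)
  "Return COLOR's (R G B) hex strings via the compact regexp, or nil."
  (when (string-match "#\\(\\([[:xdigit:]]\\{2\\}\\)\\{3\\}\\)" color)
    ;; Group 1 spans all six hex digits, e.g. "00BBCC".
    (let ((digits (match-string 1 color)))
      ;; Cut the six digits into three two-character pieces.
      (cl-loop for i from 0 below 6 by 2
               collect (substring digits i (+ i 2))))))

;; Usage:
;; (my-last-repetition "A0B1C2")           ; => "C2"
;; (my-rgb-components "#00BBCC")           ; => ("00" "BB" "CC")
;; (my-rgb-components-compact "#00BBCC")   ; => ("00" "BB" "CC")
;; (mapcar (lambda (h) (string-to-number h 16))
;;         (my-rgb-components "#00BBCC"))  ; => (0 187 204)
```

Which one to use is largely a matter of taste: the three-group version keeps all the parsing inside the regexp, while the compact one keeps the regexp short and moves the splitting into ordinary string code.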