Search code examples
regexreadability

How do I write more maintainable regular expressions?


I have started to feel that using regular expressions decreases code maintainability. There is something evil about the terseness and power of regular expressions. Perl compounds this with side effects like default operators.

I DO have a habit of documenting regular expressions with at least one sentence giving the basic intent and at least one example of what would match.

Because regular expressions are built up I feel it is an absolute necessity to comment on the largest components of each element in the expression. Despite this even my own regular expressions have me scratching my head as though I am reading Klingon.

Do you intentionally dumb down your regular expressions? Do you decompose possibly shorter and more powerful ones into simpler steps? I have given up on nesting regular expressions. Are there regular expression constructs that you avoid due to mainainability issues?

Do not let this example cloud the question.

If the following by Michael Ash had some sort of bug in it would you have any prospects of doing anything but throwing it away entirely?

^(?:(?:(?:0?[13578]|1[02])(\/|-|\.)31)\1|(?:(?:0?[13-9]|1[0-2])(\/|-|\.)(?:29|30)\2))(?:(?:1[6-9]|[2-9]\d)?\d{2})$|^(?:0?2(\/|-|\.)29\3(?:(?:(?:1[6-9]|[2-9]\d)?(?:0[48]|[2468][048]|[13579][26])|(?:(?:16|[2468][048]|[3579][26])00))))$|^(?:(?:0?[1-9])|(?:1[0-2]))(\/|-|\.)(?:0?[1-9]|1\d|2[0-8])\4(?:(?:1[6-9]|[2-9]\d)?\d{2})$

Per request the exact purpose can be found using Mr. Ash's link above.

Matches 01.1.02 | 11-30-2001 | 2/29/2000

Non-Matches 02/29/01 | 13/01/2002 | 11/00/02


Solution

  • I usually just try to wrap all my Regular Expression calls inside their own function, with a meaningful name and an some basic comments. I like to think of Regular Expressions as a write only language, readable only by the one that wrote it (Unless it's really simple). I fully expect that someone would need to probably completely re-write the expression if they had to change its intent and this is probably for the better to keep the Regular Expression training alive.