
Matching a hash character (#) with a regex


I have an XML document that contains a regular expression (so there is no need to escape anything with \ beyond what the regex itself requires). I'm trying to match musical chord symbols. The regex below works fine otherwise, but it refuses to match a hash:

\b[A-G](m|b|\#|sus|\d)*?\b
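
A minimal reproduction, assuming Python's re module (the question does not name a regex engine, so the engine and the sample chords are assumptions made for illustration):

```python
import re

# The pattern from the question, unchanged.
original = re.compile(r"\b[A-G](m|b|\#|sus|\d)*?\b")

for chord in ["Abm7", "A#m7", "A#"]:
    match = original.search(chord)
    # Abm7 is matched in full; for the two chords with a hash,
    # the match stops right after the root note.
    print(f"{chord!r} -> {match.group(0)!r}")
```

The flat chord comes back whole, while both sharp chords are cut down to just A.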

Solution

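  • The problem is that \b, the word boundary anchor, only matches between a word character (a letter, digit, or underscore) and a non-word character, or at the start or end of the string next to a word character. A # is not a word character, so \b won't match after a # unless the # is itself followed by a word character. On top of that, the lazy *? quantifier stops at the first boundary it finds, which for A#m7 is right after the A.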

    Use

    \b[A-G](?:m|b|#|sus|\d)*(?:\b|(?<=#))
    

    No need to escape the #, either (unless the regex runs in free-spacing/verbose mode, where an unescaped # starts a comment).

    EDIT: Changed the regex to better reproduce the intended functionality (as I understand it).

    Currently, you're not matching some chords, though; how about

    \b[A-G](?:add|maj|j|m|-|b|#|sus|\d|°)*(?:\b|(?<=[#°-]))
    

    That way, you can match all of these:

    A7
    Abm7 
    A#m7sus4
    A7b9#13
    Amaj7#11
    A#°
    Abj7add13
    

    I guess there is still room for improvement, though.
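
A quick way to check the extended pattern against the list above, again sketched with Python's re module as an assumed engine (the helper name find_chords and the sample sentence are made up for illustration):

```python
import re

# The extended pattern from the answer, unchanged.
fixed = re.compile(r"\b[A-G](?:add|maj|j|m|-|b|#|sus|\d|°)*(?:\b|(?<=[#°-]))")

chords = ["A7", "Abm7", "A#m7sus4", "A7b9#13", "Amaj7#11", "A#°", "Abj7add13"]


def find_chords(text):
    """Return every substring of text that the pattern matches."""
    return [m.group(0) for m in fixed.finditer(text)]


# Each chord from the list should be matched in full.
for chord in chords:
    assert find_chords(chord) == [chord], chord

print(find_chords("Play A#m7sus4, then C# and Abj7add13."))
```

As for the room for improvement: the pattern has no way of telling a chord from ordinary text, so a bare capital letter from A to G (such as the article "A") also matches, and "A-frame" yields "A-".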