I am making a syntax highlighting service for guitar chord sheets. I want to highlight the guitar chords but not the lyrics. This gets complicated because a chord can consist of a chord name plus extensions.
For example,
God Is So Good
(capo 1 for Eb)
[Verse 1]
D Em A7 D
God is so good, God is so good;
D G Em D A7 D
God is so good, He’s so good to me.
I need a regex that captures not only "D" and "E" but also "Dm", "Em7", "Dmaj7", "D/F#", etc.
I have two arrays: the first holds the chords and the second holds the optional extensions.
Array1 = {"A", "Bb", "A#", "B", "C", "C#", "D", "D#", "Eb", "E", "F", "F#", "G", "G#"}
Array2 = {"", "/", "m", "-", "1", "2", "3", "4", "5", "6", "7", "8", "9", "sus", "maj"}
How do I go about writing the regex to contain strings in Array 1, followed by optional strings in Array 2?
My initial take on this was to create a long regex that captures all possible chord expressions, but I want to know if there is a better way.
Edit: new example: revo, that regex didn't work with this example: something like D/F# should be matched as well.
G D/F#
How great is our God, sing with me,
Em7 D/F#
How great is our God, all will see,
edit: \b(?:[BE]b?|[ACDFG]#?)(?:sus|maj|[-1-9/m])*(?!.[a-z]|[A-Z])
works for me at the moment.
The regex doesn't have to be very long. You don't need to write out every possibility like this:
A|A#|B|Bb|C|C#...
Instead, you can shorten the first part to this:
[BE]b?|[ACDFG]#?
Shortening the second part:
sus|maj|[-1-9\/m]
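Then you just combine the two, repeating the extension group with * so that stacked extensions such as "m7" or "maj7" are matched too:
\b(?:[BE]b?|[ACDFG]#?)(?:sus|maj|[-1-9\/m])*(?!\w)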
Note the \b at the start and the (?!\w) at the end. These ensure that substrings that are part of a word are not matched. Hence, things like the G in "God" will not be matched.
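To check the combined pattern against the sheets in the question, here is a small sketch. Python is only my choice for illustration (the question does not name a language), and the helper names find_chords and highlight are made up. It uses * on the extension group, as in the asker's own edit, so that stacked extensions such as Em7 or Dmaj7 match as a whole.

```python
import re

# Root (A-G with optional # or b), then zero or more extensions,
# and finally a check that no word character follows.
CHORD = re.compile(r"\b(?:[BE]b?|[ACDFG]#?)(?:sus|maj|[-1-9/m])*(?!\w)")


def find_chords(line):
    """Return every chord-like token found in one line of a sheet."""
    return CHORD.findall(line)


def highlight(line):
    """Wrap each match in square brackets (a stand-in for real markup)."""
    return CHORD.sub(lambda m: "[" + m.group(0) + "]", line)


if __name__ == "__main__":
    for text in ("D Em A7 D",
                 "God is so good, God is so good;",
                 "Em7 D/F#"):
        print(find_chords(text), "|", highlight(text))
```

A slash chord such as D/F# comes out as two matches, D and F#, so the slash itself is left unhighlighted. Lyric lines produce no matches because the G in "God" is followed by a word character.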
Obviously, if your array contents are unknown at compile time, you can't use such "tricks" and have to write out every possibility.
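If the arrays are only known at run time, the alternation can be generated from them instead. A rough sketch, again in Python, with a hypothetical helper name build_chord_regex: it escapes each entry, drops the empty string (the * already makes the extensions optional), and tries longer entries first. The ordering matters, because with "A" listed before "A#" the pattern would match just the "A" of "A#".

```python
import re

# The two arrays from the question.
ROOTS = ["A", "Bb", "A#", "B", "C", "C#", "D", "D#", "Eb", "E",
         "F", "F#", "G", "G#"]
EXTENSIONS = ["", "/", "m", "-", "1", "2", "3", "4", "5", "6", "7",
              "8", "9", "sus", "maj"]


def alternation(items):
    """Join non-empty items into an escaped a|b|c group body, longest first."""
    ordered = sorted((item for item in items if item), key=len, reverse=True)
    return "|".join(re.escape(item) for item in ordered)


def build_chord_regex(roots, extensions):
    """Compile: word boundary, one root, any extensions, no word char after."""
    pattern = (r"\b(?:" + alternation(roots) + ")"
               + "(?:" + alternation(extensions) + ")*"
               + r"(?!\w)")
    return re.compile(pattern)


chord_re = build_chord_regex(ROOTS, EXTENSIONS)

if __name__ == "__main__":
    print(chord_re.findall("G D/F# Em7 A#m"))
    print(chord_re.findall("God is so good"))
```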