
Regex pattern for splitting BEM string into parts (PHP)


I would like to isolate the block, element and modifier parts of a string via PHP regex. The flavour of BEM I'm using is lowercase and hyphenated. For example:

this-defines-a-block__this-defines-an-element--this-defines-a-modifier

My strings are always formatted as above, so the regex does not need to filter out invalid BEM. For example, I will never have dirty strings such as:

This.defines-a-block__this-Defines-an-ELEMENT--090283

Block, Element and Modifier names could contain numbers, so we could have any combination of the following:

this-is-block-001__this-is-element-001--modifier-002

Finally, a modifier is optional, so not every string will have one. For example:

this-is-a-block-001__this-is-an-element
this-is-a-block-002__this-is-an-element--this-is-an-optional-modifier

I am looking for a regex that returns each section of the BEM markup. Each string will be isolated and sent to the regex individually, not as a group or as a multiline string. The following strings, sent individually:

# String 1
block__element--modifier

# String 2
block-one__element-one--modifier-one

# String 3
block-one-big__element-one-big--modifier-one-big

# String 4
block-one-001__element-one-001

Would return:

# String 1
block
element
modifier

# String 2
block-one
element-one
modifier-one

# String 3
block-one-big
element-one-big
modifier-one-big

# String 4
block-one-001
element-one-001

Solution

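  • You could use 3 capturing groups and make the third one optional with the ? quantifier.

    As all 3 parts are lowercase, can contain digits and use the hyphen as a delimiter, you might use the character class [a-z0-9] for the words and match the hyphens between them.

    You could reuse the pattern of group 1 with the subroutine call (?1), so it does not have to be written out three times.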

    \b([a-z0-9]+(?:-[a-z0-9]+)*)__((?1))(?:--((?1)))?\b
    

    Explanation

    • \b Word boundary
    • ( First capturing group
      • [a-z0-9]+ Repeat 1+ times what is listed in the character class
      • (?:-[a-z0-9]+)* Repeat 0+ times matching - and 1+ times what is in the character class
    • ) Close group 1
    • __ Match literally
    • ((?1)) Capturing group 2, which reuses the pattern of group 1 via the subroutine call (?1)
    • (?: Non capturing group
      • -- Match literally
      • ((?1)) Capturing group 3, which again reuses the pattern of group 1
    • )? Close non capturing group and make it optional
    • \b Word boundary
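
    As a minimal sketch of how the pattern above might be used from PHP, the following wraps it in preg_match. The function name splitBem and the shape of the returned array are assumptions made for illustration; they are not part of the original answer.

    ```php
    <?php
    // Hypothetical helper (name and return shape are illustrative only):
    // splits a single BEM string into block, element and optional modifier.
    function splitBem(string $bem): ?array
    {
        // Same pattern as above, wrapped in / delimiters.
        $pattern = '/\b([a-z0-9]+(?:-[a-z0-9]+)*)__((?1))(?:--((?1)))?\b/';

        if (preg_match($pattern, $bem, $m) !== 1) {
            return null; // the string contains no block__element sequence
        }

        return [
            'block'    => $m[1],
            'element'  => $m[2],
            // When the modifier is absent, preg_match leaves index 3 unset,
            // so fall back to null instead of reading a missing key.
            'modifier' => $m[3] ?? null,
        ];
    }

    // Usage with two of the strings from the question.
    var_dump(splitBem('block-one__element-one--modifier-one'));
    var_dump(splitBem('block-one-001__element-one-001'));
    ```

    For the second string the modifier entry is null, because the optional group does not take part in the match. Since each string is sent individually, replacing both \b with ^ and $ would additionally require the whole string to match; that is a variation on the answer, not something it states.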

    Or using named groups:

    \b(?<block>[a-z0-9]+(?:-[a-z0-9]+)*)__(?<element>(?&block))(?:--(?<modifier>(?&block)))?\b
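
    With named groups, the parts can be read from the match array by name rather than by index. A short sketch, assuming a hypothetical helper called splitBemNamed:

    ```php
    <?php
    // Hypothetical helper (illustrative name): same split as the numbered
    // version, but reading the captures by their group names.
    function splitBemNamed(string $bem): ?array
    {
        $pattern = '/\b(?<block>[a-z0-9]+(?:-[a-z0-9]+)*)__(?<element>(?&block))'
                 . '(?:--(?<modifier>(?&block)))?\b/';

        if (preg_match($pattern, $bem, $m) !== 1) {
            return null;
        }

        // $m contains both numeric and named keys; pick the named ones.
        // 'modifier' may be missing when the optional group did not match.
        return [
            'block'    => $m['block'],
            'element'  => $m['element'],
            'modifier' => $m['modifier'] ?? null,
        ];
    }

    var_dump(splitBemNamed('this-is-block-001__this-is-element-001--modifier-002'));
    ```

    The (?&block) call reuses the pattern of the group named block, in the same way (?1) reuses group 1 in the numbered version.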
    
