
Splitting a string by multiple separators with preg_match in PHP


There is a string consisting of at most three parts: Writer, Director, and Producer. Let's call them "categories". Each category has the form Label : Names, where Label is one of the category names above, and Names is a list of names separated by slashes. E.g.:

Writer : Jeffrey Schenck / Peter Sullivan / Director : Brian Trenchard-Smith / jack / Producer : smith

I want to break the string into the category names and their name lists with the preg_match function. Here is what I have so far:

$pattern = '/Writer : (?P<Writer>[\s\S]+?)Director : (?P<Director>[\s\S]+?)Producer : (?P<Producer>[\s\S]+)/';
$sentence = 'Writer : Jeffrey Schenck / Peter Sullivan / Director : Brian Trenchard-Smith / jack / Producer : smith';
preg_match($pattern, $sentence, $matches);

foreach($matches as $cat => $match) {
  // Do more
  // echo "<b>" . $cat . "</b>" . $match . "<br />";
}

The script works well if all three categories are present in the string. It fails if at least one of the categories is missing.
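
To illustrate the failure, here is a minimal reproduction. The shortened input without a Director part is a made-up example, not real data:

```php
<?php
// Same pattern as above: every label is mandatory.
$pattern = '/Writer : (?P<Writer>[\s\S]+?)Director : (?P<Director>[\s\S]+?)Producer : (?P<Producer>[\s\S]+)/';

// Hypothetical input with the Director category left out.
$sentence = 'Writer : Jeffrey Schenck / Producer : smith';

$found = preg_match($pattern, $sentence, $matches);

var_dump($found);   // int(0): the literal "Director : " is never found
var_dump($matches); // empty array
```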


Solution

  • One way is to create optional groups with the well-known ? quantifier:

    $pattern = '/^' .
      '(?:Writer *: *(?P<Writer>[^:]+))?' .
      '(?:Director *: *(?P<Director>[^:]+))?' .
      '(?:Producer *: *(?P<Producer>[^:]+))?' .
      '$/';
    preg_match($pattern, $sentence, $matches);
    

    where (?:...) creates a non-capturing group. Note that the output array is indexed by both numeric positions and group names, e.g.:

    Array
    (
        [0] => Writer : Jeffrey Schenck / Peter Sullivan / Director : Brian Trenchard-Smith / jack / Producer : smith
        [Writer] => Jeffrey Schenck / Peter Sullivan / 
        [1] => Jeffrey Schenck / Peter Sullivan / 
        [Director] => Brian Trenchard-Smith / jack / 
        [2] => Brian Trenchard-Smith / jack / 
        [Producer] => smith
        [3] => smith
    )
    

  • Another way is to use preg_match_all with extra processing:

    $pattern = '/(?<=:)[^:]+/';
    if (preg_match_all($pattern, $sentence, $matches)) {
      $keys = ['Writer', 'Director', 'Producer'];
      $a = [];
      // The isset() checks are skipped for clarity's sake
      foreach ($matches[0] as $i => $value) {
        $a[$keys[$i]] = $value;
      }

      print_r($a);
    }
    

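    where (?<=:) is a positive lookbehind assertion for the : character. In this case, the resulting array contains only the named keys. Note that each value keeps its leading space and the label of the next category, so it still needs trimming, and that the keys are assigned by position, so they are correct only when no category is missing: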

    Array
    (
        [Writer] =>  Jeffrey Schenck / Peter Sullivan / Director 
        [Director] =>  Brian Trenchard-Smith / jack / Producer 
        [Producer] =>  smith
    )
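
A minimal sketch of the first approach with some cleanup added, assuming the categories always appear in the order Writer, Director, Producer. The helper name parseCredits and the shortened sample input are illustrative assumptions, not part of the original answer. It keeps only the named groups, skips missing categories, and splits each name list on slashes:

```php
<?php
// Hypothetical helper (an assumption for illustration): parse the credits
// string with the optional-group pattern and return label => list of names.
function parseCredits(string $sentence): array
{
    $pattern = '/^' .
        '(?:Writer *: *(?P<Writer>[^:]+))?' .
        '(?:Director *: *(?P<Director>[^:]+))?' .
        '(?:Producer *: *(?P<Producer>[^:]+))?' .
        '$/';

    $result = [];
    if (!preg_match($pattern, $sentence, $matches)) {
        return $result; // the string does not fit the expected layout
    }

    foreach (['Writer', 'Director', 'Producer'] as $label) {
        // A skipped optional group is either absent or an empty string.
        if (!isset($matches[$label]) || $matches[$label] === '') {
            continue;
        }
        // Split on slashes, trim each name, and drop empty pieces
        // (such as the one left by a trailing " / ").
        $names = array_map('trim', explode('/', $matches[$label]));
        $result[$label] = array_values(array_filter($names, 'strlen'));
    }

    return $result;
}

// Made-up input with the Director category missing.
print_r(parseCredits('Writer : Jeffrey Schenck / Peter Sullivan / Producer : smith'));
```

For this input the result has only the Writer and Producer keys, each holding a plain list of trimmed names. Splitting on slashes is a design choice that relies on names never containing a slash themselves.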