
RegExp for capturing "headline" trigger words in textarea


I'm trying to write a regexp for PHP's preg_split to capture certain "headline"-like words in a textarea I'm processing.

I want to use the resulting array to improve formatting for the user and create a streamlined look in review posts.

$returnValue = preg_split('/[^|\n]*[\t| ]*\b(Pro|Contra|Conclusion)\b\:[\t| ]*/i', 
                           $data['review_text'],
                           -1,
                           PREG_SPLIT_NO_EMPTY|PREG_SPLIT_DELIM_CAPTURE);

This is my sample text input

Intro line one, first part of the array
Pro:Pro:double Pro 1, no space between
Pro: Pro:double Pro 2, space between
Pro: test Pro:double Pro 3, characters between
Pro:
Pro:double Pro 4, linebreak between, should create an empty pro entry
Contra:
Conclusion: the last Contra was empty
Conclusion: this Contra: in this row should not match!
Conclusion: Test with spaces between Conclusion and :
 Conclusion: this Conclusion was prefixed by a space
    Conclusion: this Conclusion was prefixed by a Tab
        Conclusion: this Conclusion was prefixed by two Tabs a space between
Conclusion : this Conclusion has a space between Conclusion and :



a final line with multiple line breaks in between, should be part of the last conclusion fragment

The result should consist of [0] as the Intro line, 4 Pro results (with their delimiters), 1 Contra (empty) and 7 Conclusion results (with their delimiters). The only Contra should be empty, and the final line should be part of the last Conclusion.

I'm trying to match something like this:

  1. Start of line or start of file
  2. Zero or more occurrences of any white space character
  3. Any version of Pro, Contra or Conclusion (ignoring upper/lower case)
  4. Zero or more occurrences of any white space character
  5. A colon (:)

In this order.
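
The steps above can be sketched as a pattern roughly like the one below. This is only a sketch under assumptions: the variable names and the short sample string are made up for illustration, and "white space" is narrowed to tabs and spaces so the pattern cannot run across a line break. The `m` modifier makes `^` match at the start of every line as well as at the start of the whole string, which covers step 1.

```php
<?php
// Illustrative sketch; $pattern, $sample and $parts are assumed names.
// ^            start of the string or of any line (because of the m modifier)
// [\t ]*       optional tabs/spaces before the keyword
// (Pro|...)    the headline keyword, captured so it is returned as a delimiter
// [\t ]*:      optional tabs/spaces, then the colon
// [\t ]*       optional tabs/spaces after the colon
$pattern = '/^[\t ]*(Pro|Contra|Conclusion)[\t ]*:[\t ]*/im';

// Made-up sample text, much shorter than the real input.
$sample = "Intro line\nPro: fast\n  Contra : slow\nConclusion: this Contra: stays in the text";

$parts = preg_split($pattern, $sample, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);

// Remove the line breaks that remain at the end of each fragment.
$parts = array_map('trim', $parts);

print_r($parts);
// Elements: "Intro line", "Pro", "fast", "Contra", "slow",
//           "Conclusion", "this Contra: stays in the text"
```

The "Contra:" in the middle of the last line is not treated as a headline, because `^` only matches at a line start.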


Solution

  • With the help of @M42, I was able to figure out the right way...

    '/\n[\t ]*\b(Pro|Contra|Conclusion)[\t ]*:[\t ]*/i'
    

    With only the "start of file instead of new line" case missing, this does almost exactly what I wanted (I'm still testing to make sure, though). For now I prepend a "\r\n" to the string; it gets stripped away later when I trim() the string fragments.

    The full PHP call looks like this

    $returnValue = preg_split('/\n[\t ]*\b(Pro|Contra|Conclusion)[\t ]*:[\t ]*/i', $data['review_text'], -1, PREG_SPLIT_NO_EMPTY|PREG_SPLIT_DELIM_CAPTURE);
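
One possible way to turn the split result into headline/text pairs is sketched below. It is an assumption-laden illustration, not the code from the web app: `splitReviewSections` is a hypothetical helper name, and it deliberately leaves out PREG_SPLIT_NO_EMPTY so that the array strictly alternates between text and captured headline, which keeps empty sections (such as an empty Contra) attached to their headline.

```php
<?php
// Hypothetical helper for illustration only; not part of the original post.
function splitReviewSections(string $text): array
{
    $pattern = '/\n[\t ]*\b(Pro|Contra|Conclusion)[\t ]*:[\t ]*/i';

    // Prepend a line break so a headline on the very first line also matches.
    // Without PREG_SPLIT_NO_EMPTY the result alternates: text, headline, text, ...
    $parts = preg_split($pattern, "\n" . $text, -1, PREG_SPLIT_DELIM_CAPTURE);

    // Everything before the first headline is the intro (headline = null).
    $sections = [['headline' => null, 'text' => trim($parts[0])]];

    // Odd indexes hold the captured headline, the following index its text.
    for ($i = 1; $i + 1 < count($parts); $i += 2) {
        $sections[] = ['headline' => $parts[$i], 'text' => trim($parts[$i + 1])];
    }

    return $sections;
}

// Usage with a made-up input string:
$sections = splitReviewSections("Intro\nPro: good\nContra:\nConclusion: fine");
print_r($sections);
```

Here the empty Contra ends up as a section with an empty text instead of disappearing, so each headline stays paired with its own fragment.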
    

    Just in case you are wondering why I used Fazit instead of Conclusion in the reply to M42: I'm writing code for a German web app, so I have to translate everything I copy and paste to StackOverflow. (ಠ_ಠ)