
Trouble with regex / preg_match / str_replace with a complicated string


I am having trouble with a complicated regex. I have tried every approach I can think of, but I only ever get "almost" there.

I have a block of messages like this:

---Agent 1: Wednesday 08/16/2017 | 11:43 AM ---

Message 1

--- Agent 1: Friday 06/09/2017 | 9:02 AM ---

Message 2

--- Agent 1: Friday 04/14/2017 | 10:35 AM ---

Message 3

--- Agent 1: Monday 02/13/2017 | 12:07 AM ---

This

is

message

3

 --- Agent 1: Monday 12/19/2016 | 1:31 PM ---

 Message 4 

 --- Agent 1: Monday 10/24/2016 | 10:48 AM ---

 Message 5

One problem is that some headers have a space before the first ---. Another is that some messages span multiple lines.

What I am trying to do is extract each individual message: everything from the first --- of one header up to the first --- of the next header (that is, every other occurrence of ---, since each header line contains two). I would like each result to look like:

---Agent 1: Wednesday 08/16/2017 | 11:43 AM ---

Message 1

I have tried variations of ---.*? (---) (matching every other ---), but then I have no way of parsing out the message itself. I have also tried matching the pieces manually:

(?<=\: )(.*?)(?= \|)|(\---)(\r\n|\r|\n)(\r\n|\r|\n)(.*?)(\r\n|\r|\n)(\r\n|\r|\n)(\---)

This works until a message spans multiple lines (like "This is message 3").
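The multi-line failure comes from . not matching newline characters unless the s (dotall) modifier is set. A minimal sketch, using a simplified version of the pattern above (\R in place of (\r\n|\r|\n)) and a hypothetical sample string:

```php
<?php
// A body that spans several lines, like "This is message 3" in the sample.
$text = "---\n\nThis\n\nis\n\nmessage\n\n3\n\n---";

// Without s, . stops at newlines, so (.*?) cannot reach the closing ---.
var_dump(preg_match('/---\R\R(.*?)\R\R---/', $text)); // int(0)

// With s, . also matches newlines and the whole body is captured.
preg_match('/---\R\R(.*?)\R\R---/s', $text, $m);
var_dump($m[1]);
```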

I have also tried doing it in multiple steps, trimming the first or last --- with str_replace, but this is foiled by the headers that have a leading space.
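One way to keep a multi-step approach from being foiled by the leading space is to normalize it away first with a multiline preg_replace instead of str_replace. A minimal sketch on a shortened, hypothetical copy of the sample; the header check with strpos is only an illustration of what becomes easy once every header starts at column 0:

```php
<?php
// Hypothetical sample: the second header and its body have a stray leading space.
$text = "--- Agent 1: Friday 06/09/2017 | 9:02 AM ---\n\nMessage 2\n\n"
      . " --- Agent 1: Monday 12/19/2016 | 1:31 PM ---\n\n Message 4";

// The m modifier makes ^ match at the start of every line;
// strip any spaces or tabs found there.
$normalized = preg_replace('/^[ \t]+/m', '', $text);

// Every header now starts at column 0, so a plain prefix check finds them all.
$lines = preg_split('/\R/', $normalized);
$headers = array_values(array_filter($lines, fn($l) => strpos($l, '---') === 0));
print_r($headers);
```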

It's always the second --- after the time, the multi-line message, or the space before the first --- that trips me up. Does anyone have a more elegant solution than the monstrosity I am creating?


Solution

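/---.*---\s*\R.*(?=---|$)/gsU

---.*---\s*\R selects the first line, with the message description. Then .*(?=---|$) gets the rest of the message, up to the start of the next one (---) or the end of the string. The s flag lets . match newlines, which handles multi-line messages, and the U flag makes every quantifier lazy, so each .* stops at the first point where the rest of the pattern can match. Because each match starts at ---, a space in front of a header is left out of that match (it ends up as trailing whitespace of the previous message). The g flag is regex101 notation for "find all matches"; PHP does not accept g as a modifier, so drop it and use preg_match_all instead.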
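In PHP, the pattern can be used with preg_match_all. A minimal sketch on a shortened copy of the sample text from the question; trimming each match is an assumption about the desired output, since the raw matches carry the blank lines before the next header:

```php
<?php
// Shortened sample from the question: no leading space, a multi-line
// message, and a header with a leading space.
$text = "---Agent 1: Wednesday 08/16/2017 | 11:43 AM ---\n\nMessage 1\n\n"
      . "--- Agent 1: Monday 02/13/2017 | 12:07 AM ---\n\nThis\n\nis\n\nmessage\n\n3\n\n"
      . " --- Agent 1: Monday 10/24/2016 | 10:48 AM ---\n\n Message 5";

// Same pattern without the g flag (preg_match_all already finds every match).
// s: . matches newlines; U: quantifiers are lazy by default.
preg_match_all('/---.*---\s*\R.*(?=---|$)/sU', $text, $matches);

// Each raw match ends with the blank line (and any leading space) before the
// next header; trim removes that. trim only touches the ends, so the indented
// " Message 5" keeps its inner leading space.
$messages = array_map('trim', $matches[0]);
print_r($messages);
```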

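If you also need the individual header fields (such as the date the lookbehind in the question was after), one option is a second regex with named groups, applied to each message once it has been extracted and trimmed. A minimal sketch; parseMessage is a hypothetical helper, and it assumes every header has exactly the "--- Name: Day MM/DD/YYYY | H:MM AM ---" shape seen in the sample:

```php
<?php
// Hypothetical helper: split one extracted message into header fields and body.
// Returns null if the header does not have the assumed shape.
function parseMessage(string $message): ?array
{
    $pattern = '~^---\s*(?<agent>[^:]+):\s*(?<day>\w+)\s+(?<date>\d{2}/\d{2}/\d{4})'
             . '\s*\|\s*(?<time>\d{1,2}:\d{2}\s*[AP]M)\s*---\s*(?<body>.*)$~s';

    if (!preg_match($pattern, $message, $m)) {
        return null;
    }

    // The \s* before the body absorbs the blank lines and any indentation.
    return [
        'agent' => $m['agent'],
        'day'   => $m['day'],
        'date'  => $m['date'],
        'time'  => $m['time'],
        'body'  => $m['body'],
    ];
}

print_r(parseMessage("--- Agent 1: Monday 02/13/2017 | 12:07 AM ---\n\nThis\n\nis\n\nmessage\n\n3"));
```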