I am processing a text file of messages that resembles this (though a lot longer):
13/09/18, 4:14 pm - Fred Dag: Jackie, please could you send to me too? ‚ thank you
Hello
13/09/18, 4:45 pm - Jackie Johnson: Here is yet another message
where someone added a line break
13/09/18, 4:10 pm - Fred Dag: Here is another message
The following regex works to extract the data into Date, Time, Name and Message except where the Message includes a line break:
(?<date>(?:[0-9]{1,2}\/){2}[0-9]{1,2}),\s(?<time>(?:[0-9]{1,2}:)[0-9]{2}\s[a|p]m)\s-\s(?<name>(?:.*)):\s(?<message>(?:.+))
Using preg_match_all with the regex above in PHP 7.4, I generated the following array:
Array
(
[0] => Array
(
[date] => 13/09/18
[time] => 4:14 pm
[name] => Fred Dag
[message] => Jackie, please could you send to me too? ‚ thank you
)
[1] => Array
(
[date] => 13/09/18
[time] => 4:45 pm
[name] => Jackie Johnson
[message] => Here is yet another message
)
[2] => Array
(
[date] => 13/09/18
[time] => 4:10 pm
[name] => Fred Dag
[message] => Here is another message
)
)
But the array is missing the continuation lines created by the line breaks, which should be appended to the previous Message. I get the same result when testing on regex101.com. I also tried letting the dot match newlines inside the message group:
(?<message>(?s:.+))
but that then selected everything from the start of the first message to the end of the file. I think I have exhausted my knowledge of regex and reached the end of Google with the terms I know to search with :) Could anyone point me in the right direction?
Your immediate problem is that the dot you are using to match the message content does not match newlines. Adding the s (dot-all) flag to your PHP regex fixes that, but as you found, a greedy dot then runs to the end of the input, so the pattern itself also needs to change. I suggest the following pattern:
\d{2}\/\d{2}\/\d{2}, \d{1,2}:\d{1,2}.*?(?=\d{2}\/\d{2}\/\d{2}, \d{1,2}:\d{1,2}|$)
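This pattern matches one whole message: it starts at the leading date and time, continues lazily across newlines, and stops at either the start of the next message or the end of the input. The lookahead only checks for the next date and time without consuming them, so the following match can begin there.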
Sample script:
$input = "13/09/18, 4:14 pm - Fred Dag: Jackie, please could you send to me too? ‚ thank you\nHello\n13/09/18, 4:45 pm - Jackie Johnson: Here is yet another message\nwhere someone added a line break\n13/09/18, 4:10 pm - Fred Dag: Here is another message";
preg_match_all("/\d{2}\/\d{2}\/\d{2}, \d{1,2}:\d{1,2}.*?(?=\d{2}\/\d{2}\/\d{2}, \d{1,2}:\d{1,2}|$)/s", $input, $matches);
print_r($matches[0]);
This prints:
Array
(
[0] => 13/09/18, 4:14 pm - Fred Dag: Jackie, please could you send to me too? ‚ thank you
Hello
[1] => 13/09/18, 4:45 pm - Jackie Johnson: Here is yet another message
where someone added a line break
[2] => 13/09/18, 4:10 pm - Fred Dag: Here is another message
)
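If you still want the Date, Time, Name and Message fields separately, the same lookahead idea can be combined with the named groups from the question. The following is only a sketch under a few assumptions that are mine rather than taken from the original data: every message header starts at the beginning of a line, names never contain a colon, and the time always ends in am or pm. The sample input is modelled on the question, and the variable names $pattern and $rows are just illustrative.

```php
<?php
// Sample input modelled on the question (two of the messages contain a line break).
$input = "13/09/18, 4:14 pm - Fred Dag: Jackie, please could you send to me too?\nHello\n"
       . "13/09/18, 4:45 pm - Jackie Johnson: Here is yet another message\nwhere someone added a line break\n"
       . "13/09/18, 4:10 pm - Fred Dag: Here is another message";

// m flag: ^ matches at the start of every line, so a header must begin a line.
// s flag: the lazy .*? in <message> may cross newlines.
// The lookahead stops the message just before the next header line,
// or at the end of the input (ignoring trailing whitespace).
$pattern = '~^(?<date>\d{1,2}/\d{1,2}/\d{2}),\s'
         . '(?<time>\d{1,2}:\d{2}\s[ap]m)\s-\s'
         . '(?<name>[^:\n]+):\s'
         . '(?<message>.*?)'
         . '(?=\R\d{1,2}/\d{1,2}/\d{2},\s\d{1,2}:\d{2}\s[ap]m\s-\s|\s*\z)~ms';

// PREG_SET_ORDER groups the captures per message instead of per group.
preg_match_all($pattern, $input, $matches, PREG_SET_ORDER);

$rows = [];
foreach ($matches as $m) {
    // Keep only the named captures; $m also holds numeric keys.
    $rows[] = [
        'date'    => $m['date'],
        'time'    => $m['time'],
        'name'    => $m['name'],
        'message' => $m['message'],
    ];
}

print_r($rows);
```

For this sample, $rows should contain three entries, with the continuation lines ("Hello" and "where someone added a line break") kept inside the message of the entry they belong to. Note that the character class is written [ap] rather than [a|p], since a pipe inside square brackets is matched literally.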