I have this multi-line string:
Lorem ipsum dolor sit amet.

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vivamus
dictum, lorem et fringilla congue, velit libero sagittis eros, id
lobortis nisi risus ac mauris.
I would like to use a PCRE (Perl Compatible Regular Expressions) pattern in PHP to capture the second "paragraph" (the 3-line text after the blank line) in a named group.
I tried the following regular expression on regex101 and it works fine:
/\n(\n)+(?<namedGroup>([\w\d]+.*(\n)?)+)/m
but when I use it in PHP with the following code, nothing gets captured:
<?php
$text = file_get_contents("paragraphs.txt");
$regular_expression = '/\n(\n)+(?<namedGroup>([\w\d]+.*(\n)?)+)/m';
preg_match($regular_expression, $text, $result);
print_r($result);
?>
You are currently using this pattern, which can be improved in a few ways:
$regular_expression = '/\n(\n)+(?<namedGroup>([\w\d]+.*(\n)?)+)/m';
Your pattern only matches a newline \n, but your file apparently uses \r\n (Windows) line endings. To match either, you can use \R, which matches any Unicode newline sequence and treats \r\n as a single newline.
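A minimal sketch of why the pattern behaves differently on regex101 and in PHP, assuming the file uses \r\n line endings. The sample string is a reconstruction of the question's text and the variable names ($crlf, $lf, $original) are hypothetical; regex101 typically sees pasted text with plain \n line endings.

```php
<?php
// Hypothetical sample: the question's text with Windows (\r\n) line endings.
$crlf = "Lorem ipsum dolor sit amet.\r\n\r\n"
      . "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vivamus\r\n"
      . "dictum, lorem et fringilla congue, velit libero sagittis eros, id\r\n"
      . "lobortis nisi risus ac mauris.";
// The same text with Unix (\n) line endings.
$lf = str_replace("\r\n", "\n", $crlf);

$original = '/\n(\n)+(?<namedGroup>([\w\d]+.*(\n)?)+)/m';

// Every \n in $crlf is preceded by \r, so \n\n never occurs: no match.
$crlfHits = preg_match($original, $crlf);   // 0
// With \n line endings the original pattern does match.
$lfHits = preg_match($original, $lf);       // 1

// \R matches \r\n as one newline sequence, so \R\R finds the blank line in both.
$crlfWithR = preg_match('/\R\R/', $crlf);   // 1
$lfWithR = preg_match('/\R\R/', $lf);       // 1
```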
If you only need a single match, you can omit the (?<namedGroup>...) group altogether and use \K to discard what has been matched so far; the full match ($result[0]) then contains only the paragraph.
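As a sketch, the two approaches can be compared side by side: keeping a named group, or dropping it in favour of \K. The sample string (with assumed \r\n line endings) and the variable names $named and $plain are illustrative only.

```php
<?php
// Hypothetical sample text with \r\n line endings.
$text = "Lorem ipsum dolor sit amet.\r\n\r\n"
      . "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vivamus\r\n"
      . "dictum, lorem et fringilla congue, velit libero sagittis eros, id\r\n"
      . "lobortis nisi risus ac mauris.";

// Variant 1: keep a named group; the paragraph is in $named['namedGroup'].
preg_match('/\R{2,}(?<namedGroup>\w.*(?:\R\w.*)*)/', $text, $named);

// Variant 2: \K drops the leading newlines from the reported match,
// so the paragraph is the whole match $plain[0] and no group is needed.
preg_match('/\R{2,}\K\w.*(?:\R\w.*)*/', $text, $plain);
```

With \K the result array holds just one entry, which keeps the calling code simple when only one value is needed.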
Note that:
- [\w\d] is the same as \w, as \w also matches digits
- the /m (multiline) flag is not needed, as there are no anchors (^ or $) in the pattern
- a repeated capturing group such as (\n)+ only captures the value of the last iteration

The updated pattern that you could use for a single match:
\R{2,}\K\w.*(?:\R\w.*)*
- \R{2,} Match 2 or more Unicode newline sequences
- \K Forget what has been matched so far
- \w.* Match a word character and the rest of the line
- (?:\R\w.*)* Optionally repeat a Unicode newline sequence, a word character and the rest of the line

Or match only lines that start with a non-whitespace character \S:
\R{2,}\K\S.*(?:\R\S.*)*
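Put together, the PHP side could look roughly like this minimal sketch. To keep it runnable on its own, an inline string with assumed \r\n line endings stands in for file_get_contents("paragraphs.txt"); the variable names are hypothetical.

```php
<?php
// Hypothetical stand-in for file_get_contents("paragraphs.txt").
$text = "Lorem ipsum dolor sit amet.\r\n\r\n"
      . "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vivamus\r\n"
      . "dictum, lorem et fringilla congue, velit libero sagittis eros, id\r\n"
      . "lobortis nisi risus ac mauris.";

// Paragraph lines that start with a word character.
$wordPattern = '/\R{2,}\K\w.*(?:\R\w.*)*/';
// Paragraph lines that start with any non-whitespace character.
$nonSpacePattern = '/\R{2,}\K\S.*(?:\R\S.*)*/';

// preg_match() returns 1 on a match; thanks to \K the paragraph is $result[0].
$paragraph = preg_match($wordPattern, $text, $result) ? $result[0] : null;
$paragraphS = preg_match($nonSpacePattern, $text, $resultS) ? $resultS[0] : null;

// \R also matches a lone \n, so the same pattern works on Unix line endings.
$unixText = str_replace("\r\n", "\n", $text);
$unixParagraph = preg_match($wordPattern, $unixText, $unixResult) ? $unixResult[0] : null;

echo $paragraph, PHP_EOL;
```

Because \R covers both \r\n and \n, the code no longer depends on which line endings the file was saved with.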