I am trying to create a markup for formatting an ordered list. Here is the markup style:
$strings = "1. dog
1. cat
1. fish
 1. horse
 1. monkey
1. pig
";
The horse and monkey items should form a sublist, since they have one space before the number. Here is the code I am using:
function blq($match){
    $str = preg_replace("/^1\. (.+?)$/m", "<li>$1</li>", $match[0]);
    $str = preg_replace_callback("/(?:^1\. .+(\n|$))+/m", 'blq', $str);
    return "<ol>$str</ol>";
}
$string = preg_replace_callback("/(?:^ ?1\. .+(\n|$))+/m", 'blq', $strings);
echo $string;
That code produces this output:
<ol><li>dog
</li>
<li>cat
</li>
<li>fish
</li>
1. horse
1. monkey
<li>pig
</li>
</ol>
The horse and monkey lines were not turned into a sublist; they were simply left unchanged. I feel that I am close to the answer, but I am not sure how to get there.
Note: I would like to allow an unlimited number of nested sublists.
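To see why horse and monkey are left alone, the two steps inside blq() can be replayed on the block that the outer pattern matches. This is a minimal diagnostic sketch, assuming plain \n line endings; the variable names are made up for illustration.

```php
<?php
// Hypothetical diagnostic: replay the two steps of blq() on the block matched by
// the outer pattern /(?:^ ?1\. .+(\n|$))+/m (the whole list, since " ?" allows one space).
$block = "1. dog\n1. cat\n1. fish\n 1. horse\n 1. monkey\n1. pig\n";

// Step 1 (same pattern as blq()): wrap lines that start with "1." at column 0.
// The indented lines start with a space, so they are not touched.
$step1 = preg_replace('/^1\. (.+?)$/m', '<li>$1</li>', $block);

// Step 2 (same pattern as blq()): look for a run of "1." lines at column 0.
// Those lines are now "<li>...</li>" and the indented ones still start with a
// space, so nothing matches and blq() is never called recursively.
$innerRuns = preg_match_all('/(?:^1\. .+(\n|$))+/m', $step1);

echo $step1;
echo "inner runs found: $innerRuns\n";
```

The inner pattern only ever looks at column 0, so the indented lines can never reach the recursive call.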
<?php
$text = "1. dog
1. cat
1. fish
1. horse
1. duck
1. goose
1. swan
1. monkey
1. chimpanzee
1. orangutan
1. whale
1. pig
";
function callback($match) {
    $out = preg_replace_callback("/(^($match[2] +)1\. .+(\\n|$))(?1)*/m", 'callback', $match[0]);
    $out = preg_replace("/^$match[2]1\. (.+)$/m", "<li>$1</li>", $out);
    return "<ol>\n$out</ol>\n";
}
$html = preg_replace_callback("/(^( *)1\. .+(\\n|$))(?1)*/m", 'callback', $text);
echo $html;
?>
Here's an ideone demo.
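The same callback() can be run on input that actually nests. The sketch below reuses the function from above unchanged (only re-indented and commented) and feeds it a small, made-up two-level list; the output in the comments was traced by hand.

```php
<?php
// callback() as in the answer above, only re-indented.
function callback($match) {
    // Recurse into lines indented deeper than this level ($match[2] plus at least one space).
    $out = preg_replace_callback("/(^($match[2] +)1\. .+(\\n|$))(?1)*/m", 'callback', $match[0]);
    // Wrap only the lines at exactly this level's indentation.
    $out = preg_replace("/^$match[2]1\. (.+)$/m", "<li>$1</li>", $out);
    return "<ol>\n$out</ol>\n";
}

// Made-up input: horse and monkey one level down, duck two levels down.
$text = "1. dog\n 1. horse\n  1. duck\n 1. monkey\n1. pig\n";

$html = preg_replace_callback("/(^( *)1\. .+(\\n|$))(?1)*/m", 'callback', $text);
echo $html;
// <ol>
// <li>dog</li>
// <ol>
// <li>horse</li>
// <ol>
// <li>duck</li>
// </ol>
// <li>monkey</li>
// </ol>
// <li>pig</li>
// </ol>
```

Each nested <ol> comes out as a direct child of the enclosing <ol> rather than inside an <li>. Browsers display this as an indented sublist, but strictly valid HTML only allows <li> children in an <ol>, so a stricter variant would have to move each nested list into the preceding item.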
That's a pretty neat idea you had, using preg_replace_callback recursively. You're also right that $1 isn't interpolated inside double quotes: PHP only interpolates a $ that starts a valid variable name, and variable names can't begin with a digit. I always forget that. You were right to use /m, since you want ^ to match the beginning of each line (not the beginning of the entire string). And you were also right to use (\n|$) even though $ matches the end of each line in /m mode: $ is only an assertion and doesn't consume the \n, so without (\n|$) the next repetition of + could never start at ^ on the following line. I didn't see these facts when I first read your question.
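Both points (the literal $1 and the zero-width $) can be checked with a few lines. A minimal sketch; the list contents are arbitrary.

```php
<?php
// "$1" is not interpolated: a PHP variable name cannot start with a digit.
$replacement = "<li>$1</li>";
var_dump($replacement === '<li>$1</li>'); // bool(true)

$list = "1. a\n1. b\n1. c\n";

// $ only asserts "end of line"; the "\n" is not consumed, so the next
// repetition cannot begin with ^ and every match stops after one line.
preg_match_all('/(?:^1\. .+$)+/m', $list, $withDollar);

// (\n|$) consumes the newline, so + can continue onto the next line.
preg_match_all('/(?:^1\. .+(\n|$))+/m', $list, $withNewline);

echo count($withDollar[0]), "\n";  // 3
echo count($withNewline[0]), "\n"; // 1
```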
Now, let's start with the first expression:
/(^( *)1\. .+(\\n|$))(?1)*/m
Actually, the recursive subexpression (?1), which re-runs the pattern of group 1, isn't necessary except as shorthand. Let's expand it:
/(^( *)1\. .+(\\n|$))(^( *)1\. .+(\\n|$))*/m
 |                  ||                  |
 +------------------++------------------+
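So we have two identical halves. Why not just use + as you did? Because I want to capture the indentation of the first line only. Those spaces get stored in $match[2]. A quantified group such as (...)+ keeps only the captures from its last repetition, so $match[2] would end up holding the indentation of the last line instead.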
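The difference shows up in what group 2 holds after the match. A small check on a made-up three-line list:

```php
<?php
// Made-up list with increasing indentation: 0, 1 and 2 spaces.
$list = "1. a\n 1. b\n  1. c\n";

// Quantified group: captures keep the LAST repetition, so group 2 is "  ".
preg_match('/(^( *)1\. .+(\n|$))+/m', $list, $m1);

// First copy written out, rest via (?1)*: captures made inside a subroutine
// call are restored afterwards, so group 2 keeps the FIRST line's "".
preg_match('/(^( *)1\. .+(\n|$))(?1)*/m', $list, $m2);

var_dump($m1[2]); // string(2) "  "
var_dump($m2[2]); // string(0) ""
```

Both patterns consume the whole list; only the captured indentation differs.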
Within the callback, we bring those spaces back, plus one or more spaces:
/(^($match[2] +)1\. .+(\\n|$))(?1)*/m
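For a concrete picture, here is the pattern the callback builds when $match[2] is a single space, and what it picks out of a made-up one-space block. The $match array is a hypothetical stand-in, not a real match result.

```php
<?php
// Hypothetical stand-in for the callback's argument at the one-space level.
$match = [2 => ' '];
$pattern = "/(^($match[2] +)1\. .+(\\n|$))(?1)*/m";
echo $pattern, "\n"; // /(^(  +)1\. .+(\n|$))(?1)*/m

// Only runs of lines indented by two or more spaces match.
$sub = " 1. horse\n  1. duck\n  1. goose\n 1. monkey\n";
preg_match_all($pattern, $sub, $found);
print_r($found[0]); // one match: "  1. duck\n  1. goose\n"
```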
With this pattern, at each level of preg_replace_callback recursion we only ever look at levels beneath the current level of indentation (more spaces). As the recursion unwinds, only the lines indented by exactly that level's number of spaces, $match[2], are wrapped in <li></li> by
/^$match[2]1\. (.+)$/m
before the whole result is returned wrapped in <ol></ol>.
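The final wrapping step can be looked at in isolation. The $out value below is a hypothetical snapshot of what the one-space level holds after its deeper lines have already been converted by the recursive call:

```php
<?php
// Hypothetical snapshot at the one-space level: the deeper "  1. duck" line
// has already been turned into a nested <ol> by the recursive call.
$indent = ' ';
$out = " 1. horse\n<ol>\n<li>duck</li>\n</ol>\n 1. monkey\n";

// Only lines that start with exactly $indent followed by "1. " are wrapped;
// the HTML lines start with "<", so they are left alone.
$out = preg_replace("/^{$indent}1\. (.+)$/m", '<li>$1</li>', $out);

$html = "<ol>\n$out</ol>\n";
echo $html;
```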