
Why does this regex only match the last occurrence of the pattern?


I'm trying to write a regex that generates HTML from markup code.

When I try to replace part of the [table] markup, it only replaces the last occurrence.

I have the following regex (PHP):

/(\[table].*)\[\|](.*\[\/table])/s

Replace pattern:

$1</td><td>$2

And the following test string:

[table]<thead>
<th>head1</th><th>head2</th></thead>
[*]test1[|]test2
[*]test1[|]test2
[/table]

It should produce the following:

[table]<thead>
<th>head1</th><th>head2</th></thead>
[*]test1</td><td>test2
[*]test1</td><td>test2
[/table]

but it actually produces this:

[table]<thead>
<th>head1</th><th>head2</th></thead>
[*]test1[|]test2
[*]test1</td><td>test2
[/table]

The catch is that [|] is also used in other markup codes, where it should not be replaced with </td><td>, so the replacement has to stay inside [table] blocks.


To clarify, I have this table "bb-code":

[table]
[**]header1[||]header2[||]header3[||]...[/**]
[*]child1.1[|]child1.2[|]child1.3[|]...
[*]child2.1[|]child2.2[|]child2.3[|]...
[*]child3.1[|]child3.2[|]child3.3[|]...
[*]...[|]...[|]...[|]...
[/table]

I want it to become this:

<table class="ui compact stripet yellow table">
    <thead>
        <tr>
            <th>header1</th>
            <th>header2</th>
            <th>header3</th>
            <th>....</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>child1.1</td>
            <td>child1.2</td>
            <td>child1.3</td>
            <td>...</td>
        </tr>
        <tr>
            <td>child2.1</td>
            <td>child2.2</td>
            <td>child2.3</td>
            <td>...</td>
        </tr>
        <tr>
            <td>child3.1</td>
            <td>child3.2</td>
            <td>child3.3</td>
            <td>...</td>
        </tr>
    </tbody>
</table>

Solution

  • I had a few minutes to spare on my phone before bedtime, so I ran with Wiktor's comment and put together a series of preg_ functions to convert your bbcode to HTML. I don't have any experience with bbcode, so I am only addressing your sample input and not considering fringe cases. I think PHP has a bbcode parser library somewhere, but I don't know whether your bbcode syntax is the standard one.

    A breakdown of the patterns used:

    First, isolate each whole [table]...[/table] string in the document. The pattern ~\[table]\R*([^[]*(?:\[(?!/?table])[^[]*)*)\R*\[/table]~ matches each such string and passes the full match as $m[0] and the substring between the table tags as $m[1] to BBTableToHTML().
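
    To see what that first step hands to the callback, here is a minimal sketch that runs the same pattern through preg_match_all(). The $text sample is made up for illustration; only the pattern comes from the code in this answer.

    ```php
    <?php
    // Hypothetical sample: two tables with unrelated bbcode between them.
    $text = "intro\n[table]\n[*]A[|]B\n[/table]\nmiddle [b]bold[/b]\n[table]\n[*]C[|]D\n[/table]\nend";

    // Same pattern as in the answer: group 1 may contain any "[" that does not
    // start "[table]" or "[/table]", so it stops at the nearest closing tag.
    $pattern = '~\[table]\R*([^[]*(?:\[(?!/?table])[^[]*)*)\R*\[/table]~';

    preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);

    foreach ($matches as $m) {
        // $m[0] is the whole [table]...[/table] block, $m[1] only its inner content.
        echo json_encode($m[1]), "\n";
    }
    ```

    Note that the [b]...[/b] text outside the tables is never matched, and that $m[1] keeps its trailing newline because the greedy group consumes it before the following \R* gets a chance.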

    Next, BBTableToHTML() makes three separate passes over the $m[1] string. Each pattern sends its matched strings to the associated custom function, which returns the modified string.

    Finally, before BBTableToHTML() returns the updated $m[1] to be echoed, it bookends the string with your desired <table...> and </table> tags.
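
    The order of the passes matters, because preg_replace_callback_array() runs each pattern on the result of the previous one. A stripped-down sketch of the two body passes, using anonymous functions and compact markup instead of the named callbacks and tabs in the full code ($inner is a made-up sample):

    ```php
    <?php
    // Hypothetical inner content of one table, as the outer callback would receive it.
    $inner = "[*]A[|]B\n[*]C[|]D\n";

    $html = preg_replace_callback_array(
        [
            // First pass: wrap the whole run of [*] rows in <tbody>.
            '~(?:\[\*].*\R*)+~' => function ($m) {
                return "<tbody>\n{$m[0]}</tbody>\n";
            },
            // Second pass: runs on the output of the first pass and converts each row.
            '~\[\*](.*)~' => function ($m) {
                return '<tr><td>' . str_replace('[|]', '</td><td>', $m[1]) . '</td></tr>';
            },
        ],
        $inner
    );

    // Bookend the converted content, as BBTableToHTML() does.
    echo "<table>\n" . $html . "</table>\n";
    ```

    If the two patterns were swapped, the rows would already be <tr> elements by the time the wrapping pattern ran, so it would find no [*] left to wrap.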

    Demos of the preg_replace_callback_array() patterns:

    1. ~\[\*\*]([^[]*(?:\[(?!/?\*\*])[^[]*)*)\[/\*\*]~ https://regex101.com/r/thINHQ/2
    2. ~(?:\[\*].*\R*)+~ https://regex101.com/r/thINHQ/3
    3. ~\[\*](.*)~ https://regex101.com/r/thINHQ/4
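
    As a quick check of pattern 1 on its own, this sketch converts a single header line. It uses explode()/implode() rather than the str_replace() call in the full code, purely to make the cell splitting visible; the $header sample is taken from the second table in the demo input.

    ```php
    <?php
    $header = '[**]a 1[||]and a 2[/**]';

    // Pattern 1 from the list above: group 1 is everything between [**] and [/**].
    $pattern = '~\[\*\*]([^[]*(?:\[(?!/?\*\*])[^[]*)*)\[/\*\*]~';

    $row = preg_replace_callback(
        $pattern,
        function ($m) {
            // Split the captured text on the header separator, then rejoin as <th> cells.
            $cells = explode('[||]', $m[1]);
            return '<tr><th>' . implode('</th><th>', $cells) . '</th></tr>';
        },
        $header
    );

    echo $row, "\n";
    ```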

    Code:

    $bbcode = <<<BBCODE
    [b]Check out this demo[/b]
    ¯\_(ツ)_/¯
    [table]
    [**]header1[||]header2[||]header3[||]...[/**]
    [*]child1.1[|]child1.2[|]child1.3[|]...
    [*]child2.1[|]child2.2[|]child2.3[|]...
    [*]child3.1[|]child3.2[|]child3.3[|]...
    [*]...[|]...[|]...[|]...
    [/table]
    simple text
    [table]
    [**]a 1[||]and a 2[/**]
    [*]A[|]B
    [*]C[|]D
    [/table]
    
    [s]3, you're out[/s]
    blah
    BBCODE;
    
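    // Callback for the outer pattern: $m[1] is the text between [table] and [/table].
    function BBTableToHTML($m) {
        return "<table class=\"ui compact stripet yellow table\">\n" .
               preg_replace_callback_array(
                   [
                       // Pass 1: the [**]...[/**] header line becomes <thead>.
                       '~\[\*\*]([^[]*(?:\[(?!/?\*\*])[^[]*)*)\[/\*\*]~' => 'BBTHeadToHTML',
                       // Pass 2: the whole run of [*] rows is wrapped in <tbody>.
                       '~(?:\[\*].*\R*)+~' => 'BBTBodyToHTML',
                       // Pass 3: each individual [*] row becomes a <tr>.
                       '~\[\*](.*)~' => 'BBTBodyRowToHTML'
                   ],
                   $m[1]
               ) .
               "</table>";
    }
    
    // $m[1] is the header text; every [||] separates two <th> cells.
    function BBTHeadToHTML($m) {
        return "\t<thead>\n" .
               "\t\t<tr>\n\t\t\t<th>" . str_replace('[||]', "</th>\n\t\t\t<th>", $m[1]) . "</th>\n\t\t</tr>\n" .
               "\t</thead>";
    }
    
    // $m[0] is the full block of rows, still in bbcode at this point.
    function BBTBodyToHTML($m) {
        return "\t<tbody>\n{$m[0]}\t</tbody>\n";
    }
    
    // $m[1] is one row's text; every [|] separates two <td> cells.
    function BBTBodyRowToHTML($m) {
        return "\t\t<tr>\n\t\t\t<td>" . str_replace('[|]', "</td>\n\t\t\t<td>", $m[1]) . "</td>\n\t\t</tr>";
    }
    
    // Only text inside [table]...[/table] is passed to the callbacks;
    // everything else in $bbcode is left untouched.
    echo preg_replace_callback(
             '~\[table]\R*([^[]*(?:\[(?!/?table])[^[]*)*)\R*\[/table]~',
             'BBTableToHTML',
             $bbcode
         );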
    

    Output:

    [b]Check out this demo[/b]
    ¯\_(ツ)_/¯
    <table class="ui compact stripet yellow table">
        <thead>
            <tr>
                <th>header1</th>
                <th>header2</th>
                <th>header3</th>
                <th>...</th>
            </tr>
        </thead>
        <tbody>
            <tr>
                <td>child1.1</td>
                <td>child1.2</td>
                <td>child1.3</td>
                <td>...</td>
            </tr>
            <tr>
                <td>child2.1</td>
                <td>child2.2</td>
                <td>child2.3</td>
                <td>...</td>
            </tr>
            <tr>
                <td>child3.1</td>
                <td>child3.2</td>
                <td>child3.3</td>
                <td>...</td>
            </tr>
            <tr>
                <td>...</td>
                <td>...</td>
                <td>...</td>
                <td>...</td>
            </tr>
        </tbody>
    </table>
    simple text
    <table class="ui compact stripet yellow table">
        <thead>
            <tr>
                <th>a 1</th>
                <th>and a 2</th>
            </tr>
        </thead>
        <tbody>
            <tr>
                <td>A</td>
                <td>B</td>
            </tr>
            <tr>
                <td>C</td>
                <td>D</td>
            </tr>
        </tbody>
    </table>
    
    [s]3, you're out[/s]
    blah
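
    As for why your original pattern only replaced the last [|]: the first .* is greedy and, with the s flag, runs to the end of the input before backtracking to the last [|] it can find. That single match spans the whole [table]...[/table] block, so there is nothing left for a second match. A minimal sketch of that behaviour, followed by one possible way to limit a plain str_replace() to table blocks (the lazy pattern in the second call is my own simplification, not the one used above):

    ```php
    <?php
    $input = "[table]\n[*]test1[|]test2\n[*]test1[|]test2\n[/table]";

    // Your original pattern: one match covering the whole block, so one replacement.
    $original = preg_replace(
        '/(\[table].*)\[\|](.*\[\/table])/s',
        '$1</td><td>$2',
        $input,
        -1,
        $count
    );

    // Alternative sketch: match each table block lazily, then replace every [|] inside it.
    $scoped = preg_replace_callback(
        '~\[table].*?\[/table]~s',
        function ($m) {
            return str_replace('[|]', '</td><td>', $m[0]);
        },
        $input
    );

    echo $count, "\n", $original, "\n\n", $scoped, "\n";
    ```

    Any [|] outside a [table] block is never seen by the callback, which keeps the other markup codes safe.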