php html regex preg-replace-callback

Regular expression for wrapping all tr elements that contain th tags in a thead


I have a problem with a regex: I need to wrap all the tr elements that contain th cells in a single thead. I have a variable $html that contains an HTML table like this:

$html ="
<table>
<tr>
  <th>header1</th> 
  <th>header2</th>
  <th>header3</th>
</tr>
<tr>
  <th>header21</th> 
  <th>header22</th>
  <th>header23</th>
</tr>

<tr>
  <td>body1</td> 
  <td>body2</td>
  <td>body3</td>
</tr>
<tr>
  <td>body21</td> 
  <td>body22</td>
  <td>body23</td>
</tr>
</table>";

The regex I wrote is this:

$html = preg_replace_callback(
    '#(<tr.*?<th>.*?<th>.*?<\/tr>)#s',
    function ($match) {
        return '<thead>' . $match[0] . '</thead>';
    },
    $html
);

But the result is different from what I want: each tr ends up wrapped in its own thead instead of all of them sharing one.


Solution

  • It's not a good idea to try to parse HTML with regular expressions.

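    That said, you need to get rid of one question mark. With the question mark, .*? is lazy: it matches any number of characters, but as few as possible, so each match ends at the first </tr> it can reach and every header row gets its own thead. For the space between the first and the last <th> you want it to match as many characters as possible, so use a greedy .* there. This will do the trick: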

                # this is supposed to be as greedy as possible
                #
    ~(<tr.*?<th>.*<th>.*?</tr>)~s
    

    See https://regex101.com/r/fR1xB5/1
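
    As a rough sketch of how the corrected pattern behaves, the snippet below applies it with the same preg_replace_callback call used in the question. The sample table is a shortened, hypothetical version of the one in the question, and the variable names $pattern and $result are only illustrative:

```php
<?php
// Shortened sample input, modelled on the table in the question (assumption).
$html = <<<'HTML'
<table>
<tr>
  <th>header1</th>
  <th>header2</th>
</tr>
<tr>
  <th>header21</th>
  <th>header22</th>
</tr>
<tr>
  <td>body1</td>
  <td>body2</td>
</tr>
</table>
HTML;

// The middle .* is greedy, so a single match runs from the first <tr>
// up to the </tr> that closes the row containing the last <th>.
$pattern = '~(<tr.*?<th>.*<th>.*?</tr>)~s';

$result = preg_replace_callback(
    $pattern,
    function (array $match): string {
        // Wrap the whole run of header rows in one <thead>.
        return '<thead>' . $match[0] . '</thead>';
    },
    $html
);

echo $result;
```

    Note that this relies on the input holding a single table with all of its header rows at the top. Because the middle .* is greedy, a string with several tables, or with a <th> in a later body row, would have everything up to the last <th> pulled into the same thead. The pattern also only recognises a bare <th>, not one with attributes.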