Invalid HTML - Quoting Attributes


I have the following HTML:

<td width=140 style='width:105.0pt;padding:0cm 0cm 0cm 0cm'>
    <p class=MsoNormal><span style='font-size:9.0pt;font-family:"Arial","sans-serif";
       mso-fareast-font-family:"Times New Roman";color:#666666'>OCCUPANCY
       TAX:</span></p>
</td>

Some of the HTML attributes are not quoted, for example width=140 and class=MsoNormal.

Is there a PHP function for that sort of thing? If not, what would be a clever way of sanitizing this HTML?

Thank you.


Solution

  • I guess you could use a regular expression for this:

    /\s([\w]{1,}=)((?!")[\w]{1,}(?!"))/g
    
    
    \s match any white space character [\r\n\t\f ]
    1st Capturing group ([\w]{1,}=)
        [\w]{1,} match a single character present in the list below
            Quantifier: {1,} Between 1 and unlimited times, as many times as possible, giving back as needed [greedy]
        \w match any word character [a-zA-Z0-9_]
        = matches the character = literally
    2nd Capturing group ((?!")[\w]{1,}(?!"))
        (?!") Negative Lookahead - Assert that it is impossible to match the regex below
        " matches the character " literally
        [\w]{1,} match a single character present in the list below
            Quantifier: {1,} Between 1 and unlimited times, as many times as possible, giving back as needed [greedy]
        \w match any word character [a-zA-Z0-9_]
        (?!") Negative Lookahead - Assert that it is impossible to match the regex below
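        " matches the character " literally
    g modifier: global. All matches (don't return on first match). PHP's preg_* functions have no g flag; preg_replace_callback already replaces every match, so the flag is dropped in the PHP code.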
    

    In PHP, that could be implemented something like this:

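    // $str holds the HTML to fix, e.g. the markup from the question.
    echo preg_replace_callback('/\s([\w]{1,}=)((?!")[\w]{1,}(?!"))/', function($matches){
        // Re-emit the attribute name and "=" with the bare value wrapped in double quotes.
        return ' '.$matches[1].'"'.$matches[2].'"';
    }, $str);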
    

    And would result in:

    <td width="140" style='width:105.0pt;padding:0cm 0cm 0cm 0cm'>
        <p class="MsoNormal"><span style='font-size:9.0pt;font-family:"Arial","sans-serif";
           mso-fareast-font-family:"Times New Roman";color:#666666'>OCCUPANCY
           TAX:</span></p>
    </td>
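
To see why only width and class are touched, it can help to list what the two capturing groups pick up. The following is a small sketch, assuming the question's markup is held in $str; the $pairs array is only there for illustration.

```php
<?php
// Sample markup from the question, held in a nowdoc so nothing is interpolated.
$str = <<<'HTML'
<td width=140 style='width:105.0pt;padding:0cm 0cm 0cm 0cm'>
    <p class=MsoNormal><span style='font-size:9.0pt;font-family:"Arial","sans-serif";
       mso-fareast-font-family:"Times New Roman";color:#666666'>OCCUPANCY
       TAX:</span></p>
</td>
HTML;

// Same pattern as in the answer; PREG_SET_ORDER groups the captures per match.
preg_match_all('/\s([\w]{1,}=)((?!")[\w]{1,}(?!"))/', $str, $matches, PREG_SET_ORDER);

$pairs = [];
foreach ($matches as $m) {
    // $m[1] is the attribute name plus "=", $m[2] is the bare value.
    $pairs[] = $m[1] . $m[2];
    echo $m[1], ' -> ', $m[2], PHP_EOL;
}
// prints:
// width= -> 140
// class= -> MsoNormal
```

Both style attributes are skipped because their values start with a single quote, which \w cannot match.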
    


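    Note: this is a quick and dirty example and can surely be cleaned up. It only matches names and values made of word characters, so a value such as width=100% is quoted only partially and a hyphenated attribute such as data-id=5 is skipped. It also rewrites any name=value text that appears outside a tag.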
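
One possible cleanup, as a sketch rather than a full HTML parser: restrict the replacement to start tags, and let the attribute pattern consume already quoted values so nothing inside them is rewritten. The function name quoteBareAttributes is hypothetical, and the code assumes no ">" appears inside an attribute value.

```php
<?php
// Hypothetical helper (not from the original answer): quotes bare attribute
// values, but only inside start tags.
function quoteBareAttributes(string $html): string
{
    // Rough start-tag matcher; assumes no ">" inside an attribute value.
    $tagPattern = '/<[a-zA-Z][^<>]*>/';
    // Group 1: whitespace, attribute name and "=".
    // Group 2: an already quoted value. Group 3: a bare (unquoted) value.
    $attrPattern = '/(\s[\w:-]+=)(?:("[^"]*"|\'[^\']*\')|([^\s"\'=<>`]+))/';

    $result = preg_replace_callback($tagPattern, function (array $tag) use ($attrPattern) {
        return preg_replace_callback($attrPattern, function (array $m) {
            // Group 3 is only set for bare values; quoted values pass through unchanged.
            return isset($m[3]) ? $m[1] . '"' . $m[3] . '"' : $m[0];
        }, $tag[0]);
    }, $html);

    // preg_replace_callback returns null on a PCRE error; fall back to the input.
    return $result ?? $html;
}

echo quoteBareAttributes('<td width=100% data-id=5 style=\'a b=c\'>x y=z</td>'), PHP_EOL;
// prints: <td width="100%" data-id="5" style='a b=c'>x y=z</td>
```

The text "x y=z" and the contents of the quoted style value are left alone, while both bare values are quoted in full.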