How to exclude "<? ... ?>" fragments inside PHP code (gulp-htmlmin)?


To ignore PHP code in gulp-htmlmin, I use the following option:
ignoreCustomFragments: [/<\?(php|=)[\s\S]*?(?:\?>|$)/]

But my PHP code contains a string with the fragment "<?xml ... ?>", and this string causes an error in gulp-htmlmin. I think this is because the string contains "?>", which "closes" the PHP code before its actual end.
How should I write the ignoreCustomFragments regex to avoid this error?
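
The problem can be reproduced without gulp. The following is a minimal sketch in plain JavaScript; the sample markup in `source` is made up for illustration and is not taken from the real project. Because `[\s\S]*?` is lazy, the match ends at the first `?>` it meets, which here is the one inside the XML string.

```javascript
// The pattern from the question, unchanged.
const naive = /<\?(php|=)[\s\S]*?(?:\?>|$)/;

// Hypothetical template: a PHP block whose string literal contains "?>".
const source = `<div><?php $xml = '<?xml version="1.0"?>'; echo $xml; ?></div>`;

// The fragment that the pattern marks as "PHP code to ignore".
const fragment = source.match(naive)[0];

// Everything after the matched fragment, which is no longer protected.
const leftover = source.slice(source.indexOf(fragment) + fragment.length);

console.log(fragment);
console.log(leftover);
```

The tail of the PHP block ends up in `leftover`, so it would be handed to the minifier as if it were HTML, which is a plausible cause of the error.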


Solution

    /
    <\?(php|=)
    (?:
      # match any symbol except ? or ' or "
      [^?'"]
      # or ? not followed by >
      | \?(?!>)
      # or match '...' substring
      | '(?:
            # match any symbol except '
            [^']
            # or ' preceded by \
            | (?<=\\)'
         )*'
      # or match "..." substring
      | "(?: [^"] | (?<=\\)" )*"
    )*
    (?:\?>|$)
    /gmx
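
The `x` (free-spacing) flag is not available in JavaScript regular expressions, so the pattern above has to be collapsed onto one line, without whitespace and comments, before it can go into a gulpfile. A possible sketch of the collapsed form follows. The name `phpFragment` and the sample `source` are illustrative assumptions, the `g` and `m` flags are left out here, and the lookbehind `(?<=\\)` needs a runtime with ES2018 lookbehind support.

```javascript
// Same pattern as above with whitespace and comments removed,
// because JavaScript has no x flag. The name phpFragment is hypothetical.
const phpFragment =
  /<\?(php|=)(?:[^?'"]|\?(?!>)|'(?:[^']|(?<=\\)')*'|"(?:[^"]|(?<=\\)")*")*(?:\?>|$)/;

// Hypothetical template with "?>" inside a PHP string literal.
const source = `<div><?php $xml = '<?xml version="1.0"?>'; echo $xml; ?></div>`;

// The quoted "<?xml ... ?>" is consumed as part of the string,
// so the match runs up to the real closing tag.
const matched = source.match(phpFragment)[0];
console.log(matched);
```

In the gulpfile this regex would take the place of the original entry, for example `ignoreCustomFragments: [phpFragment]`.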
    

    Update

    For handling comments:

    <\?(php|=)
    (?:
      # match any symbol except ? or ' or " or /
      [^?'"\/]
      # or ? not followed by >
      | \? (?!>)
      # or / not followed by [*/]
      | \/ (?![*\/])
      # or match comment // (possibly empty)
      | \/\/ [^\r\n]*
      # or match comment /* */ (possibly empty)
      | \/\* (?: [^*] | \*(?!\/) )* \*\/
      # or match '...' substring
      | '(?:
            # match any symbol except '
            [^']
            # or ' preceded by \
            | (?<=\\)'
         )*'
      # or match "..." substring
      | "(?: [^"] | (?<=\\)" )*"
    )*
    (?:\?>|$)
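
Since JavaScript cannot parse the free-spacing layout directly, one way to keep the per-branch comments is to write each branch as its own small regex and join the sources. This is only a sketch under assumptions: the names `branches`, `body`, and `phpFragmentWithComments` and the sample `source` are hypothetical, comment bodies are allowed to be empty, and the lookbehinds require ES2018 support.

```javascript
// Each entry mirrors one commented branch of the extended-mode pattern.
const branches = [
  /[^?'"\/]/,                    // any character except ? ' " /
  /\?(?!>)/,                     // ? not followed by >
  /\/(?![*\/])/,                 // / that does not start a comment
  /\/\/[^\r\n]*/,                // line comment, up to the end of the line
  /\/\*(?:[^*]|\*(?!\/))*\*\//,  // block comment
  /'(?:[^']|(?<=\\)')*'/,        // single-quoted string, \' allowed inside
  /"(?:[^"]|(?<=\\)")*"/,        // double-quoted string, \" allowed inside
];

// Join the branch sources into one alternation.
const body = branches.map((re) => re.source).join('|');

// Opening tag, any number of branches, then the closing tag or end of input.
const phpFragmentWithComments = new RegExp(
  '<\\?(php|=)(?:' + body + ')*(?:\\?>|$)'
);

// Hypothetical template: "?>" appears in a block comment and in a string.
const source = `<p><?php /* not the end ?> */ echo "<?xml ?>"; ?></p>`;
const matched = source.match(phpFragmentWithComments)[0];
console.log(matched);
```

The resulting `RegExp` object can be passed as an element of `ignoreCustomFragments` in the same way as a regex literal.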