php, regex, preg-match-all

PHP - how to get meta tags from a URL


I want to get meta tags from a URL. When a tag has a data attribute, it is not extracted properly. How should I change the regular expression?

HTML Code

<meta property="og:title" content="111">
<meta data-one="true" property="og:description" content="222">
<meta data-two="true" property="og:image" content="333">
<meta data-three="true" data-another="true" property="og:url" content="444">

PHP Code

preg_match_all('~<\s*meta\s*property="(og:[^"]+)"\s*content="([^"]*)~i', $html, $matches);

Result

Array(
  [0] => og:title
)

Expected Result

Array(
  [0] => og:title,
  [1] => og:description,
  [2] => og:image,
  [3] => og:url
)

Solution

  • The problem is the second and third \s*, which match only zero or more whitespace characters, so any extra attribute between meta and property makes the match fail. In place of the second \s*, use \b.*\b: a word boundary (the end of meta), then anything, then another word boundary (the start of property). In place of the third, use \s.*\b, because the position right after the closing " is not a word boundary when whitespace follows it. The fixed regex is:

    preg_match_all('~<\s*meta\b.*\bproperty="(og:[^"]+)"\s.*\bcontent="([^"]*)~i', $html, $matches);
    

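As a quick check, the fixed pattern can be run against the sample markup from the question. The sketch below is a minimal illustration, not part of the original answer: the variable names ($tags, $tightPattern, $oneLine, $tightMatches) are made up for the example, and the second pattern is an assumed alternative for the case where several meta tags share one line, where a greedy .* could run across tag boundaries.

```php
<?php
// Sample markup from the question; each tag sits on its own line.
$html = <<<'HTML'
<meta property="og:title" content="111">
<meta data-one="true" property="og:description" content="222">
<meta data-two="true" property="og:image" content="333">
<meta data-three="true" data-another="true" property="og:url" content="444">
HTML;

// Fixed pattern from the answer. Without the "s" modifier, "." does not match
// a newline, so each greedy ".*" stays within a single line.
$pattern = '~<\s*meta\b.*\bproperty="(og:[^"]+)"\s.*\bcontent="([^"]*)~i';
preg_match_all($pattern, $html, $matches);

// $matches[1] holds the property names, $matches[2] the content values.
$tags = array_combine($matches[1], $matches[2]);
print_r($tags);

// Hypothetical tighter variant (not from the answer): "[^>]*" cannot cross the
// closing ">" of a tag, so it also copes with several tags on one line.
$tightPattern = '~<\s*meta\b[^>]*\bproperty="(og:[^"]+)"[^>]*\bcontent="([^"]*)~i';
$oneLine = str_replace("\n", '', $html);
preg_match_all($tightPattern, $oneLine, $tightMatches);
print_r($tightMatches[1]);
```

Both patterns still assume that property appears before content inside the tag and that both values are double-quoted. For arbitrary pages fetched from a URL, an HTML parser is usually more robust than a regular expression.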