
Get menu array from HTML string using PHP DOM document


I have the following code:

$string = '<html><head></head><body><ul id="mainmenu">
  <li id="1"><a href="1"> main menu 1 </a> </li>
  <li id="2"> <a href="2"> main menu 2 </a> </li>
    <ul class="sub-menu">
      <li id="3"> <a href="3"> Sub menu 2 </a> </li>
      <li id="4"> <a href="4"> Sub menu 2.1 </a> </li>
    </ul>
  </li>
</ul></body></html>';
$dom = new DOMDocument;
$dom->loadHTML($string);

Now I want an array as output containing href, name and sub fields with the respective values, using PHP DOMDocument.

Something like this:

Array
(
    [0] => Array
        (
            [href] => 1
            [name] => main menu 1
            [sub] => Array
                (
                )

        )

    [1] => Array
        (
            [href] => 2
            [name] => main menu 2
            [sub] => Array
                (
                    [0] => Array
                        (
                            [href] => 3
                            [name] => Sub menu 2
                            [sub] => Array
                                (
                                )

                        )

                    [1] => Array
                        (
                            [href] => 4
                            [name] => Sub menu 2.1
                            [sub] => Array
                                (
                                )

                        )

                )

        )

)

So far I can only get every menu item as a top-level entry, with every sub array empty. How can I build the nested array by parsing the HTML string?


Solution

  • Assuming there are only two levels, this code uses XPath to select the <li> children of the main <ul> and loops over them. For each one it runs a second query for the sub menu items, passing the current <li> as the context node so that descendant:: searches only inside that item.

    (I've had to alter the HTML, as there was an extra </li> that closed <li id="2"> <a href="2"> main menu 2 </a> </li> before its sub menu.)

    $string = '<html><head></head><body><ul id="mainmenu">
      <li id="1"><a href="1"> main menu 1 </a> </li>
      <li id="2"> <a href="2"> main menu 2 </a>
        <ul class="sub-menu">
          <li id="3"> <a href="3"> Sub menu 2 </a> </li>
          <li id="4"> <a href="4"> Sub menu 2.1 </a> </li>
        </ul>
      </li>
    </ul></body></html>';
    $dom = new DOMDocument;
    $dom->loadHTML($string);
    $xp = new DOMXPath($dom);
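    $menus = [];
    
    // Each direct <li> child of the main <ul> is a top-level menu item.
    $mainMenus = $xp->query('//ul[@id="mainmenu"]/li');
    foreach ( $mainMenus as $menu )  {
        // The first <a> in document order is the item's own link.
        $a = $menu->getElementsByTagName("a")->item(0);
        $newMenu = [ "href" => $a->getAttribute("href"),
            "name" => trim($a->textContent),
            "sub" => []    // always present, even without a sub menu
        ];
    
        // Search only inside the current <li> for its sub menu entries.
        $subMenus = $xp->query('descendant::ul[@class="sub-menu"]/li', $menu);
        foreach ( $subMenus as $menu1 )  {
            $a = $menu1->getElementsByTagName("a")->item(0);
    
            $newMenu['sub'][] = [ "href" => $a->getAttribute("href"),
                "name" => trim($a->textContent),
                "sub" => []
            ];
        }
        $menus[] = $newMenu;
    }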
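
    If the menu might nest deeper than two levels, the same idea can be written recursively. The following is only a sketch under stated assumptions: menuToArray() is a hypothetical helper name (not part of DOM or of the code above), and it assumes every sub menu <ul> is a direct child of its <li> and every link is a direct child <a>.

```php
<?php
// Hypothetical helper: turns one <ul> element into a nested array,
// descending into any <ul> that sits directly inside an <li>.
function menuToArray(DOMXPath $xp, DOMNode $ul): array
{
    $items = [];
    // "./li" matches only the direct <li> children of this <ul>.
    foreach ($xp->query('./li', $ul) as $li) {
        // The item's own link: the first <a> that is a direct child of the <li>.
        $a = $xp->query('./a', $li)->item(0);
        $item = [
            'href' => $a ? $a->getAttribute('href') : '',
            'name' => $a ? trim($a->textContent) : '',
            'sub'  => [],
        ];
        // Recurse into each <ul> that is a direct child of this <li>.
        foreach ($xp->query('./ul', $li) as $subUl) {
            $item['sub'] = array_merge($item['sub'], menuToArray($xp, $subUl));
        }
        $items[] = $item;
    }
    return $items;
}

$html = '<html><head></head><body><ul id="mainmenu">
  <li id="1"><a href="1"> main menu 1 </a> </li>
  <li id="2"> <a href="2"> main menu 2 </a>
    <ul class="sub-menu">
      <li id="3"> <a href="3"> Sub menu 2 </a> </li>
      <li id="4"> <a href="4"> Sub menu 2.1 </a> </li>
    </ul>
  </li>
</ul></body></html>';

$dom = new DOMDocument;
$dom->loadHTML($html);
$xp = new DOMXPath($dom);

// Start from the main <ul>; fall back to an empty menu if it is missing.
$root = $xp->query('//ul[@id="mainmenu"]')->item(0);
$menus = $root ? menuToArray($xp, $root) : [];
print_r($menus);
```

    Because the queries use ./li and ./ul rather than descendant::, each level only sees its own children, so deeper items are not counted twice.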
    

    If you have a list of possible IDs, you could use XPath to match any of them:

    //ul[@id="mainmenu" or @id="menu-main" or @id="menu-menu1"]/li
    

    You could build this expression dynamically from an array if you need to:

    $menu_ids_arr = ['mainmenu', 'menu-main', 'menu-menu1'];
    // Turn each id into an @id="..." test, then join the tests with " or ".
    $conditions = array_map(function ($id) {
        return '@id="' . $id . '"';
    }, $menu_ids_arr);
    $query = '//ul[' . implode(' or ', $conditions) . ']/li';
    $mainMenus = $xp->query($query);
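
    If the IDs come from outside your code, it may be worth wrapping the query building in a small function that rejects input it cannot express safely. This is a minimal sketch, not a definitive implementation: buildMenuQuery() is a hypothetical name, and the one-line HTML string is made up for the demonstration.

```php
<?php
// Hypothetical helper: builds an XPath for the <li> children of any <ul>
// whose id is in $ids. Ids containing a double quote are rejected because
// XPath 1.0 string literals have no escape sequence.
function buildMenuQuery(array $ids): string
{
    if ($ids === []) {
        throw new InvalidArgumentException('At least one id is required.');
    }
    $tests = [];
    foreach ($ids as $id) {
        if (strpos($id, '"') !== false) {
            throw new InvalidArgumentException("Unsupported id: $id");
        }
        $tests[] = '@id="' . $id . '"';
    }
    return '//ul[' . implode(' or ', $tests) . ']/li';
}

// Made-up sample document whose menu uses the second of the candidate ids.
$html = '<ul id="menu-main"><li><a href="a">A</a></li><li><a href="b">B</a></li></ul>';
$dom = new DOMDocument;
$dom->loadHTML($html);
$xp = new DOMXPath($dom);

$query = buildMenuQuery(['mainmenu', 'menu-main', 'menu-menu1']);
$mainMenus = $xp->query($query);

echo $query, "\n";
echo $mainMenus->length, "\n";
```

    Throwing on an empty list avoids producing the invalid expression //ul[]/li, which the substr() and implode() versions would both generate.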