
Creating a table of contents in PHP


I am looking to create a very simple, very basic nested table of contents in PHP which gets all the h1 to h6 headings and indents them appropriately. This means that if I have something like:

<h1>content</h1>
<h2>more content</h2>

I should get:

content
    more content

I know the indentation will be done with CSS, and that's fine, but how do I create a table of contents with working links to the content on the page?

Apparently it's hard to grasp what I am asking for, so to clarify:

I am asking for a function that reads an HTML document, pulls out all the h1 to h6 headings, and builds a table of contents from them.


Solution

  • For this you just have to search for the heading tags in the HTML code.

    I wrote two functions (PHP 5.4.x).

    The first one returns an array that contains the data of the table of contents. Each entry holds only the headline itself, the id of the tag (if you want to use anchors), and a sub-table of contents for the deeper headlines.

    function get_headlines($html, $depth = 1)
    {
        // Only <h1> to <h6> exist, so stop after level 6.
        if($depth > 6)
            return [];
    
        $headlines = explode('<h' . $depth, $html);
    
        unset($headlines[0]);       // contains only text before the first headline
    
        if(count($headlines) == 0)
            return [];
    
        $toc = [];      // will contain the (sub-) toc
    
        foreach($headlines as $headline)
        {
            list($hl_info, $temp) = explode('>', $headline, 2);
            // $hl_info contains the attributes of <hi ... >, e.g. the id.
            list($hl_text, $sub_content) = explode('</h' . $depth . '>', $temp, 2);
            // $hl_text contains the headline text
            // $sub_content may contain further <hi>-tags of deeper levels
            $id = '';
            // Read the value of id="..." (or id='...') if the tag has one.
            // The lookbehind avoids matching attributes like data-id.
            if(preg_match('/(?<![\w-])id\s*=\s*["\']([^"\']*)["\']/i', $hl_info, $matches))
                $id = $matches[1];
    
            $toc[] = [  'id' => $id,
                        'text' => $hl_text,
                        'sub_toc' => get_headlines($sub_content, $depth + 1)
                    ];
    
        }
    
        return $toc;
    }
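
get_headlines only turns a headline into a link when the tag already has an id attribute, so headings without one show up in the TOC as plain text. Below is a minimal sketch for filling in missing ids, assuming lowercase, well-formed heading tags as above. add_heading_ids is a hypothetical helper name, and the slug rule (visible text, non-alphanumerics replaced by hyphens, lowercased, with -2, -3, ... appended on collisions) is just one possible choice:

```php
<?php
// Hypothetical helper (not from the answer above): give every <h1>..<h6>
// that has no id attribute a generated one, so a TOC can link to it.
// Like get_headlines, it assumes lowercase, well-formed heading tags.
function add_heading_ids(string $html): string
{
    // Ids must be unique on the page, so remember the ones already present.
    $used = [];
    preg_match_all('/(?<![\w-])id\s*=\s*["\']([^"\']*)["\']/i', $html, $existing);
    foreach ($existing[1] as $existing_id) {
        $used[$existing_id] = true;
    }

    return preg_replace_callback(
        '/<h([1-6])(\s[^>]*)?>(.*?)<\/h\1>/s',
        function (array $m) use (&$used): string {
            $attrs = $m[2] ?? '';
            // Headings that already carry an id are left untouched.
            if (preg_match('/(?<![\w-])id\s*=/i', $attrs)) {
                return $m[0];
            }
            // Slug: visible text, non-alphanumerics -> '-', lowercased.
            $slug = strtolower(trim(preg_replace('/[^A-Za-z0-9]+/', '-', strip_tags($m[3])), '-'));
            if ($slug === '') {
                $slug = 'section';
            }
            // Append -2, -3, ... until the id is unique.
            $id = $slug;
            for ($n = 2; isset($used[$id]); $n++) {
                $id = $slug . '-' . $n;
            }
            $used[$id] = true;
            return '<h' . $m[1] . ' id="' . $id . '"' . $attrs . '>' . $m[3] . '</h' . $m[1] . '>';
        },
        $html
    );
}

echo add_heading_ids('<h1>content</h1><h2>more content</h2>');
// prints <h1 id="content">content</h1><h2 id="more-content">more content</h2>
```

The idea is to run the page through such a helper first, output the modified HTML, and build the TOC from that same string, so the ids in the headings and the links in the TOC match.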
    

    The second one returns a string that formats the TOC as HTML.

    function print_toc($toc, $link_to_htmlpage = '', $depth = 1)
    {
        if(count($toc) == 0)
            return '';
    
        $toc_str = '';
    
        if($depth == 1)
            $toc_str .= '<h1>Table of Contents</h1>';
    
        foreach($toc as $headline)
        {
            $toc_str .= '<p class="headline' . $depth . '">';
            if($headline['id'] != '')
                $toc_str .= '<a href="' . $link_to_htmlpage . '#' . $headline['id'] . '">';
    
            $toc_str .= $headline['text'];
            $toc_str .= ($headline['id'] != '') ? '</a>' : '';
            $toc_str .= '</p>';
    
            $toc_str .= print_toc($headline['sub_toc'], $link_to_htmlpage, $depth+1);
        }
    
        return $toc_str;
    }
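
print_toc emits flat <p> elements with a headlineN class, so the indentation has to come from CSS. If you would rather let the browser indent the levels, here is a sketch that walks the same id/text/sub_toc structure and emits nested <ul> lists instead. The name print_toc_list and the sample $toc data are illustrative assumptions:

```php
<?php
// Sketch of an alternative to print_toc (print_toc_list is a hypothetical
// name): same id/text/sub_toc input, but rendered as nested <ul> lists.
function print_toc_list(array $toc, string $link_to_htmlpage = ''): string
{
    if (count($toc) === 0) {
        return '';
    }
    $out = '<ul>';
    foreach ($toc as $headline) {
        $out .= '<li>';
        // 'text' is emitted as-is, like in print_toc, since it is the
        // headline's inner HTML taken from the page.
        if ($headline['id'] !== '') {
            // The href goes into an attribute value, so escape it.
            $href = htmlspecialchars($link_to_htmlpage . '#' . $headline['id']);
            $out .= '<a href="' . $href . '">' . $headline['text'] . '</a>';
        } else {
            $out .= $headline['text'];
        }
        // Sub-headlines become a nested <ul> inside this <li>.
        $out .= print_toc_list($headline['sub_toc'], $link_to_htmlpage);
        $out .= '</li>';
    }
    return $out . '</ul>';
}

// Sample input in the shape get_headlines returns (made-up data).
$toc = [
    ['id' => 'content', 'text' => 'content', 'sub_toc' => [
        ['id' => 'more-content', 'text' => 'more content', 'sub_toc' => []],
    ]],
];
echo print_toc_list($toc);
// prints <ul><li><a href="#content">content</a><ul><li><a href="#more-content">more content</a></li></ul></li></ul>
```

Nesting the child <ul> inside the parent's <li> keeps the markup valid and gives the indentation from the default list styling.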
    

    Both functions are far from perfect, but they work fine in my tests. Feel free to improve them.

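    Note: get_headlines is not an HTML parser, so it breaks on malformed HTML (for example a missing closing tag). It also only recognizes lowercase <h1> to <h6> tags, and because it descends one level at a time, it misses any heading whose parent level is absent, such as an <h3> directly under an <h1>, or every heading on a page that has no <h1>.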
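
If you need something more robust than splitting strings, one alternative (not the approach used above) is PHP's DOM extension: DOMDocument::loadHTML tolerates sloppy markup and normalizes tag names to lowercase, and a DOMXPath query returns the headings in document order. The sketch below assumes that extension is available; collect_headings and nest_headings are hypothetical names. nest_headings rebuilds the same id/text/sub_toc structure that print_toc expects, and places a heading that skips a level (an <h3> directly under an <h1>) under the nearest shallower heading:

```php
<?php
// Sketch using PHP's DOM extension (collect_headings and nest_headings are
// hypothetical names). Note: loadHTML assumes ISO-8859-1 unless the markup
// declares a charset, so non-ASCII UTF-8 text may need extra handling.
function collect_headings(string $html): array
{
    if (trim($html) === '') {
        return [];      // loadHTML rejects empty input
    }
    $doc = new DOMDocument();
    $prev = libxml_use_internal_errors(true);   // silence malformed-HTML warnings
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $headings = [];
    $xpath = new DOMXPath($doc);
    // The union of the six paths is returned in document order.
    foreach ($xpath->query('//h1|//h2|//h3|//h4|//h5|//h6') as $node) {
        $headings[] = [
            'level' => (int) substr($node->nodeName, 1),   // "h2" -> 2
            'id'    => $node->getAttribute('id'),          // '' if absent
            'text'  => trim($node->textContent),           // plain text, not HTML
        ];
    }
    return $headings;
}

// Turn the flat list into the nested id/text/sub_toc shape used by print_toc.
// Headings deeper than the current one are nested under it until a heading
// at the same or a shallower level appears.
function nest_headings(array $flat, int &$i = 0, int $parent_level = 0): array
{
    $toc = [];
    while ($i < count($flat) && $flat[$i]['level'] > $parent_level) {
        $h = $flat[$i++];
        $toc[] = [
            'id'      => $h['id'],
            'text'    => $h['text'],
            'sub_toc' => nest_headings($flat, $i, $h['level']),
        ];
    }
    return $toc;
}

$toc = nest_headings(collect_headings('<h1>content</h1><h2 id="more">more content</h2>'));
```

Since 'text' comes back as plain text here, you would escape it with htmlspecialchars() before handing the result to print_toc, which outputs the text as-is. The links still only work for headings that carry an id in the page itself.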