Write module index to file with pydoc without server

I currently write documentation for my Python library using the following command:

python -m pydoc -w "\\myserver.com\my_library"

This works fine, and I find HTML files in my_library with documentation derived from class / method / function docstrings. It even documents Python files found in subfolders.

I would now like to create and save an index which gives access to all these files.

The pydoc documentation says this is possible if you start a server:

pydoc -b will start the server and additionally open a web browser to a module index page. Each served page has a navigation bar at the top where you can Get help on an individual item, Search all modules with a keyword in their synopsis line, and go to the Module index, Topics and Keywords pages.

However, I'm looking to write a Module Index page, including relative links to single file documentation, without the server solution. I can then store the Index + single files [one for each py file] in a directory accessible to users.

Is this possible, or is there a better way to approach this problem?

I have looked at Sphinx but this seems overkill for my requirements.

Solution

This can basically be achieved by running a small script that:

imports the modules to be documented,
writes their documentation to HTML files, and then
writes the output of the internal function that dynamically generates the index page to a file index.html.

It is not a super nice solution because it relies on internals of the pydoc module, but is reasonably compact:

import pydoc
import importlib

module_list = ['sys']
for m in module_list:
    importlib.import_module(m)
    pydoc.writedoc(m)

#the monkey patching optionally goes here

with open('index.html','w') as out_file:
    out_file.write(pydoc._url_handler('index.html'))

There is another flaw: the index also contains links to all the builtin modules, etc., for which we did not (and, I guess, did not want to) generate documentation.

Can we copy the function from pydoc.py that creates the index page and modify it to only add links for our desired modules? Unfortunately, this is not straightforward, because the function relies on names from its enclosing scope for some of its logic.
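If reproducing pydoc's own page layout is not essential, the index can also be written by hand, which avoids pydoc's internals altogether. The following is a minimal sketch under that assumption; build_index and write_index are hypothetical helper names, not part of pydoc, and the page is a plain list rather than pydoc's multi-column table.

```python
import html
from pathlib import Path


def build_index(module_names, title="Index of Modules"):
    """Return a small HTML page with a relative link <name>.html per module."""
    items = "\n".join(
        '<li><a href="{0}.html">{0}</a></li>'.format(html.escape(name, quote=True))
        for name in sorted(module_names)
    )
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8">'
        "<title>{0}</title></head>\n"
        "<body><h1>{0}</h1>\n<ul>\n{1}\n</ul>\n</body></html>\n"
    ).format(html.escape(title), items)


def write_index(module_names, directory="."):
    """Write index.html into `directory` and return its path."""
    path = Path(directory) / "index.html"
    path.write_text(build_index(module_names), encoding="utf-8")
    return path
```

Calling write_index(module_list) after the pydoc.writedoc loop would place index.html next to the generated pages; because the links are relative, the whole directory can then be copied to wherever users can reach it.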

The next best solution would be to monkey-patch the html_index() function that generates this page so that it only lists our modules.

Unfortunately pydoc._url_handler uses a local function to implement this and not a class method. So it becomes a bit tricky from here.
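To see why a nested function resists ordinary monkey patching, consider this toy illustration. The names handler, inner and replacement are made up for the example and have nothing to do with pydoc itself.

```python
def handler():
    # `inner` is a local name: it is created anew on every call to handler()
    def inner():
        return "original"
    return inner()


def replacement():
    return "patched"


# Setting an attribute on the outer function does not touch the local name,
# so the call below still runs the nested definition.
handler.inner = replacement
result = handler()
print(result)  # prints "original"
```

The nested function only exists while the outer one runs, so there is no attribute on the module or on the outer function that could be swapped out from the outside.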

There is a way to monkey-patch it, but it is a bit of a hack:

Before calling _url_handler we need to:

define a patched version of html_index that only generates links for the entries in our module_list (the detour via __placeholder__ is needed because module_list is not defined in the scope where the function runs, so its value effectively has to be hard-coded into the function source), and
patch the source of the pydoc module so that it uses this local function instead of the one originally defined.

This is achieved by the following:

import inspect, ast

__placeholder__ = None

#our patched version, needs to have same name and signature as original
def html_index():
    """Module Index page."""
    names= __placeholder__

    def bltinlink(name):
        return '<a href="%s.html">%s</a>' % (name, name)

    heading = html.heading(
        '<big><big><strong>Index of Modules</strong></big></big>',
        '#ffffff', '#7799ee')
    contents = html.multicolumn(names, bltinlink)
    contents = [heading, '<p>' + html.bigsection(
        'Module List', '#ffffff', '#ee77aa', contents)]

    contents.append(
        '<p align=right><font color="#909090" face="helvetica,'
        'arial"><strong>pydoc</strong> by Ka-Ping Yee'
        '&lt;ping@lfw.org&gt;</font>')
    return 'Index of Modules', ''.join(contents)

#get source and replace __placeholder__ with our module_list
s=inspect.getsource(html_index).replace('__placeholder__', str(module_list))

#create abstract syntax tree, and store the actual function definition in l_index
l_index=ast.parse(s).body[0]
#ast.dump(l_index) #check if you want

#now obtain source from unpatched pydoc, generate ast patch it and recompile:
s= inspect.getsource(pydoc)
m = ast.parse(s)

def find_named_el_ind(body, name):
    '''find the index of the named element in an ast body'''
    for i, e in enumerate(body):
        if getattr(e, 'name', None) == name:
            return i
    raise ValueError('not found!')

#find and replace html_index with our patched html_index
i_url_handler = find_named_el_ind(m.body, '_url_handler')
i_html_index = find_named_el_ind(m.body[i_url_handler].body, 'html_index')
m.body[i_url_handler].body[i_html_index] = l_index

#compile and replace module in memory
co = compile(m, '<string>', 'exec')
exec(co, pydoc.__dict__)

#ast.dump(m.body[i_url_handler]) #check ast if you will
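Since the real patch depends on pydoc internals that may differ between Python versions, it can help to try the same technique on a toy source string first. This sketch is not the pydoc patch itself: ORIGINAL_SOURCE, PATCH_TEMPLATE and patch_nested are hypothetical names used only for illustration.

```python
import ast

# Stand-in for the module source that contains a nested function.
ORIGINAL_SOURCE = '''
def outer():
    def inner():
        return ["a", "b", "c"]
    return inner()
'''

# Replacement for the nested function; the placeholder is filled in as text.
PATCH_TEMPLATE = '''
def inner():
    return __placeholder__
'''


def patch_nested(source, outer_name, inner_name, new_inner_source):
    """Return an AST of `source` with outer_name.inner_name replaced."""
    tree = ast.parse(source)
    new_inner = ast.parse(new_inner_source).body[0]
    for node in tree.body:
        if isinstance(node, ast.FunctionDef) and node.name == outer_name:
            for i, child in enumerate(node.body):
                if isinstance(child, ast.FunctionDef) and child.name == inner_name:
                    node.body[i] = new_inner
                    return tree
    raise ValueError("%s.%s not found" % (outer_name, inner_name))


wanted = ["x", "y"]
patched_tree = patch_nested(
    ORIGINAL_SOURCE, "outer", "inner",
    PATCH_TEMPLATE.replace("__placeholder__", repr(wanted)),
)
ast.fix_missing_locations(patched_tree)

# Compile the patched tree and run it in a fresh namespace.
namespace = {}
exec(compile(patched_tree, "<patched>", "exec"), namespace)
result = namespace["outer"]()
```

Here the patched code runs in a fresh dictionary rather than in an existing module's __dict__, which keeps the experiment from affecting anything else in the process.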