
Extract multiple variable names in docstrings


I'm trying to extract all the variable names documented in Python docstrings. For instance, the docstring has the following form:

Scans through a string for substrings matched some patterns (first-subgroups only).

Args:
    text: A string to be scanned.
    patterns: Arbitrary number of regex patterns.

Returns:
    When only one pattern is given, returns a string (None if no match found).
    When more than one pattern are given, returns a list of strings ([] if no match found).

I would like to extract both text and patterns with regex.

I tried the following code, which uses a regular expression to find every element that follows a line break and ends with a colon:

import re

string = """Args:
    text: A string to be scanned.
    patterns: Arbitrary number of regex patterns."""
print(re.findall('Args:[\r\n]+(.+?):', string))

But this regular expression captures nothing. What am I doing wrong?


Solution

  • I would use the third-party docstring-parser package rather than reinventing the wheel. It supports Google, ReST, and Numpydoc style docstrings.

    from docstring_parser import parse
    
    s = """
    Scans through a string for substrings matched some patterns (first-subgroups only).
    
    Args:
        text: A string to be scanned.
        patterns: Arbitrary number of regex patterns.
    
    Returns:
        When only one pattern is given, returns a string (None if no match found).
        When more than one pattern are given, returns a list of strings ([] if no match found).
    """
    doc_str = parse(s)
    print([param.arg_name for param in doc_str.params])
    

    Output

    ['text', 'patterns']
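
If a regex-only approach is still wanted, note that the pattern in the question is anchored on the literal Args: header, so it can match at most once and can never return more than the first name. A minimal standard-library sketch is to first isolate the indented block under Args: and then collect every "name:" prefix inside it. The helper name extract_arg_names is hypothetical, and the sketch assumes Google-style docstrings in which the Args section ends at the first blank line; a wrapped description line that itself starts with "word:" would be picked up by mistake.

```python
import re


def extract_arg_names(docstring):
    """Return the names listed under a Google-style 'Args:' section.

    Hypothetical helper for illustration; assumes the section ends at the
    first blank (or whitespace-only) line.
    """
    # Step 1: capture the run of indented, non-blank lines after 'Args:'.
    section = re.search(
        r"^[ \t]*Args:[ \t]*\r?\n((?:[ \t]+\S.*(?:\r?\n|$))+)",
        docstring,
        re.MULTILINE,
    )
    if section is None:
        return []
    # Step 2: each entry looks like 'name: ...' or 'name (type): ...'.
    return re.findall(
        r"^[ \t]+(\w+)[ \t]*(?:\([^)]*\))?:",
        section.group(1),
        re.MULTILINE,
    )


string = """Args:
    text: A string to be scanned.
    patterns: Arbitrary number of regex patterns."""
print(extract_arg_names(string))  # ['text', 'patterns']
```

Splitting the work into two patterns keeps the Returns: section out of the results, because only lines inside the captured Args block are scanned for names. For anything beyond simple Google-style docstrings, a dedicated parser remains the more robust choice.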