
How to properly annotate re module expressions?


I've been trying to annotate the return value of re.compile, but mypy always complains, even though I used the exact return type suggested by VS Code, i.e. re.Pattern[re.AnyStr@compile].

Signature of re.compile:

    (function) def compile(
        pattern: AnyStr@compile,
        flags: _FlagsType = 0
    ) -> Pattern[AnyStr@compile]

My code:

import re
# from typing import Pattern

def dummyF(fp: str)-> dict[str, int]:
    # ...
    daterx : re.Pattern[re.AnyStr@compile] = re.compile(r""" ...
    """, re.VERBOSE)
    # ...

And mypy complains:

$pdm run startmypy
src/regex-log-filtering.py:20: error: Invalid type comment or annotation  [valid-type]
Found 1 error in 1 file (checked 8 source files)

So what is the proper way to annotate this example?

I tried

  • daterx : re.Pattern[re.AnyStr] = re.compile(r"""
  • and daterx : re.Pattern[typing.AnyStr] = re.compile(r"""

The latter example left me really dumbstruck, as the error made even less sense to me.

$pdm run startmypy
src/regex-log-filtering.py:21: error: Type variable "typing.AnyStr" is unbound  [valid-type]
src/regex-log-filtering.py:21: note: (Hint: Use "Generic[AnyStr]" or "Protocol[AnyStr]" base class to bind "AnyStr" inside a class)
src/regex-log-filtering.py:21: note: (Hint: Use "AnyStr" in function signature to bind "AnyStr" inside a function)

Solution

  • I don't know exactly what VS Code is doing, but AnyStr@compile is not a valid type annotation; it seems to be some sort of internal representation that is being displayed as if it were a valid type hint.

    AnyStr is a type variable, not a type, per the documentation:

    AnyStr is a constrained type variable defined as AnyStr = TypeVar('AnyStr', str, bytes).

    This ensures two things with regard to re.compile:

    1. The pattern argument can only be a str or a bytes value (or an instance of a subclass of either).
    2. Whatever type you pass for the pattern fixes the return type to a corresponding Pattern. If you provide a str pattern, you get back a Pattern[str] value. If you provide a bytes value, you get back a Pattern[bytes] value.
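
    The two points above can be sketched as follows. This is a minimal illustration, not code from the question: the names text_rx and bytes_rx and the \d+ pattern are made up, and it assumes Python 3.9 or later, where re.Pattern can be subscripted at runtime.

```python
import re

# A str pattern produces a Pattern[str] ...
text_rx: re.Pattern[str] = re.compile(r"\d+")
# ... and a bytes pattern produces a Pattern[bytes].
bytes_rx: re.Pattern[bytes] = re.compile(rb"\d+")

# Each compiled pattern only searches input of its own string type.
text_match = text_rx.search("abc 123")
bytes_match = bytes_rx.search(b"abc 123")
```

    Swapping the two annotations should make mypy report an incompatible assignment, since the type variable is fixed by the argument passed to re.compile.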

    For your assignment, you are passing a str value to re.compile, so you should expect a Pattern[str] value as a result.

    daterx : re.Pattern[str] = re.compile(r""" ...""", re.VERBOSE)
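
    The "unbound" error in the question goes away when AnyStr appears in a function signature, as the mypy hint suggests. Here is a rough sketch of that, assuming Python 3.9 or later. The helper compile_verbose and the date pattern are hypothetical, since the original pattern is elided in the question.

```python
import re
from typing import AnyStr


def compile_verbose(pattern: AnyStr) -> re.Pattern[AnyStr]:
    """Hypothetical helper: AnyStr is bound by this function's signature."""
    return re.compile(pattern, re.VERBOSE)


# Passing a str binds AnyStr to str, so the result is a Pattern[str].
# The pattern below is an illustrative stand-in, not the one from the question.
date_rx: re.Pattern[str] = compile_verbose(r"""
    (?P<year>\d{4}) -   # four-digit year
    (?P<month>\d{2}) -  # two-digit month
    (?P<day>\d{2})      # two-digit day
""")

match = date_rx.search("logged on 2024-01-31")
```

    Inside compile_verbose the type variable is meaningful because the caller's argument decides it. On a plain variable annotation there is nothing to bind it to, so a concrete type such as re.Pattern[str] is needed there.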