Tags: python, mypy, python-typing

How to type hint dictionaries that may have different custom keys and/or values?


In the previous versions of our app, people would just pass some arguments with plain strings to certain functions, as we did not have specific type hinting or data types for some of them. Something like:

# Hidden function signature:
def dummy(var: str):
    pass

# Users:
dummy("cat")

But now we want to implement custom data types for those function signatures, while providing backward compatibility. Say something like this:

# Signature:
def dummy(var: Union[NewDataType, Literal["cat"]]):
    pass

# Backward compatibility:
dummy("cat")

# New feature:
dummy(NewDataType.cat)

Achieving this for simple function signatures is fine, but the problem comes when the signatures are more complex.

How to implement this if the argument of dummy is a dictionary that can take both Literal["cat"] and NewDataType as keys? Furthermore, how to achieve this if the argument is a dictionary with the same previous key type combination, but that could also have str and int as values (and the four possible combinations)? All of this must be compliant with mypy, pylint and use Python 3.9 (no StrEnum or TypeAlias).

I have tried many different combinations like the following:

from typing import TypedDict, Literal, Dict, Union
from enum import Enum

# For old support:
AnimalsLiteral = Literal[
    "cat",
    "dog",
    "snake",
]

# New datatypes:
class Animals(Enum):
    cat = "cat"
    dog = "dog"
    snake = "snake"

# Union of Animals Enum and Literal types for full support:
DataType = Union[Animals, AnimalsLiteral]

# option 1, which fails:
def dummy(a: Dict[DataType, str]):
    pass

# option 2, which also fails:
# def dummy(a: Union[Dict[DataType, str], Dict[Animals, str], Dict[AnimalsLiteral, str]]):
#    pass

if __name__ == "__main__":
    # Dictionary with keys as Animals Enum
    input_data1 = {
        Animals.dog: "dog",
    }
    dummy(input_data1)

    # Dictionary with keys as Literal["cat", "dog", "snake"]
    input_data2 = {
        "dog": "dog",
    }
    dummy(input_data2)

    # Dictionary with mixed keys: Animals Enum and Literal string
    input_data3 = {
        Animals.dog: "dog",
        "dog": "dog",
    }
    dummy(input_data3)

dummy(input_data1) is fine, but dummy(input_data2) gives the following mypy errors with signature 2 for dummy:

Argument 1 to "dummy" has incompatible type "dict[str, str]"; expected "Union[dict[Union[Animals, Literal['cat', 'dog', 'snake']], str], dict[Animals, str], dict[Literal['cat', 'dog', 'snake'], str]]"  Mypy(arg-type)
(variable) input_data2: dict[str, str]

Of course doing something like:

input_data2: Dict[DataType, str] = {
    "dog": "dog",
}

would solve it, but I can't ask the users to always do that when they create their datatypes.

Also, I have tried another alternative using TypedDict, but I still run into the same type of mypy errors.

In the end, I want to be able to write mypy- and pylint-compliant type hints for dictionaries that may take custom key types (as in the example), custom value types, or any combination of the two.


Solution

  • The core issue is this:

    Of course doing something like:

    input_data2: Dict[DataType, str] = {
        "dog": "dog",
    }
    

    would solve it, but I can't ask the users to always do that when they create their datatypes.

    If you don't want your users to provide annotations, then they will have to pass the data directly to the function (dummy({"dog": "dog"})) for the function's parameter type to guide inference. This is because when a type-checker infers the type of an unannotated name assigned from a dict, it doesn't infer literal types for the keys (see mypy Playground, Pyright Playground):

    a = {"dog": "dog"}
    reveal_type(a)  # dict[str, str]
    

    I suspect that if the type-checkers tried to infer literal keys on unannotated assignments, other users would complain of false positives because they'd want dict[str, str]. Because dict is invariant in its key type, dict[str, str] can never fulfil a more tightly-annotated parameter in your functions (def dummy(a: Dict[DataType, str]): ...).
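A minimal sketch of that difference, reusing the Animals and DataType definitions from the question. The int return value of dummy is an assumption added only so the example has something observable; the commented-out call is the one mypy reports as arg-type.

```python
from enum import Enum
from typing import Dict, Literal, Union


class Animals(Enum):
    cat = "cat"
    dog = "dog"
    snake = "snake"


AnimalsLiteral = Literal["cat", "dog", "snake"]
DataType = Union[Animals, AnimalsLiteral]


def dummy(a: Dict[DataType, str]) -> int:
    # Assumption for illustration: return the entry count so callers can check it.
    return len(a)


# Accepted: the dict display is checked against the parameter type,
# so "dog" is treated as Literal["dog"] rather than str.
n_direct = dummy({"dog": "dog", Animals.cat: "cat"})

# Accepted: the annotation pins the key type at the assignment.
annotated: Dict[DataType, str] = {"dog": "dog"}
n_annotated = dummy(annotated)

# Rejected by the type checker: the unannotated name is inferred as dict[str, str].
inferred = {"dog": "dog"}
# dummy(inferred)
```

At runtime all three dictionaries are ordinary dicts; the difference exists only for the type checker.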


    In my opinion, you have 2 choices:

    1. Fulfil stricter typing by asking your users to annotate. (It isn't clear from the question who is providing the DataType definitions: is it you/library-maintainers or the users?)

    2. Don't ask your users to annotate, but make a @typing.overload which allows looser annotations:

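      from typing import Union, overload
      
      @overload
      def dummy(a: dict[DataType, str]) -> None: ...
      @overload
      def dummy(a: dict[str, str]) -> None: ...
      # The implementation must accept every overloaded signature.
      def dummy(a: Union[dict[DataType, str], dict[str, str]]) -> None:
          pass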
      

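      As a bonus, with a type checker that supports PEP 702 (check your mypy version), you can mark the looser overload with @warnings.deprecated (on Python 3.9, typing_extensions.deprecated) to warn your users if their typing is too loose. See an example at Pyright Playground.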
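A self-contained sketch of the second choice, again reusing the question's definitions. The Dict[Animals, str] return type and the runtime normalisation through Animals(key) are assumptions added for illustration, not part of the original suggestion; they show one way the implementation could treat old string keys and new enum keys uniformly.

```python
from enum import Enum
from typing import Dict, Literal, Union, overload


class Animals(Enum):
    cat = "cat"
    dog = "dog"
    snake = "snake"


AnimalsLiteral = Literal["cat", "dog", "snake"]
DataType = Union[Animals, AnimalsLiteral]


@overload
def dummy(a: Dict[DataType, str]) -> Dict[Animals, str]: ...
@overload
def dummy(a: Dict[str, str]) -> Dict[Animals, str]: ...
def dummy(
    a: Union[Dict[DataType, str], Dict[str, str]]
) -> Dict[Animals, str]:
    # Assumption for illustration: convert every key to an Animals member.
    # Animals("dog") looks the member up by value, and Animals(Animals.dog)
    # returns the member unchanged. An unknown string raises ValueError.
    return {Animals(key): value for key, value in a.items()}


# Strict overload: keys are Animals members or the known literals.
strict_result = dummy({Animals.snake: "hiss", "dog": "woof"})

# Loose overload: an unannotated variable inferred as dict[str, str].
loose = {"cat": "meow"}
loose_result = dummy(loose)
```

The trade-off is that the loose overload gives up key checking at type-check time, so a misspelled key such as "doog" is only caught at runtime by the ValueError.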


    An additional note: In your question details, you mentioned:

    All of this must be compliant with mypy, pylint and use Python 3.9 (no StrEnum or TypeAlias).

    Python versions which aren't end-of-life are capable of utilising most newer features from typing. This is because type checkers are required to understand imports from typing_extensions, regardless of whether this module exists at runtime. So, TypeAlias and the union syntax int | str are available in Python 3.9 via the following, as long as you don't need to introspect annotations at runtime:

    from __future__ import annotations
    
    var1: int | str = 1
    
    from typing import TYPE_CHECKING
    
    if TYPE_CHECKING:
        from typing_extensions import TypeAlias
        
        IntStrAlias: TypeAlias = int | str
    
    # Or
    
    IntStrAlias: TypeAlias = "int | str"
    

    Python 3.11's enum.StrEnum is also easily imitated in Python 3.9 (see the note in the docs), and type-checkers are required to understand this:

    from enum import Enum
    
    class StrEnum(str, Enum):
        dog = "dog"
    
    >>> reveal_type(StrEnum.dog)  # Literal[StrEnum.dog]
    >>> print(StrEnum.dog + " barks!")
    dog barks!
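Applied to the question's Animals enum, the same str mixin keeps old string-based code working at runtime. This is a sketch of runtime behaviour only, under the assumption that each member's name equals its value as in the question; it makes no claim about how a type checker relates dict[Animals, str] to dict[str, str]. The legacy dictionary is a hypothetical stand-in for data written by existing users.

```python
from enum import Enum


class Animals(str, Enum):
    cat = "cat"
    dog = "dog"
    snake = "snake"


# Hypothetical data from a caller that still uses plain strings.
legacy = {"dog": "woof"}

# Looking a member up by its value returns the existing member.
same_member = Animals("dog") is Animals.dog

# Members compare equal to their plain-string values because of the str mixin.
equal_to_plain_string = Animals.dog == "dog"

# Members hash like their string values, so they can index string-keyed dicts.
lookup_with_member = legacy[Animals.dog]
```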