Tags: python, pandas, numpy, python-typing, mypy

mypy overload function with numpy ndarray and pandas dataframe (signature parameter type(s) are the same or broader)


I have a function that does some arithmetic (quantile normalization) on either a NumPy array or a pandas DataFrame. When you pass in an ndarray, you should get back an ndarray, and when you pass in a DataFrame, you should get back a DataFrame:

from typing import Union, overload

import numpy as np
import pandas as pd


@overload
def quantile_normalize(data: pd.DataFrame) -> pd.DataFrame: ...
@overload
def quantile_normalize(data: np.ndarray) -> np.ndarray: ...


def quantile_normalize(data: Union[pd.DataFrame, np.ndarray]) -> Union[pd.DataFrame, np.ndarray]:
    pass
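For reference, here is one way the `pass` body could be filled in. This is a sketch of textbook quantile normalization under stated assumptions, not the asker's actual implementation: ties are ranked in arbitrary order rather than averaged, and the helper name `_quantile_normalize_array` is hypothetical. The DataFrame branch reuses the NumPy code and restores the original index and columns, so each overload's promised return type also holds at runtime.

```python
from typing import Union, overload

import numpy as np
import pandas as pd


def _quantile_normalize_array(arr: np.ndarray) -> np.ndarray:
    # Rank of each value within its column (0 = smallest).
    # Ties receive distinct ranks in arbitrary order (assumption).
    ranks = np.argsort(np.argsort(arr, axis=0), axis=0)
    # Reference distribution: row-wise mean of the column-wise sorted values.
    reference = np.sort(arr, axis=0).mean(axis=1)
    # Replace every value with the reference value at its rank.
    return reference[ranks]


@overload
def quantile_normalize(data: pd.DataFrame) -> pd.DataFrame: ...
@overload
def quantile_normalize(data: np.ndarray) -> np.ndarray: ...
def quantile_normalize(data: Union[pd.DataFrame, np.ndarray]) -> Union[pd.DataFrame, np.ndarray]:
    if isinstance(data, pd.DataFrame):
        values = _quantile_normalize_array(data.to_numpy(dtype=float))
        # Keep the input's labels so a DataFrame comes back out.
        return pd.DataFrame(values, index=data.index, columns=data.columns)
    return _quantile_normalize_array(np.asarray(data, dtype=float))
```

After normalization every column contains the same set of values (the reference distribution), only in the order given by that column's original ranks.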

However, when I check this with mypy, it complains:

qnorm/quantile_normalize.py:72: error: Overloaded function signature 2 will never be matched: signature 1's parameter type(s) are the same or broader

The questions and answers about this error that I have found so far all seem to involve optional input/output and None types. A pandas DataFrame and a NumPy array are related, but they should still be distinguishable by mypy.


Solution

  • When a library is missing type hints, mypy resolves every name imported from it to Any. At the time of writing, neither numpy nor pandas is PEP 561 compliant (neither ships type hints), and typeshed has no stubs for them, so both pd.DataFrame and np.ndarray resolve to Any. Both overloads therefore collapse to def quantile_normalize(data: Any) -> Any: ..., and the second one can never be matched. To fix the issue, add stubs for numpy and pandas.

    Either use existing type stubs: I use data-science-types (PyPI, GitHub), which offers stubs for numpy, pandas and matplotlib:

    $ pip install data-science-types
    

    Now pd.DataFrame and np.ndarray will be resolved correctly when running mypy. As a bonus, you also get better code completion in any IDE that uses PEP 484 type stubs (e.g. Visual Studio Code or WingIDE).

    Or, if you can't or don't want to add the stub package, write your own minimal stubs, e.g.

    # _typeshed/pandas/__init__.pyi
    
    from typing import Any
    
    
    def __getattr__(name: str) -> Any: ...  # incomplete
    
    class DataFrame:
        def __getattr__(self, name: str) -> Any: ...  # incomplete
    

    and

    # _typeshed/numpy/__init__.pyi
    
    from typing import Any
    
    
    def __getattr__(name: str) -> Any: ...  # incomplete
    
    class ndarray:
        def __getattr__(self, name: str) -> Any: ...  # incomplete
    

    and run MYPYPATH=_typeshed mypy ....
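The difference the stubs make can be reproduced in a single stand-alone file without numpy or pandas. In this sketch, the aliases `AnyFrame` and `AnyArray` are hypothetical stand-ins for what `pd.DataFrame` and `np.ndarray` become without type information (plain `Any`), while the placeholder classes `DataFrame` and `ndarray` model what the stubs above provide: two unrelated classes. The function names `qn_without_stubs` and `qn_with_stubs` are also hypothetical. Running mypy on this file should flag the "will never be matched" error only for `qn_without_stubs`, and infer a distinct return type for each call of `qn_with_stubs`.

```python
from typing import Any, Union, overload

# Without stubs: both parameter types are effectively Any.
AnyFrame = Any
AnyArray = Any


@overload
def qn_without_stubs(data: AnyFrame) -> AnyFrame: ...
@overload
def qn_without_stubs(data: AnyArray) -> AnyArray: ...  # expected mypy error: never matched
def qn_without_stubs(data: Any) -> Any:
    return data  # placeholder body


# With stubs: two unrelated classes, so mypy can tell the overloads apart.
class DataFrame:
    """Placeholder for the stubbed pandas.DataFrame class."""


class ndarray:
    """Placeholder for the stubbed numpy.ndarray class."""


@overload
def qn_with_stubs(data: DataFrame) -> DataFrame: ...
@overload
def qn_with_stubs(data: ndarray) -> ndarray: ...
def qn_with_stubs(data: Union[DataFrame, ndarray]) -> Union[DataFrame, ndarray]:
    return data  # placeholder body: output type always matches input type


frame_result = qn_with_stubs(DataFrame())  # mypy should infer DataFrame
array_result = qn_with_stubs(ndarray())  # mypy should infer ndarray
```

Because neither placeholder class is a subclass of the other, the two parameter types do not overlap, which is exactly the property the real stubs restore for pd.DataFrame and np.ndarray.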