I have a function that performs quantile normalization over either a numpy array or a pandas dataframe. When you pass in an ndarray, you should get back an ndarray, and when you pass in a pandas dataframe, you should get back a dataframe:
from typing import Union, overload

import numpy as np
import pandas as pd

@overload
def quantile_normalize(data: pd.DataFrame) -> pd.DataFrame: ...
@overload
def quantile_normalize(data: np.ndarray) -> np.ndarray: ...
def quantile_normalize(data: Union[pd.DataFrame, np.ndarray]) -> Union[pd.DataFrame, np.ndarray]:
    pass
However, when I check this with mypy, it complains:
qnorm/quantile_normalize.py:72: error: Overloaded function signature 2 will never be matched: signature 1's parameter type(s) are the same or broader
The questions and answers I have found so far on this issue all seem to concern optional input/output and None types. A pandas dataframe and a numpy array are related; however, mypy should still be able to distinguish them.
When a library is missing type hints, every import from it will resolve to Any. Both numpy and pandas aren't PEP 561 compliant (they don't ship any type hints) and have no stubs in typeshed, so both pd.DataFrame and np.ndarray will resolve to Any. Both overloads thus resolve to def quantile_normalize(data: Any) -> Any: ... and the second one can never be matched. To fix the issue, add stubs for numpy and pandas.
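The collapse can be reproduced without either library. The sketch below uses two hypothetical aliases, DataFrame and ndarray, both bound to Any, as stand-ins for what mypy sees when the imports carry no type information. Running mypy on it should report the same "will never be matched" error, although the code itself runs fine:

```python
from typing import Any, overload

# Hypothetical stand-ins: roughly what mypy sees for pd.DataFrame and
# np.ndarray when the libraries provide no type information.
DataFrame = Any
ndarray = Any


@overload
def quantile_normalize(data: DataFrame) -> DataFrame: ...
@overload
def quantile_normalize(data: ndarray) -> ndarray: ...  # same signature as above
def quantile_normalize(data: Any) -> Any:
    # Placeholder body; the actual normalization is omitted here.
    return data
```

Since both aliases are the same object, the two overload signatures are indistinguishable to the type checker.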
Either use existing type stubs: I use data-science-types (PyPI, GitHub), which offers stubs for numpy, pandas and matplotlib:
$ pip install data-science-types
Now pd.DataFrame and np.ndarray will be correctly resolved when running mypy. This will also give you better code completion for free in every IDE that supports PEP 561 (e.g. Visual Studio Code or WingIDE).
Or, if you can't or don't want to add the stub package, write your own minimal stubs, e.g.
# _typeshed/pandas/__init__.pyi
from typing import Any
def __getattr__(name: str) -> Any: ...  # incomplete
class DataFrame:
    def __getattr__(self, name: str) -> Any: ...  # incomplete
and
# _typeshed/numpy/__init__.pyi
from typing import Any
def __getattr__(name: str) -> Any: ...  # incomplete
class ndarray:
    def __getattr__(self, name: str) -> Any: ...  # incomplete
and run MYPYPATH=_typeshed mypy ...
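Once mypy sees DataFrame and ndarray as two distinct classes, the overloads no longer overlap. A library-free sketch of that situation, where both classes are hypothetical stand-ins for the real pandas and numpy types and the function body is only a placeholder, might look like this:

```python
from typing import Union, overload


class DataFrame:  # hypothetical stand-in for pandas.DataFrame
    def __init__(self, values: list) -> None:
        self.values = values


class ndarray:  # hypothetical stand-in for numpy.ndarray
    def __init__(self, values: list) -> None:
        self.values = values


@overload
def quantile_normalize(data: DataFrame) -> DataFrame: ...
@overload
def quantile_normalize(data: ndarray) -> ndarray: ...
def quantile_normalize(data: Union[DataFrame, ndarray]) -> Union[DataFrame, ndarray]:
    # Placeholder: copy the values instead of normalizing them,
    # but return the same container type that was passed in.
    if isinstance(data, DataFrame):
        return DataFrame(list(data.values))
    return ndarray(list(data.values))
```

Because the two classes are unrelated, neither overload is broader than the other, so mypy has no reason to flag the second signature.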