The documentation for np.typing.NDArray says that it is "a generic version of np.ndarray[Any, np.dtype[+ScalarType]]". Where is the generalization in "generic" happening?
And in the documentation for numpy.ndarray.__class_getitem__ we have this example, np.ndarray[Any, np.dtype[Any]], with no explanation as to what the two arguments are.
And why can I do np.ndarray[float], i.e. just use one argument? What does that mean?
Note from the future: as of NumPy 2.0 the docs are more explicit, describing NDArray as
A np.ndarray[Any, np.dtype[+ScalarType]] type alias generic w.r.t. its dtype.type.
and as of 2.2 (dev docs currently) the type alias is changed to NDArray = np.ndarray[tuple[int, ...], np.dtype[+ScalarType]]. This now makes it clearer what the type alias represents, and it agrees with what I concluded in my original answer below.
"Generic" in this context means "generic type" (see also the Glossary), typing-related objects that can be subscripted to generate more specific type "instances" (apologies for the sloppy jargon, I'm not well-versed in typing talk). Think of typing.List, which lets you use List[int] to denote a homogeneous list of ints.
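As a minimal sketch of what subscripting a generic type produces at runtime (standard library only, assuming Python 3.9 or later; the variable names are just for illustration):

```python
from typing import List, get_args, get_origin

# Subscripting a generic type yields an alias object, not a new class.
old_style = List[int]  # typing-module generic
new_style = list[int]  # built-in generic, valid since Python 3.9

# Both aliases remember the class they came from and their parameters.
print(get_origin(old_style), get_args(old_style))
print(get_origin(new_style), get_args(new_style))

# The alias only matters to type checkers; nothing is enforced at runtime.
values: list[int] = ["not", "ints"]  # runs fine, but a type checker would flag it
```

The same mechanism is what makes np.ndarray[...] usable as an annotation: the subscript only records the parameters for a type checker to inspect.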
As of Python 3.9 most standard-library collections have been upgraded to be compatible with typing as generic types themselves. Since tuple[foo] was invalid before 3.9, it was safe to allow tuple[int, int] to mean the same thing that typing.Tuple[int, int] used to mean: a tuple of two integers.
So as of Python 3.9 NumPy also allows using the np.ndarray type as a generic; this is what np.ndarray[Any, np.dtype[Any]] does. This "signature" matches the actual signature of np.ndarray.__init__() (__new__() if we want to be correct):
class numpy.ndarray(shape, dtype=float, ...)
So what np.ndarray[foo, bar] does is create a type for type hinting that means "a NumPy array of shape type foo and dtype bar". People normally don't call np.ndarray() directly anyway (rather using helpers such as np.array() or np.full_like() and the like), so this is doubly fine in NumPy.
Now, since most code runs with arrays of more than one possible number of dimensions, it would be a pain to have to specify an arbitrary number of lengths for the shape tuple (the first "argument" of np.ndarray as a generic type). I assume this was the motivation to define a type alias that is still a generic in the second "argument". This is np.typing.NDArray.
It lets you easily type hint something as an array of a given type without having to say anything about the shape, covering a vast subset of use cases (which would otherwise use np.ndarray[typing.Any, ...]). And this is still a generic, since you can parameterise it with a dtype. To quote the docs:
>>> print(npt.NDArray)
numpy.ndarray[typing.Any, numpy.dtype[+ScalarType]]
>>> print(npt.NDArray[np.float64])
numpy.ndarray[typing.Any, numpy.dtype[numpy.float64]]
As usual with generics, you're allowed to specify an argument to the generic type, but you're not required to. ScalarType is derived from np.generic, a base class that covers most (maybe all) NumPy scalar types. And the library code that defines NDArray is here, and is fairly transparent to the point of calling the helper _GenericAlias for older Python (a backport of typing.GenericAlias). What you have at the end is a type alias that is still generic in one variable.
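To illustrate the idea of "a type alias that is still generic in one variable" without depending on NumPy internals, here is a small pure-Python analogue. Array, ShapeT, DTypeT and AnyShapeArray are hypothetical names made up for this sketch; this is not how NumPy defines NDArray, only the same pattern.

```python
from typing import Any, Generic, TypeVar, get_args, get_origin

ShapeT = TypeVar("ShapeT")
DTypeT = TypeVar("DTypeT")


class Array(Generic[ShapeT, DTypeT]):
    """Hypothetical stand-in for np.ndarray: generic in shape and dtype."""


# Pin the shape parameter to Any and leave the dtype parameter free.
# The result is an alias that is still generic, but in one variable only.
AnyShapeArray = Array[Any, DTypeT]

# Parameterising the alias fills in the remaining free variable.
FloatArray = AnyShapeArray[float]

print(AnyShapeArray.__parameters__)  # the single free type variable
print(get_origin(FloatArray), get_args(FloatArray))
```

Here AnyShapeArray plays the role of NDArray and AnyShapeArray[float] the role of NDArray[np.float64].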
To address the last part of your question:
And why can I do
np.ndarray[float]
, ie just use one argument? What does that mean?
I think the anticlimactic explanation is that we again need to look at the signature of np.ndarray():
class numpy.ndarray(shape, dtype=float, buffer=None, offset=0, strides=None, order=None)
There's one mandatory parameter (shape), all the others are optional. So I believe that np.ndarray[float] specifies that it corresponds to arrays whose shape is of type float (i.e. nonsense). There's an explicit check to only allow 1 or 2 parameters in the generic type:
args_len = PyTuple_Check(args) ? PyTuple_Size(args) : 1;
if ((args_len > 2) || (args_len == 0)) {
    return PyErr_Format(PyExc_TypeError,
                        "Too %s arguments for %s",
                        args_len > 2 ? "many" : "few",
                        ((PyTypeObject *)cls)->tp_name);
}
generic_alias = Py_GenericAlias(cls, args);
This snippet checks that one or two arguments were passed to __class_getitem__, raises otherwise, and in the valid cases defers to the C API version of typing.GenericAlias.
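The argument-count check in the C snippet above can be mimicked in pure Python. This is only a sketch of the same logic on a hypothetical FakeArray class, not NumPy's implementation; in particular it uses cls.__name__ where the C code uses tp_name.

```python
from types import GenericAlias


class FakeArray:
    """Hypothetical stand-in class used to mirror the C logic."""

    def __class_getitem__(cls, args):
        # A single subscript arrives as a bare object, several as a tuple.
        args_len = len(args) if isinstance(args, tuple) else 1
        if args_len > 2 or args_len == 0:
            many_or_few = "many" if args_len > 2 else "few"
            raise TypeError(f"Too {many_or_few} arguments for {cls.__name__}")
        # In the valid cases, defer to the standard generic alias machinery.
        return GenericAlias(cls, args)


one = FakeArray[float]       # accepted: one argument
two = FakeArray[int, float]  # accepted: two arguments
print(one.__args__, two.__args__)

try:
    FakeArray[int, float, str]  # rejected: three arguments
except TypeError as exc:
    print(exc)
```

Note that nothing in this check looks at what the arguments are, which is why a one-argument form such as np.ndarray[float] is accepted at runtime.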
I'm pretty sure that there's no technical reason to exclude the other parameters of the ndarray constructor from the generic type, but rather a semantic one: the third parameter, buffer, makes no sense to include in typing (or there was just a general push to reduce the generality of the generic type to the most common use cases).
All that being said, I haven't been able to construct a small example in which a type passed for the shape of the generic type leads to type checking errors in mypy. From several attempts it seems as if the shape was always checked as if it were typing.Any rather than whatever was passed as the first parameter of np.ndarray[...]. For instance, consider the following example:
import numpy as np
first: np.ndarray[tuple[int], np.dtype[np.int64]] = np.arange(3) # OK
second: np.ndarray[tuple[int], np.dtype[np.int64]] = np.arange(3.0) # error due to dtype mismatch
third: np.ndarray[tuple[int, int, int], np.dtype[np.int64]] = np.arange(3) # no error even though shape mismatch
Running mypy 0.991 on Python 3.9 on this gives
foo.py:5: error: Incompatible types in assignment (expression has type "ndarray[Any, dtype[floating[Any]]]", variable has type "ndarray[Tuple[int], dtype[signedinteger[_64Bit]]]") [assignment]
Found 1 error in 1 file (checked 1 source file)
Only the dtype mismatch is found, but not the shape mismatch. And I see the same thing if I use np.ndarray((3,), dtype=...) instead of np.arange(), so it's not just due to weird typing of the np.arange() helper (although I used it as an example because this is one function that's guaranteed to return a 1d array). Since I can't explain this behaviour I can't be certain that my understanding is correct, but I have no better model.
To come back to a question you asked in a comment:
Right, so then am I right in understanding that np.ndarray[int] is like np.ndarray[Any, int]?
No, at least we can exclude this (and what we see here is consistent with the first parameter only affecting the shape, to whatever extent it does affect it):
from typing import Any
import numpy as np
first: np.ndarray[Any, np.dtype[np.int_]] = np.arange(3) # OK because dtype matches
second: np.ndarray[np.dtype[np.int_]] = np.arange(3) # OK because shape check doesn't actually work, and dtype is left as "any scalar"
third: np.ndarray[Any, np.dtype[np.int_]] = np.arange(3.0) # error due to dtype mismatch
fourth: np.ndarray[np.dtype[np.int_]] = np.arange(3.0) # no error, so this can't be the same as the third option
The result from mypy:
foo.py:7: error: "ndarray" expects 2 type arguments, but 1 given [type-arg]
foo.py:9: error: Incompatible types in assignment (expression has type "ndarray[Any, dtype[floating[Any]]]", variable has type "ndarray[Any, dtype[signedinteger[Any]]]") [assignment]
foo.py:11: error: "ndarray" expects 2 type arguments, but 1 given [type-arg]
Found 3 errors in 1 file (checked 1 source file)
The four cases:
- first: explicitly typed as "int array with any shape", no error on type correct assignment
- second: typed as "array with an int-typed shape and any dtype", should fail because that's nonsense but doesn't (see the earlier musing about the impotence of shape type checks)
- third: explicitly typed as another "int array with any shape", being assigned a double array, leading to an error
- fourth: typed as "array with an int-typed shape and any dtype", leading to no error (see second). Since the third case leads to an error and the fourth doesn't, they can't be aliases of one another.
Also notable that mypy complains about the two lines where np.ndarray[np.dtype[np.int_]] is present:
foo.py:7: error: "ndarray" expects 2 type arguments, but 1 given [type-arg]
This sounds like a single-parameter use of the generic is forbidden as far as mypy is concerned. I'm not sure why this is the case, but this would certainly simplify the situation.