I need to restructure some lists of tuples in Python. I want to put the n-th value of each tuple in these lists into a separate tuple. The tuples in the lists are all similarly structured (e.g. position 1 is always an int) and I provided the respective type hints. However, I unexpectedly receive an error message when I write the following code:
test_list: list[tuple[int, str]] = [(1, 'testa'), (2, 'testb')]
a: tuple[int]
b: tuple[str]
a, b = zip(*test_list)
As expected, a and b now only consist of int and str, respectively:
print(a) # Output: (1, 2)
print(b) # Output: ('testa', 'testb')
However, Pylance still complains about the zip expression:
Expression of type "tuple[int | str]" cannot be assigned to declared type "tuple[int]"
  "tuple[int | str]" is incompatible with "tuple[int]"
    Tuple entry 1 is incorrect type
      Type "int | str" cannot be assigned to type "int"
        "str" is incompatible with "int" (Pylance reportGeneralTypeIssues)

Expression of type "tuple[int | str]" cannot be assigned to declared type "tuple[str]"
  "tuple[int | str]" is incompatible with "tuple[str]"
    Tuple entry 1 is incorrect type
      Type "int | str" cannot be assigned to type "str"
        "int" is incompatible with "str" (Pylance reportGeneralTypeIssues)
What do I have to change to get rid of the error message? Or is this a bug in Pylance? Does it not recognize the star operator?
I don't think the problem here is with Pylance or your code.
zip accepts generic iterables
The problem is in the way that zip is designed/annotated. If we look at typeshed (always a great source for figuring out types of built-in functions), we can see that the two-argument overload looks something like this (simplified):
from __future__ import annotations
from collections.abc import Iterable, Iterator
from typing import TypeVar
T = TypeVar("T", covariant=True)
T1 = TypeVar("T1")
T2 = TypeVar("T2")
...
class zip(Iterator[T]):
    def __new__(cls, iter1: Iterable[T1], iter2: Iterable[T2]) -> zip[tuple[T1, T2]]: ...
    def __next__(self) -> T: ...
(Source, zip starting in line 1673 as of today's main branch)
What this means is that zip implements the iterator protocol (also mentioned in the docs). It is in fact a generic iterator over a type T and calling next on an instance of such a zip iterator returns something of type T.
Moreover, the type T is fully specified upon construction of a zip instance as indicated by the __new__ return type annotation.
The key point however is that the arguments taken by __new__ are annotated as Iterable. Those are generic over only one type argument. In this example the first iterable is generic over type argument T1 and the second over T2. The type argument to the resulting zip is then a tuple[T1, T2].
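As a small runtime illustration of the two points above (a sketch with made-up values, not taken from typeshed): zip accepts any kind of iterable, and the object it returns is its own iterator.

```python
from collections.abc import Iterator

numbers = [1, 2]                       # a list works as an Iterable[int]
letters = (ch for ch in ("a", "b"))    # a generator works as an Iterable[str]

pairs = zip(numbers, letters)          # a checker sees zip[tuple[int, str]]

# zip implements the iterator protocol: iter() hands back the same object.
is_iterator = isinstance(pairs, Iterator)
is_own_iterator = iter(pairs) is pairs

first = next(pairs)                    # one (int, str) tuple per step
rest = list(pairs)                     # the items that were not consumed yet
```

Neither argument tells zip how many items it holds or what sits at which position, which is why a single element type per argument is all the annotation can express.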
This is totally fine when each of our iterables has a "consistent" element type. Take the following example:
a = (1, 2)
b = ("1", "2")
x, y = zip(a, b)
reveal_type(x)
reveal_type(y)
Not sure how this is done with Pylance, but the reveal_type statements cause mypy to note the following:
note: Revealed type is "Tuple[builtins.int, builtins.str]"
note: Revealed type is "Tuple[builtins.int, builtins.str]"
That makes sense, considering a and b are of type tuple[int, int] and tuple[str, str] respectively. When passed to the zip constructor, these are generalized to Iterable[int] and Iterable[str] and then turned into the type argument tuple[int, str] for the values returned by the iterator.
But what happens if we change the setup just slightly?
a = (1, "2")
b = ("1", 2)
x, y = zip(a, b)
reveal_type(x)
reveal_type(y)
Now we get the following from mypy:
note: Revealed type is "Tuple[builtins.object, builtins.object]"
note: Revealed type is "Tuple[builtins.object, builtins.object]"
To be clear, the type of a and b is still correctly inferred to be tuple[int, str] and tuple[str, int]. But when those tuples are passed to zip, the type checker looks at their type arguments and needs to join them, because zip is defined on iterables with a single type argument. And as you can see, int and str have only object as their closest common base.
So the iterables are seen as containing object, which then gives us the zip[tuple[object, object]].
Pylance seems to use unions instead of joins to find the type of those iterables, which is why you get that notice about the zip iterator yielding tuple[int | str]. That is arguably the better approach compared to that of mypy, but it is still unsatisfactory for your purposes. I hope you see that the fundamental issue here is the same.
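To connect this back to the code in the question: the star operator passes every tuple in test_list to zip as its own argument, so each argument is a heterogeneous tuple[int, str], exactly like a and b in the second example. A minimal runtime check of that equivalence, reusing the values from the question:

```python
test_list = [(1, "testa"), (2, "testb")]

# zip(*test_list) spreads the list into one positional argument per tuple ...
unpacked = tuple(zip(*test_list))

# ... so it is the same call as writing the two tuples out by hand.
explicit = tuple(zip((1, "testa"), (2, "testb")))
```

Both calls produce the same result at runtime. The type checker, however, only sees iterables whose element type is the merged int | str (or object under mypy).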
Tuples are special in that they are generic over a variable number of type arguments. The reasoning is supposedly that they are of fixed length and you can therefore very precisely parameterize them. Something like a list
can be changed in length and iterators don't even have a concept of length; they can potentially yield elements forever. So it seems reasonable to only give them one type parameter.
The problems arise when we need to view a tuple as an Iterable. And I see no way around that.
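The loss of per-position information can be reproduced without zip. The following sketch uses a hypothetical helper, first_item, that accepts any Iterable[T]; the name and the example values are made up for illustration.

```python
from collections.abc import Iterable
from typing import TypeVar

T = TypeVar("T")


def first_item(items: Iterable[T]) -> T:
    """Return the first element of any iterable (hypothetical helper)."""
    return next(iter(items))


pair: tuple[int, str] = (1, "2")

# At runtime this is the int 1. Statically, T has to cover every position of
# the tuple, so a checker can only report a merged type here, for example
# object (mypy) or int | str (Pylance), as described above.
value = first_item(pair)
```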
As for how you should proceed, this depends on what the actual use case is. The simplest way based on your minimal example would obviously be a type: ignore at the moment of unpacking the zip:
test_list: list[tuple[int, str]] = [(1, 'testa'), (2, 'testb')]
a: tuple[int]
b: tuple[str]
a, b = zip(*test_list) # type: ignore[assignment]
If you are actually dealing with functions and your setup is more involved, maybe there are other ways around this problem. If you elaborate, maybe we can work something else out. Other than that, there is no shame in using duly considered type: ignore comments in your code when you hit the limits of the available typing system.
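If the type: ignore is undesirable, one possible alternative is a small helper that never routes the tuples through zip, so the per-position types are preserved. This is only a sketch: the name unzip_pairs and its signature are hypothetical, and it assumes every tuple has exactly two entries.

```python
from __future__ import annotations

from collections.abc import Iterable
from typing import TypeVar

T1 = TypeVar("T1")
T2 = TypeVar("T2")


def unzip_pairs(pairs: Iterable[tuple[T1, T2]]) -> tuple[tuple[T1, ...], tuple[T2, ...]]:
    """Split an iterable of 2-tuples into a tuple of firsts and a tuple of seconds."""
    firsts: list[T1] = []
    seconds: list[T2] = []
    for first, second in pairs:  # unpacking a tuple[T1, T2] keeps both types apart
        firsts.append(first)
        seconds.append(second)
    return tuple(firsts), tuple(seconds)


test_list: list[tuple[int, str]] = [(1, "testa"), (2, "testb")]
a: tuple[int, ...]
b: tuple[str, ...]
a, b = unzip_pairs(test_list)
```

Note that the declarations use tuple[int, ...] and tuple[str, ...] here. The annotation tuple[int] describes a tuple with exactly one int, whereas the ellipsis form describes a tuple of any length.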