
Type hinting pairwise with overhang


I took over a code base (supporting Python down to 3.9) and wanted to add some type hints. However, I am currently stuck on this function:

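import itertools
import typing as T

def _pairwise(iterable: T.Iterable, end=None) -> T.Iterable:
    # pair each element with its successor; the last element is paired with end
    left, right = itertools.tee(iterable)
    next(right, None)
    return itertools.zip_longest(left, right, fillvalue=end)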

It is later used to iterate over regex matches and extract their start and end indices for slicing. The final fill value of None makes the last slice run to the end of the string.
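A minimal sketch of that use case, assuming the goal is to cut a string into pieces that each start at a match. The helper split_at_matches, the pattern and the input are hypothetical and not taken from the original code base:

```python
import itertools
import re
from typing import Iterable, Iterator, List


def _pairwise(iterable: Iterable, end=None) -> Iterator:
    left, right = itertools.tee(iterable)
    next(right, None)
    return itertools.zip_longest(left, right, fillvalue=end)


def split_at_matches(text: str, pattern: str) -> List[str]:
    """Hypothetical helper: cut text into pieces that each start at a match.

    Any text before the first match is dropped.
    """
    pieces = []
    for current, following in _pairwise(re.finditer(pattern, text)):
        # the last match is paired with None, and text[start:None]
        # runs to the end of the string
        stop = following.start() if following is not None else None
        pieces.append(text[current.start():stop])
    return pieces


print(split_at_matches("key1=a key2=b key3=c", r"key\d"))
# ['key1=a ', 'key2=b ', 'key3=c']
```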

We know that the actual signature should be

_pairwise(iterable: Iterable[T], end: Optional[T] = None) -> Iterator[tuple[T, Optional[T]]]

because left is guaranteed to be at least as long as right.

However, the zip_longest approach does not allow that. Since the fill value could in principle pad either side, type checkers read the result as Iterator[tuple[Optional[T], Optional[T]]].
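The checker is not wrong in general: zip_longest pads whichever input runs out first, so for arbitrary inputs the fill value can land in the first slot as well. Only the tee/next construction guarantees that it never does here. A small illustration with arbitrary inputs:

```python
import itertools

# the first input is shorter here, so the fill value ends up in the FIRST slot
general = list(itertools.zip_longest("a", "bc", fillvalue=None))
print(general)  # [('a', 'b'), (None, 'c')]

# with the pairwise construction, left is never shorter than right,
# so the fill value can only appear in the second slot
left, right = itertools.tee("abc")
next(right, None)
paired = list(itertools.zip_longest(left, right, fillvalue=None))
print(paired)  # [('a', 'b'), ('b', 'c'), ('c', None)]
```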

I have rewritten the function so that the type checker (pyright) is able to verify that target signature.

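import itertools
from typing import Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")

def _pairwise(
    iterable: Iterable[T], end: Optional[T] = None
) -> Iterator[tuple[T, Optional[T]]]:
    left, right = itertools.tee(iterable)
    next(right, None)
    # right goes first: zip stops as soon as right is exhausted,
    # before it pulls another element from left
    for x, y in zip(right, left):
        yield y, x
    # whatever remains in left is the last element (if there was one)
    if (last := next(left, None)) is not None:
        yield last, end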

However, I am not particularly pleased with this result yet. First, I have to swap the arguments to zip so that it does not take one extra step on left; second, I need a manual check to handle the case of an empty iterable.
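The swap matters because zip advances its arguments from left to right and only stops when one of them raises StopIteration. With zip(left, right), it has already pulled the final element out of left by the time it notices that right is exhausted. A small demonstration; the helper pairs_and_rest and the input [1, 2, 3] are arbitrary choices for illustration:

```python
import itertools
from typing import List, Tuple


def pairs_and_rest(left_first: bool) -> Tuple[List[Tuple[int, int]], List[int]]:
    """Pair [1, 2, 3] with its successors and report what left still holds."""
    left, right = itertools.tee([1, 2, 3])
    next(right, None)  # right now yields 2, 3
    if left_first:
        pairs = list(zip(left, right))
    else:
        # swap back so the pairs read (current, next)
        pairs = [(y, x) for x, y in zip(right, left)]
    return pairs, list(left)


print(pairs_and_rest(left_first=True))   # ([(1, 2), (2, 3)], [])
print(pairs_and_rest(left_first=False))  # ([(1, 2), (2, 3)], [3])
```

With left first, the 3 is consumed and discarded; with right first, it is still available for the final (last, end) pair.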

This also means the function does not behave as intended for T=NoneType: if the last element of the iterable is None, the final pair is silently dropped. That is not actually a problem here, but it does annoy me a bit.

Is there any other way to get this pairwise functionality to typecheck?


Solution

  • Your zip_longest solution looks clean to me, and # type: ignore[return-value] would be a good fit there, probably with a short explanatory comment.

    However, to make your code typecheck, you could use plain zip and add a "filler" entry to the end manually, like this:

    import itertools
    from typing import Iterable, Iterator, TypeVar, Optional
    
    T = TypeVar('T')
    
    
    def _pairwise(
        iterable: Iterable[T], end: Optional[T] = None
    ) -> Iterator[tuple[T, Optional[T]]]:
        left, right = itertools.tee(iterable)
        next(right, None)
        return zip(left, itertools.chain(right, [end]))
    

    Now mypy is happy with this code, and it is slightly more explicit: you know that the second iterable (right) is one element shorter than left unless the iterable was empty, so appending one item makes them the same length. For an empty input, both implementations produce an empty iterator.