
How to get a zip of all characters in a string? zip misses out on the final characters and itertools.zip_longest adds None


I am passing the result of itertools.zip_longest to itertools.product, but I get an error when it reaches the end and finds None.

The error I get is: TypeError('sequence item 0: expected str instance, NoneType found')
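That message is what str.join raises when one of the items it receives is None. The question doesn't show the call that actually fails, so the ''.join below is an assumption about where the padded tuples end up. This is a minimal sketch, assuming string input, of the failure and of one workaround: forwarding fillvalue='' (the question's grouper accepts fillvalue but never passes it on) so the padding joins to nothing.

```python
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    # n references to one iterator, so each tuple takes n consecutive items
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

# Default padding is None, which str.join rejects
try:
    groups = ["".join(g) for g in grouper("abcde", 2)]
except TypeError as exc:
    print("TypeError:", exc)

# Padding with the empty string keeps every group joinable
groups = ["".join(g) for g in grouper("abcde", 2, fillvalue="")]
print(groups)  # ['ab', 'cd', 'e']
```

The empty-string fill only works because every item is a string; for arbitrary values the sentinel-based approach in the answer is more general.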

If I use zip instead of itertools.zip_longest then I don't get all the items.

Here is the code I am using to generate the zip:

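import itertools

def grouper(iterable, n, fillvalue=None):
    # n references to the same iterator, so each tuple takes n consecutive items
    args = [iter(iterable)] * n
    #return zip(*args)  # drops the final, incomplete group
    return itertools.zip_longest(*args, fillvalue=fillvalue)  # pads it with fillvalue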

sCharacters = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789~`!@#$%^&*()_-+={[}]|\"""':;?/>.<,"

for x in grouper(sCharacters, 4):
    print(x)

Comparing the output of the two versions: the itertools.zip_longest version ends with a tuple padded with None items, while the plain zip version is missing the final item, the comma ','.
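To make the comparison concrete without the screenshot, here is a small sketch that reruns both versions on the question's sCharacters string (the helper name shared_iters is hypothetical). The string has 93 characters, and 93 = 4 * 23 + 1, so exactly one character is left over after the full groups of 4.

```python
from itertools import zip_longest

# The character string from the question: 93 characters, so 93 % 4 == 1
sCharacters = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789~`!@#$%^&*()_-+={[}]|\"""':;?/>.<,"

def shared_iters(n):
    # Hypothetical helper: n references to ONE fresh iterator over the string
    return [iter(sCharacters)] * n

zipped = list(zip(*shared_iters(4)))
padded = list(zip_longest(*shared_iters(4)))

print(len(sCharacters), len(zipped), len(padded))  # 93 23 24
print(zipped[-1])  # ('/', '>', '.', '<'), so the ',' never appears
print(padded[-1])  # (',', None, None, None)
```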

How can I get a zip of all characters in a string without the None at the end? Or, how can I avoid this error?

Thanks for your time.


Solution

  • I've had to solve this in a performance-critical case before, so here is the fastest code I've found for it (it works no matter what values the iterable contains):

    from itertools import zip_longest
    
    def grouper(n, iterable):
        fillvalue = object()  # Guaranteed unique sentinel, cannot exist in iterable
        for tup in zip_longest(*(iter(iterable),) * n, fillvalue=fillvalue):
            if tup[-1] is fillvalue:
                yield tuple(v for v in tup if v is not fillvalue)
            else:
                yield tup
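
Applied to the question's problem, a short usage sketch (the inputs are illustrative, not the question's full character set). Because the final tuple is trimmed rather than padded, it can go straight into ''.join, and since the sentinel is a fresh object() rather than None, data that itself contains None is grouped correctly.

```python
from itertools import zip_longest

def grouper(n, iterable):
    fillvalue = object()  # unique sentinel; no real item can be this object
    for tup in zip_longest(*(iter(iterable),) * n, fillvalue=fillvalue):
        # Only the last tuple can be padded, so checking its final slot suffices
        if tup[-1] is fillvalue:
            yield tuple(v for v in tup if v is not fillvalue)
        else:
            yield tup

groups = list(grouper(4, "abcdefghij"))
print(groups)  # [('a', 'b', 'c', 'd'), ('e', 'f', 'g', 'h'), ('i', 'j')]
print(["".join(g) for g in groups])  # ['abcd', 'efgh', 'ij']

# None in the data is kept, because the padding is the sentinel, not None
print(list(grouper(2, [None, 1, None])))  # [(None, 1), (None,)]
```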
    

    As far as I can tell, the above is unbeatable when the input is long enough and the chunk size is small enough. When the chunk size is fairly large, it can lose out to this even uglier approach, though usually not by much:

    try:
        from future_builtins import map  # Py2 only: makes map lazy, required there
    except ImportError:
        pass  # Py3: the built-in map is already lazy
    from itertools import islice, repeat, starmap, takewhile
    from operator import truth  # Faster than bool when always called with exactly one argument
    
    def grouper(n, iterable):
        '''Return an iterator yielding n-sized groups from iterable
        
        For iterables not evenly divisible by n, the final group will be undersized.
        '''
        # Can add tests to special case other types if you like, or just
        # use tuple unconditionally to match `zip`
        rettype = ''.join if type(iterable) is str else tuple
    
        # Keep islicing n items and converting to groups until we hit an empty slice
        return takewhile(truth, map(rettype, starmap(islice, repeat((iter(iterable), n)))))
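
A Python 3 usage sketch of this version, showing the dynamic return type: a str input yields str chunks, which sidesteps the question's join error entirely, while any other iterable yields tuples. The inputs are illustrative.

```python
from itertools import islice, repeat, starmap, takewhile
from operator import truth

def grouper(n, iterable):
    # str input yields str chunks; everything else yields tuples
    rettype = ''.join if type(iterable) is str else tuple
    # repeat() hands the same (iterator, n) pair to islice forever; takewhile
    # stops at the first empty chunk, i.e. once the iterator is exhausted
    return takewhile(truth, map(rettype, starmap(islice, repeat((iter(iterable), n)))))

print(list(grouper(4, "abcdefghij")))  # ['abcd', 'efgh', 'ij']
print(list(grouper(4, range(10))))     # [(0, 1, 2, 3), (4, 5, 6, 7), (8, 9)]
```

Using truth as the stop test is safe here because every chunk except the terminating one is non-empty, and a non-empty str or tuple is always truthy.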
        
    

    Either approach leaves the final group short if there aren't enough items to fill it. The second approach runs extremely fast because, after setup, all of the work is pushed to the C layer in CPython: however long the iterable is, the Python-level work stays the same and only the C-level work grows. That said, it does a lot of C work, which is why the zip_longest solution (which does much less C work, and only trivial Python-level work for every chunk but the last) usually beats it.
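Since the performance comparison above depends on interpreter version, input length and chunk size, one way to check it on your own machine is a timeit harness like the sketch below. The function names, data size, chunk sizes and repetition count are all arbitrary choices for illustration, and no particular timings are implied; the assert also confirms that both approaches produce the same short final chunk.

```python
import timeit
from itertools import islice, repeat, starmap, takewhile, zip_longest
from operator import truth

def grouper_zip_longest(n, iterable):
    fillvalue = object()  # unique sentinel
    for tup in zip_longest(*(iter(iterable),) * n, fillvalue=fillvalue):
        if tup[-1] is fillvalue:
            yield tuple(v for v in tup if v is not fillvalue)
        else:
            yield tup

def grouper_islice(n, iterable):
    # Tuple-only variant of the second approach, for a like-for-like comparison
    return takewhile(truth, map(tuple, starmap(islice, repeat((iter(iterable), n)))))

data = list(range(10_003))  # deliberately not divisible by the chunk sizes below

for n in (4, 1000):
    # Both approaches must agree, including on the short final chunk
    assert list(grouper_zip_longest(n, data)) == list(grouper_islice(n, data))
    for fn in (grouper_zip_longest, grouper_islice):
        t = timeit.timeit(lambda: list(fn(n, data)), number=20)
        print(f"n={n:>4} {fn.__name__:<20} {t:.4f}s")
```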

    A slower but more readable equivalent of option #2 (dropping the dynamic return type in favor of plain tuple) is:

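    from itertools import islice
    
    def grouper(n, iterable):
        iterable = iter(iterable)  # one shared iterator, so each islice resumes where the last stopped
        while True:
            x = tuple(islice(iterable, n))
            if not x:  # empty chunk: the input is exhausted
                return
            yield x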
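
The iterable = iter(iterable) line is what makes this loop terminate for sequences: islice over a list always starts again from index 0, so without a shared iterator every chunk would be the same first n items and the loop would never end. A small sketch of the difference, with illustrative data:

```python
from itertools import islice

data = [1, 2, 3, 4, 5]

# Slicing the list itself restarts from the beginning every time
print(list(islice(data, 2)), list(islice(data, 2)))  # [1, 2] [1, 2]

# Slicing one shared iterator picks up where the previous slice stopped
it = iter(data)
print(list(islice(it, 2)), list(islice(it, 2)), list(islice(it, 2)))  # [1, 2] [3, 4] [5]
```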
    

    Or more succinctly with Python 3.8+'s walrus operator:

    from itertools import islice
    
    def grouper(n, iterable):
        iterable = iter(iterable)
        while x := tuple(islice(iterable, n)):
            yield x
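A usage sketch for the walrus version (requires Python 3.8+). The generator input is illustrative, chosen to show that the function only needs a one-pass iterator, not a sequence.

```python
from itertools import islice

def grouper(n, iterable):
    iterable = iter(iterable)
    # The walrus binds each chunk; an empty tuple is falsy and ends the loop
    while x := tuple(islice(iterable, n)):
        yield x

# Works on one-shot iterators such as generators, not just sequences
squares = (i * i for i in range(7))
print(list(grouper(3, squares)))  # [(0, 1, 4), (9, 16, 25), (36,)]

# Empty input simply yields nothing
print(list(grouper(3, [])))  # []
```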