Search code examples
pythonpython-2.7listgeneratoryield

Can Python yield into a list be made more efficient?


In Python v2.7, if I have a function like this:

def holes_between(intervals):
  # Compute the holes between the intervals, for example:
  #   given the intervals: ([ 8,  9] [14, 18] [19, 20] [23, 32] [34, 49])
  #     compute the holes: ([10, 13] [21, 22] [33, 33])
  prec = intervals[0][1] + 1 # Bootstrap the iteration
  for low, high in intervals[1:]:
    if prec <= low - 1:
      yield (prec, low - 1)
    prec = high + 1

holes = list(holes_between(intervals))

Since the yield of the function is being collected into a list, is it more efficient to just build the list inside the holes_between function and if so, how would that be done most efficiently?


Solution

  • A generator function can be less efficient than building a list directly.

    You can just build the list in the holes_between() function and return that:

    def holes_between(intervals):
        prec = intervals[0][1] + 1 # Bootstrap the iteration
        result = []
        for low, high in intervals[1:]:
            if prec <= low - 1:
                result.append((prec, low - 1))
            prec = high + 1
        return result
    

    but do measure the differences, using the timeit module.

    If you have some typical input, you'd test that with:

    import timeit
    
    def holes_between_list(intervals):
        prec = intervals[0][1] + 1 # Bootstrap the iteration
        result = []
        for low, high in intervals[1:]:
            if prec <= low - 1:
                result.append((prec, low - 1))
            prec = high + 1
        return result
    
    def holes_between_generate(intervals):
        prec = intervals[0][1] + 1 # Bootstrap the iteration
        for low, high in intervals[1:]:
            if prec <= low - 1:
                yield (prec, low - 1)
            prec = high + 1
    
    intervals = [ ... ] # fill in some test data
    
    print 'As list:', timeit.timeit(
        'holes_between(intervals)',
        'from __main__ import intervals, holes_between_list as holes_between')
    
    print 'Using a generator:', timeit.timeit(
        'list(holes_between(intervals))',
        'from __main__ import intervals, holes_between_generate as holes_between')
    

    The lower value is the faster method for your test data.