Can Python yield into a list be made more efficient?

In Python v2.7, if I have a function like this:

def holes_between(intervals):
  # Compute the holes between the intervals, for example:
  #   given the intervals: ([ 8,  9] [14, 18] [19, 20] [23, 32] [34, 49])
  #     compute the holes: ([10, 13] [21, 22] [33, 33])
  prec = intervals[0][1] + 1 # Bootstrap the iteration
  for low, high in intervals[1:]:
    if prec <= low - 1:
      yield (prec, low - 1)
    prec = high + 1

holes = list(holes_between(intervals))

Since the yield of the function is being collected into a list, is it more efficient to just build the list inside the holes_between function and if so, how would that be done most efficiently?

Solution

A generator function can be less efficient than building a list directly.

You can just build the list in the holes_between() function and return that:

def holes_between(intervals):
    prec = intervals[0][1] + 1 # Bootstrap the iteration
    result = []
    for low, high in intervals[1:]:
        if prec <= low - 1:
            result.append((prec, low - 1))
        prec = high + 1
    return result

but do measure the differences, using the timeit module.

If you have some typical input, you'd test that with:

import timeit

def holes_between_list(intervals):
    prec = intervals[0][1] + 1 # Bootstrap the iteration
    result = []
    for low, high in intervals[1:]:
        if prec <= low - 1:
            result.append((prec, low - 1))
        prec = high + 1
    return result

def holes_between_generate(intervals):
    prec = intervals[0][1] + 1 # Bootstrap the iteration
    for low, high in intervals[1:]:
        if prec <= low - 1:
            yield (prec, low - 1)
        prec = high + 1

intervals = [ ... ] # fill in some test data

print 'As list:', timeit.timeit(
    'holes_between(intervals)',
    'from __main__ import intervals, holes_between_list as holes_between')

print 'Using a generator:', timeit.timeit(
    'list(holes_between(intervals))',
    'from __main__ import intervals, holes_between_generate as holes_between')

The lower value is the faster method for your test data.