Group consecutive number ranges
I wanted to share a bit of cool Python code I wrote about a year ago and recently rediscovered.
    import itertools


    def group_ranges(L):
        """
        Collapses a list of integers into a list of the start and end of
        consecutive runs of numbers. Returns a generator of generators.

        >>> [list(x) for x in group_ranges([1, 2, 3, 5, 6, 8])]
        [[1, 3], [5, 6], [8]]
        """
        # The key is (position - value), which stays constant within a consecutive run.
        for w, z in itertools.groupby(L, lambda x, y=itertools.count(): next(y)-x):
            grouped = list(z)
            # Yield [start, end], or just [start] for a run of length one.
            yield (x for x in [grouped[0], grouped[-1]][:len(grouped)])
It’s a fairly good showcase of what you can do with Python generators and the itertools module. The groupby() function takes two arguments: the first is an iterable (here, a list), and the second is a function that acts as the “key” by which consecutive elements are grouped.
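To make the role of the key concrete, here is a minimal sketch of groupby() on its own. The word list and the first-letter key are made up for illustration and are not part of the original code. Note that groupby() only merges adjacent items with equal keys, which is the property group_ranges relies on.

```python
import itertools

# Illustrative data only; the last word is deliberately out of order.
words = ["apple", "avocado", "banana", "blueberry", "cherry", "apricot"]

# Hypothetical key for this example: the first letter of each word.
groups = [
    (key, list(items))
    for key, items in itertools.groupby(words, key=lambda w: w[0])
]

for key, items in groups:
    print(key, items)
# "apricot" ends up in its own second "a" group, because groupby()
# starts a new group every time the key changes.
```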
The custom grouping key is a lambda function that takes two arguments: x, which receives each element of the list, and y, which defaults to the result of a call to itertools.count(). itertools.groupby never supplies a value for y, since it calls the key function with a single argument; instead, we use y to keep track of where we are in the list, because y holds a reference to the same itertools.count iterator for the life of the lambda (which lasts as long as a call to group_ranges). The key itself is next(y)-x, the element’s position minus its value, and that difference is identical for every element in a run of consecutive numbers, so groupby places the whole run in one group. This is a somewhat obscure use of Python defaults. Consider the following function:
    def myfunc(num, arr=[]):
        arr.append(num)
        print(arr)

    myfunc(1)  ## [1]
    myfunc(2)  ## [1, 2]
    myfunc(3)  ## [1, 2, 3]
If we call myfunc several times, you might expect a new, empty list to be bound to the arr argument on each invocation. But try it in your Python interpreter. You’ll see that the same list is reused across function calls! This happens because default values are evaluated once, when the function is defined, rather than on every call. This is known as the “mutable default argument” gotcha, and it can sometimes be confusing.
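One way to convince yourself that a single list is being shared is to look at where Python keeps it. The sketch below uses a variant of myfunc that returns the list instead of printing it (a change made here purely so the result can be inspected) and reads the function’s __defaults__ attribute, which holds the default values created at definition time.

```python
def myfunc(num, arr=[]):
    # Variant for illustration: returns the list instead of printing it.
    arr.append(num)
    return arr

# The default list is created once, when the def statement runs,
# and is stored on the function object itself.
default_list = myfunc.__defaults__[0]

first = myfunc(1)
second = myfunc(2)

print(first is second)        # True: both calls returned the same object
print(first is default_list)  # True: and it is the stored default
print(default_list)           # [1, 2]
```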
We use this behavior to “save” the itertools.count iterator, so that the count keeps increasing each time the grouping lambda is called. This saves us from explicitly keeping track of the count via a closure, and allows the code to be much more terse. Since the lambda is defined within the function, each call to the group_ranges function creates a fresh counter and starts counting from 0 again.
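For comparison, here is a sketch of a more explicit variant that does not rely on the default-argument trick. It is neither the original code nor a closure: it pairs each value with its index using enumerate(). The name group_ranges_explicit is hypothetical, and unlike the original it yields plain lists rather than generators. Printing the keys also shows why consecutive runs end up grouped together.

```python
import itertools


def group_ranges_explicit(numbers):
    """Hypothetical variant of group_ranges that tracks positions with enumerate()."""
    # Each item is an (index, value) pair; index - value is constant
    # across a run of consecutive integers.
    indexed = enumerate(numbers)
    for _, run in itertools.groupby(indexed, key=lambda pair: pair[0] - pair[1]):
        values = [value for _, value in run]
        # [start, end] for longer runs, [start] for a run of length one.
        yield [values[0], values[-1]][:len(values)]


data = [1, 2, 3, 5, 6, 8]

keys = [index - value for index, value in enumerate(data)]
print(keys)  # [-1, -1, -1, -2, -2, -3]

result = list(group_ranges_explicit(data))
print(result)  # [[1, 3], [5, 6], [8]]
```

The original version is shorter; this one trades some terseness for a key function whose inputs are all visible at the call site.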