Group consecutive number ranges
I wanted to share a bit of cool Python code I wrote about a year ago and recently rediscovered.
import itertools
def group_ranges(L):
"""
Collapses a list of integers into a list of the start and end of
consecutive runs of numbers. Returns a generator of generators.
>>> [list(x) for x in group_ranges([1, 2, 3, 5, 6, 8])]
[[1, 3], [5, 6], [8]]
"""
for w, z in itertools.groupby(L, lambda x, y=itertools.count(): next(y)-x):
grouped = list(z)
yield (x for x in [grouped[0], grouped[-1]][:len(grouped)])
It’s a fairly good showcase of what you can do with Python generators and the itertools
package. The function groupby()
takes two arguments: the first is a list, and the second is a function that will act as the “key” by which your list will be grouped.
The custom grouping key is a lambda function that takes two arguments: x
, which will be filled with each element of the list, and y
, which is initialized by a call to the itertools.count()
function. The y
argument in this case is actually not ever used by itertools.groupby
; instead, we use it as a way to keep track of where we are in the list, because y
will hold the a reference to the same itertools.count
generator for the life of the lambda (which lasts as long as a call to group_ranges
). This is a somewhat obscure use of Python defaults. Consider the following function:
def myfunc(num, arr=[]):
arr.append(num)
print arr</p>
myfunc(1)
myfunc(2)
myfunc(3)
## [1, 2, 3]
If we call myfunc
several times, you might expect that a new, empty list is populated into the arr
argument on each function invocation. But try it in your Python interpreter. You’ll see that the same list is reused across function calls! This is known as the “mutable default argument”, and can sometimes be confusing.
We use this behavior to “save” the itertools.count
generator to ensure that we are always counting monotonically upwards each time the grouping lambda is called. This saves us from explicitly keeping track of the count via a closure, and allows the code to be much more terse. Since the lambda is defined within the function, each call to the group_ranges
function will start counting from 0 again.