group_sequential
So the other day I had a list of (html) elements, and I wanted to get an array representing lines of text. The only problem was that some of the elements are displayed inline, so I needed to join those together, but only when they appeared next to each other.
I would call the generic way of doing so group_sequential, where an array is chunked into sub-arrays, and sequential elements satisfying some predicate are included in the same sub-array. Each element that fails the predicate gets a sub-array of its own. That way, my predicate could be :inline?, and I could join the text of each grouped element together to get the lines out.
For example, using even numbers for simplicity:
[1,2,3,4,6,8,5,4,4].group_sequential(&:even?)
=> [[1],[2],[3],[4,6,8],[5],[4,4]]
Here’s the ruby code I came up with:
class Array
  def group_sequential
    result = []
    group = []
    finish_group = lambda do
      unless group.empty?
        result << group
        group = []
      end
    end
    self.each do |elem|
      if yield elem
        group << elem
      else
        finish_group.call
        result << [elem]
      end
    end
    finish_group.call
    result
  end
end
Things are slightly less noisy in python, but rebinding group from inside the helper is subtly awkward without the nonlocal keyword (only available in python3), so the helper returns the fresh group instead:
def group_sequential(predicate, sequence):
    result = []
    group = []
    def finish_group():
        # flush the current group (if any) and hand back a fresh one
        if group:
            result.append(group)
        return []
    for item in sequence:
        if predicate(item):
            group.append(item)
        else:
            group = finish_group()
            result.append([item])
    group = finish_group()
    return result
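For comparison, here is a sketch of how the helper might look with nonlocal, assuming python 3. It mirrors the ruby lambda by rebinding group directly instead of returning a new list:

```python
def group_sequential(predicate, sequence):
    result = []
    group = []

    def finish_group():
        # nonlocal lets the helper rebind the enclosing function's `group`
        nonlocal group
        if group:
            result.append(group)
            group = []

    for item in sequence:
        if predicate(item):
            group.append(item)
        else:
            finish_group()
            result.append([item])
    finish_group()
    return result


print(group_sequential(lambda n: n % 2 == 0, [1, 2, 3, 4, 6, 8, 5, 4, 4]))
# [[1], [2], [3], [4, 6, 8], [5], [4, 4]]
```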
This feels like something that should be doable in a much more concise way than I came up with above. Any ideas? (In either ruby or python)
Update:
My friend Iain has posted a number of interesting solutions over yonder, which got me thinking differently about it (specifically, reminding me of dropwhile and takewhile). I applied python’s itertools to the problem to get this rather satisfactory result in python:
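from itertools import takewhile, tee, dropwhile
def group_sequential(pred, sequence):
    taker, dropper = tee(iter(sequence))
    while True:
        group = list(takewhile(pred, taker))
        if group:
            yield group
        try:
            # the first item that fails pred gets a group of its own
            rejected = next(dropwhile(pred, dropper))
        except StopIteration:
            # input exhausted; return explicitly, since a StopIteration
            # leaking out of a generator is an error on python 3.7+
            return
        yield [rejected]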
(note: this is a generator, which is fine for my purposes - you can always wrap it in a call to list() to force it into an actual list).
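To tie this back to the original problem, here is a rough sketch of joining inline elements into lines. The (text, is_inline) tuples and the is_inline helper are hypothetical stand-ins for real html elements, and the generator is restated with the built-in next() so the snippet runs on its own:

```python
from itertools import dropwhile, takewhile, tee


def group_sequential(pred, sequence):
    taker, dropper = tee(iter(sequence))
    while True:
        group = list(takewhile(pred, taker))
        if group:
            yield group
        try:
            rejected = next(dropwhile(pred, dropper))
        except StopIteration:
            return
        yield [rejected]


# Hypothetical stand-in for parsed elements: (text, is_inline) pairs.
elements = [
    ("Title", False),
    ("Hello, ", True),
    ("world", True),
    ("!", True),
    ("A paragraph.", False),
]


def is_inline(element):
    return element[1]


# Each group becomes one line; neighbouring inline texts are concatenated.
lines = ["".join(text for text, _ in group)
         for group in group_sequential(is_inline, elements)]
print(lines)  # ['Title', 'Hello, world!', 'A paragraph.']
```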
The same approach is acceptable when done in ruby, but a bit more verbose because of the need to explicitly check for the end of the sequence, and to collect the results array:
class Array
  def group_sequential(&pred)
    sequence = self
    results = []
    while true
      group = sequence.take_while(&pred)
      results << group if group.size > 0
      sequence = sequence.drop_while(&pred)
      return results if sequence.empty?
      results << [sequence.shift]
    end
  end
end
Update (the second):
While poking around itertools, I managed to miss groupby. I assumed it did the same thing as ruby’s Enumerable#group_by, which is to say not at all what I want (though it’s surely useful at other times). So here is presumably the most concise version I’ll find, for the sake of closure:
from itertools import groupby
def group_sequential(pred, sequence):
    return [list(group) for key, group in groupby(sequence, pred)]
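One caveat worth flagging: groupby puts any run of consecutive items with the same key into one group, so neighbouring items that fail the predicate end up grouped together too, whereas the earlier versions give each of them a sub-array of its own. The example input at the top never has two odd numbers in a row, so it doesn't show the difference. Below is a minimal sketch of a variant that keeps the earlier behaviour. The bool() wrapper (so that different truthy return values count as the same key) and the is_even helper are assumptions added for illustration:

```python
from itertools import groupby


def group_sequential(pred, sequence):
    result = []
    # bool() normalises the key so different truthy values share one run
    for matched, run in groupby(sequence, lambda item: bool(pred(item))):
        if matched:
            result.append(list(run))
        else:
            # keep each non-matching item in its own sub-list
            result.extend([item] for item in run)
    return result


def is_even(n):
    return n % 2 == 0


print(group_sequential(is_even, [1, 3, 2, 4]))
# [[1], [3], [2, 4]]; the plain groupby one-liner gives [[1, 3], [2, 4]]
```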