GFX::Monk Home

Posts tagged: "python" - page 3

group_sequential

So the other day I had a list of (html) elements, and I wanted to get an array representing lines of text. The only problem being that some of the elements are displayed inline - so I needed to join those together. But only when they appeared next to each other.

I would call the generic way of doing so group_sequential, where an array is chunked into sub-arrays, and sequential elements satisfying some predicate are included in the same sub-array. That way, my predicate could be :inline?, and I could join the text of each grouped element together to get the lines out.

For example, using even numbers for simplicity:

[1,2,3,4,6,8,5,4,4].group_sequential(&:even?)
=> [[1],[2],[3],[4,6,8],[5],[4,4]]

Here’s the ruby code I came up with:

	class Array
		def group_sequential
			result = []
			group = []
			finish_group = lambda do
				unless group.empty?
					result << group
					group = []
				end
			end

			self.each do |elem|
				if yield elem
					group << elem
				else
					finish_group.call
					result << [elem]
				end
			end
			finish_group.call
			result
		end
	end
	

Things are slightly less noisy, but assignment is subtly awkward in python without using nonlocal scope keyword (only available in python3):

	def group_sequential(predicate, sequence):
		result = []
		group = []
		def finish_group():
			if group:
				result.append(group)
			return []

		for item in sequence:
			if predicate(item):
				group.append(item)
			else:
				group = finish_group()
				result.append([item])
		group = finish_group()
		return result
	

This feels like something that should be doable in a much more concise way than I came up with above. Any ideas? (In either ruby or python)


Update:

My friend Iain has posted a number of interesting solutions over yonder, which got me thinking differently about it (specifically, reminding me of dropwhile and takewhile). I applied python’s itertools to the problem to get this rather satisfactory result in python:

	from itertools import takewhile, tee, dropwhile
	def group_sequential(pred, sequence):
		taker, dropper = tee(iter(sequence))
		while True:
			group = list(takewhile(pred, taker))
			if group: yield group
			yield [dropwhile(pred, dropper).next()]
	

(note: this is a generator which is fine for my purposes - you can always wrap it in a call to list() to force it into an actual list).

The same approach is acceptable when done in ruby, but a bit more verbose because of the need to explicitly check for the end of the sequence, and to collect the results array:

	class Array
		def group_sequential(&pred)
			sequence = self
			results = []
			while true
				group = sequence.take_while &pred
				results << group if group.size > 0
				sequence = sequence.drop_while &pred
				return results if sequence.empty?
				results << [sequence.shift]
			end
		end
	end
	

Update (the second):

While poking around itertools, I managed to miss groupby. I assumed it did the same thing as ruby’s Enumerable#group_by, which is to say not at all what I want (though it’s surely useful at other times). So here is presumably the most concise version I’ll find, for the sake of closure:

	from itertools import groupby
	def group_sequential(pred, sequence):
		return [list(group) for key, group in groupby(sequence, pred)]
	

Recursively Default Dictionaries

Today I was asked if I knew how to make a recursively default dictionary (although not in so many words). What that means is that it’s a dictionary (or hash) which is defaulted to an empty version of itself for every item access. That way, you can throw data into a multi-dimensional dictionary without regard for whether keys already exist, like so:

h["a"]["b"]["c"] = 5

Without having to first initialise h[“a”] and h[“a”][“b”].

A dictionary with a default value of an empty hash sprang to mind, but after trying it out I realised that this only works for one level. Recursion was evidently required.

So, here’s the python solution:

from collections import defaultdict
new_dict = lambda: defaultdict(new_dict)
h = defaultdict(new_dict)

And the ruby, which seems overly noisy:

new_hash = lambda { |hash, key| hash[key] = Hash.new &new_hash }
h = Hash.new(&new_hash)

Autonose: continuous test runner for python's nosetests

Today I’ve put up the first “releaseable” version of autonose. Basically, it analyses your code’s imports, and determines exactly which tests rely on which code. So whenever you change a file, it’ll automatically run the tests that depend on the changed file (be it directly or transitively). Give it a go, and please let me know your feedback.

All you need do is:

$ easy_install autonose
$ cd [project_with_some_tests_in_it]
$ autonose

See the github project page for further information (and a screenshot).

rednose: coloured output formatting plugin for nosetests

I recently wrote a plugin for nosetests which greatly (imho) improves the output for failed and errored tests. The screenshot explains it best.

To install, just run easy_install rednose. Then you can run nosetests with the –rednose option.

See github.com/gfxmonk/rednose for code and more information.

ruby - longing for some discipline

More and more, I am wishing that there was some sort of strict mode I could enable in ruby to say “you know what? I’m careful with my code. Please don’t assume things behind my back”. And to be honest, this mode would pretty much be synonymous with “just do what python would do”

By default, python is strict. If you index a dict (hash) with a nonexistant key, you get a keyError. If you don’t want to have to deal with that, you can use the get method and provide a default for if the key doesn’t exist. In ruby, if you want to be strict about anything, you generally have to write your own checks to guard against the core library’s forgivingness. Forgivingness sounds nice at first, but goes completely against the idea of failing fast, and frequently delays the manifestation of bugs, making them that much harder to actually track down.

Two examples that I came across within minutes of each other the other day:

Struct.new(:a,:b,:c).new('a','b')

that should NOT go without an exception

"a|b||c".split("|")
=> ["a", "b", "", "c"]

good…. so now:

"a|b||".split("|")
=> ["a", "b"]

argh! what have you done to my third field?

Pretty Decorators

Python decorators are cool, but they can become very messy if you sart taking arguments:

def decorate_with_args(arg):
    def the_decorator(func):
        def run_it():
            func(arg)
        return run_it
    return the_decorator

@decorate_with_args('some string')
def messy(s):
    print s

ugh. Three levels of function definitions for a single decorator. And heaven forbid you want the decorator to be useable without supplying any arguments (not even empty brackets).

So then, I present a much cleaner decorator helper class:

class ParamDecorator(object):
    def __init__(self, decorator_function):
        self.func = decorator_function

    def __call__(self, *args, **kwargs):
        if len(args) == 1 and len(kwargs) == 0 and callable(args[0]):
            # we're being called without paramaters
            # (just the decorated function)
            return self.func(args[0])
        return self.decorate_with(args, kwargs)

    def decorate_with(self, args, kwargs):
        def decorator(callable_):
            return self.func(callable_, *args, **kwargs)
        return decorator

All that’s required of you is to take the decorated function as your first argument, and then any additional (optional) arguments. For example, here’s how you might implement a “pending” and “trace” decorator:

@ParamDecorator
def pending(decorated, reason='no reason given'):
    def run(*args, **kwargs):
        print "function '%s' is pending (%s)" % (decorated.__name__, reason)
    return run

@ParamDecorator
def trace(decorated, label=None):
    if label is None:
        label = decorated.__name__
    else:
        label = "%s (%s)" % (decorated.__name__, label)
    def run(*args, **kwargs):
        print "%s: started (args=%s, kwargs=%s)" % (label, args, kwargs)
        ret = decorated(*args, **kwargs)
        print "%s: returning: %s" % (label, ret)
        return ret
    return run

Which can then be used as either standard or paramaterised decorators:

@pending
def a():
    pass

@pending("I haven't done it yet!")
def b():
    pass

@trace
def foo():
    return "blah"

@trace("important function")
def bar():
    return "blech!"

And just to show what this all amounts to:

if __name__ == '__main__':
    a()
    b()
    foo()
    bar()

reveals the following output:

function 'a' is pending (no reason given)
function 'b' is pending (I haven't done it yet!)
foo: started (args=(), kwargs={})
foo: returning: blah
bar (important function): started (args=(), kwargs={})
bar (important function): returning: blech!

Fairly simple stuff, but hopefully useful for anyone who finds themselves tripped up by decorators - particularly when trying to allow for both naked and paramaterised decorators.

P.S: I’ve made a pastie of the code in this post, because my weblog engine is not cool enough to colour-code python ;)