
Posts tagged: "programming" - page 13

Entirely too much investigation into ruby's match operator

(how could a title like that not excite you! ;P)

So yesterday I had this weird regex issue in ruby. I wanted to get a regular expression containing a given string, but didn’t want to have to manually escape all the special characters. Regexp.escape to the rescue! It escapes all regex metacharacters in any given string, and returns it as a regex. In fact, the docs assure me:

For any string, Regexp.escape(str)=~str will be true.

But not so much in practice?

>> str = "123"            
=> "123"
>> Regexp.escape(str)=~str
TypeError: type mismatch: String given


So, problem one: Regexp.escape is broken. It returns not a Regexp, but a string. Oddly enough it also seems to escape spaces and other innocuous characters, but at least you get the right result if you pump it through Regexp.compile(). However, that wasn’t my only discovery.
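For comparison, here is a minimal sketch of the same "escape, then compile" dance in python (python 3 syntax). It only illustrates the pattern and is not a statement about ruby's internals; the sample text is made up for the example. Python's re.escape also hands back a plain string, which you then compile yourself:

```python
import re

# Hypothetical sample text containing several regex metacharacters.
text = "1+1=2 (really?)"

# re.escape returns a plain str with metacharacters backslash-escaped,
# not a compiled pattern object.
escaped = re.escape(text)
print(type(escaped).__name__)

# Compiling the escaped string gives a pattern that matches the text literally.
pattern = re.compile(escaped)
match = pattern.search("proof: " + text)
print(match.group(0) == text)
```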

I mentioned this to Matt, and he couldn’t make much sense of it either. He noticed that the type error is specific to strings - if you use a number it just returns false:

>> 123 =~ 'foo'
=> false

Seems a bit odd, really. Fixnum doesn’t implement =~, nor does Integer.

So I went doc spelunking. I found implementations for =~ in the following three important classes:

Regexp: regexp =~ str: do a regex match, as you might expect

String: str =~ obj: Call obj =~ str (i.e. swap the order of your operands). Not mentioned in the docs, but clearly apparent in the source (and from experimentation): it raises a TypeError if both arguments are strings. Without this, matching one string against another would very quickly run out of stack space.

Object: obj =~ other_obj: return false

So the Regexp implementation is fine. The String implementation is a little odd. I guess it’s there to allow people to write matching statements either way, but it seems like a dangerous (and confusing) habit to condone.

But the Object implementation? Why??? What possible reason could one have for doing a match operation between two objects, neither of which implements any matching behaviour? This has the painful side effect of giving every single object I inspect a “=~” method which does nothing. No wonder Object.new() has over 120 methods on it *.

For comparison, python’s object only has 12 methods / attributes. And they’re all special names, so there’s no pollution of regular names going on.
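A quick way to check the "no pollution" claim for yourself, as a sketch in python 3 syntax. The exact number of names differs between python versions, so it will not necessarily be 12, but the list of non-special names should come out empty:

```python
# Names visible on a bare object() instance.
names = dir(object())

# Keep only names that are NOT double-underscore "special" names.
public_names = [
    n for n in names
    if not (n.startswith("__") and n.endswith("__"))
]

print(len(names))    # count varies by python version
print(public_names)  # regular (non-special) names, if any
```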

So there you go, two spoonfuls of broken in the one discovery!

(this is ruby 1.8.6, if that matters)

* I exaggerate: over 120 methods on Object is just what you get in a rails app. Vanilla ruby only has 41 by my count. But it’s still completely unnecessary, and only adds noise that inflates that number.

ruby dataflow library

Pretty cool. I worry that such things can’t easily be done so cleanly in python…

(view link)

rednose: coloured output formatting plugin for nosetests

I recently wrote a plugin for nosetests which greatly (imho) improves the output for failed and errored tests. The screenshot explains it best.

To install, just run easy_install rednose. Then you can run nosetests with the --rednose option.

See github.com/gfxmonk/rednose for code and more information.

ruby - longing for some discipline

More and more, I am wishing that there was some sort of strict mode I could enable in ruby to say “you know what? I’m careful with my code. Please don’t assume things behind my back”. And to be honest, this mode would pretty much be synonymous with “just do what python would do”.

By default, python is strict. If you index a dict (hash) with a nonexistent key, you get a KeyError. If you don’t want to have to deal with that, you can use the get method and provide a default for when the key doesn’t exist. In ruby, if you want to be strict about anything, you generally have to write your own checks to guard against the core library’s forgivingness. Forgivingness sounds nice at first, but it goes completely against the idea of failing fast, and frequently delays the manifestation of bugs, making them that much harder to actually track down.
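A minimal sketch of that strict-by-default behaviour, in python 3 syntax. The settings dict and its keys are made up for the example:

```python
# Hypothetical data for illustration.
settings = {"colour": "red"}

# Plain indexing fails fast when the key is missing.
try:
    settings["size"]
except KeyError as err:
    print("KeyError:", err)

# Opting in to forgiveness is explicit: get() with a default value.
size = settings.get("size", "medium")
print(size)
```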

Two examples that I came across within minutes of each other the other day:

Struct.new(:a,:b,:c).new('a','b')

That should NOT go through without an exception: the struct has three fields but only gets two values, and c silently ends up as nil.
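For contrast, a sketch of roughly the nearest python equivalent (python 3 syntax), using collections.namedtuple. The Triple name is made up for the example. Leaving out a field is an error at construction time:

```python
from collections import namedtuple

# Hypothetical three-field record, mirroring the Struct above.
Triple = namedtuple("Triple", ["a", "b", "c"])

error = None
try:
    Triple("a", "b")  # only two of the three fields supplied
except TypeError as err:
    error = err       # keep a reference; 'err' is cleared after the block
    print("TypeError:", err)
```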

"a|b||c".split("|")
=> ["a", "b", "", "c"]

good…. so now:

"a|b||".split("|")
=> ["a", "b"]

argh! what have you done to my third field?
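And here is what python would do with the same input (python 3 syntax). Trailing empty fields are kept:

```python
# Splitting on a separator keeps empty fields, including trailing ones.
fields = "a|b||".split("|")
print(fields)  # ['a', 'b', '', '']
```

As far as I know, ruby will give you the same result if you pass a negative limit, i.e. "a|b||".split("|", -1), but you have to know to ask for it.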

Size matters

On the same codebase, with no changes pending in either system:

$ time git status
# ...
real	0m0.618s

$ time bzr status
# ...
real	0m3.795s

It’s a small thing, but it matters.

Pretty Decorators

Python decorators are cool, but they can become very messy if you start taking arguments:

def decorate_with_args(arg):
    def the_decorator(func):
        def run_it():
            func(arg)
        return run_it
    return the_decorator

@decorate_with_args('some string')
def messy(s):
    print s

ugh. Three levels of function definitions for a single decorator. And heaven forbid you want the decorator to be useable without supplying any arguments (not even empty brackets).

So then, I present a much cleaner decorator helper class:

class ParamDecorator(object):
    def __init__(self, decorator_function):
        self.func = decorator_function

    def __call__(self, *args, **kwargs):
        if len(args) == 1 and len(kwargs) == 0 and callable(args[0]):
            # we're being called without parameters
            # (just the decorated function)
            return self.func(args[0])
        return self.decorate_with(args, kwargs)

    def decorate_with(self, args, kwargs):
        def decorator(callable_):
            return self.func(callable_, *args, **kwargs)
        return decorator

All that’s required of you is to take the decorated function as your first argument, and then any additional (optional) arguments. For example, here’s how you might implement a “pending” and “trace” decorator:

@ParamDecorator
def pending(decorated, reason='no reason given'):
    def run(*args, **kwargs):
        print "function '%s' is pending (%s)" % (decorated.__name__, reason)
    return run

@ParamDecorator
def trace(decorated, label=None):
    if label is None:
        label = decorated.__name__
    else:
        label = "%s (%s)" % (decorated.__name__, label)
    def run(*args, **kwargs):
        print "%s: started (args=%s, kwargs=%s)" % (label, args, kwargs)
        ret = decorated(*args, **kwargs)
        print "%s: returning: %s" % (label, ret)
        return ret
    return run

Which can then be used as either standard or parameterised decorators:

@pending
def a():
    pass

@pending("I haven't done it yet!")
def b():
    pass

@trace
def foo():
    return "blah"

@trace("important function")
def bar():
    return "blech!"

And just to show what this all amounts to:

if __name__ == '__main__':
    a()
    b()
    foo()
    bar()

reveals the following output:

function 'a' is pending (no reason given)
function 'b' is pending (I haven't done it yet!)
foo: started (args=(), kwargs={})
foo: returning: blah
bar (important function): started (args=(), kwargs={})
bar (important function): returning: blech!

Fairly simple stuff, but hopefully useful for anyone who finds themselves tripped up by decorators - particularly when trying to allow for both naked and parameterised decorators.
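The snippets in this post are python 2. For anyone on python 3, here is a rough, self-contained port of the same idea, offered as a sketch rather than a drop-in replacement. The use of functools.wraps is my addition (so the wrapped function keeps its name), and decorate_with has been folded into a lambda:

```python
import functools

class ParamDecorator:
    """Lets one function act as both a bare and a parameterised decorator."""

    def __init__(self, decorator_function):
        self.func = decorator_function

    def __call__(self, *args, **kwargs):
        if len(args) == 1 and not kwargs and callable(args[0]):
            # Called without parameters (just the decorated function).
            return self.func(args[0])
        # Called with parameters: return the real decorator.
        return lambda callable_: self.func(callable_, *args, **kwargs)

@ParamDecorator
def trace(decorated, label=None):
    name = decorated.__name__
    if label is not None:
        name = "%s (%s)" % (name, label)

    @functools.wraps(decorated)  # addition: preserve __name__ and docstring
    def run(*args, **kwargs):
        print("%s: started (args=%s, kwargs=%s)" % (name, args, kwargs))
        ret = decorated(*args, **kwargs)
        print("%s: returning: %s" % (name, ret))
        return ret
    return run

@trace
def foo():
    return "blah"

@trace("important function")
def bar():
    return "blech!"

foo()
bar()
```

One caveat with this approach: a decorator called with a single positional argument that happens to be callable is indistinguishable from the bare form, so a parameter like that would need to be passed as a keyword argument.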

P.S: I’ve made a pastie of the code in this post, because my weblog engine is not cool enough to colour-code python ;)