Entirely too much investigation into ruby's match operator
(how could a title like that not excite you! ;P)
So yesterday I had this weird regex issue in ruby. I wanted to get a regular expression containing a given string, but didn’t want to have to manually escape all the special characters.
Regexp.escape to the rescue! It escapes all regex metacharacters in any given string, and returns it as a regex. In fact, the docs assure me:
For any string, Regexp.escape(str)=~str will be true.
But not so much in practice?
>> str = "123"
=> "123"
>> Regexp.escape(str)=~str
TypeError: type mismatch: String given
So, problem one: Regexp.escape is broken. It returns not a Regexp, but a string. Oddly enough it also seems to escape spaces and other innocuous characters, but at least you get the right result if you pump it through Regexp.compile(). However, that wasn’t my only discovery.
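For the record, here is a minimal sketch of the workaround. The helper name literal_regexp is made up for illustration; Regexp.escape and Regexp.new (which Regexp.compile is a synonym for) are the only real library calls involved.

```ruby
# Hypothetical helper: build a Regexp that matches `str` literally.
# Regexp.escape returns a String, so it has to be compiled separately.
def literal_regexp(str)
  Regexp.new(Regexp.escape(str))
end

str = "1+1=2 (really?)"
escaped = Regexp.escape(str)

puts escaped.class               # the escaped text is a String
puts literal_regexp(str).class   # the compiled pattern is a Regexp
p(literal_regexp(str) =~ str)    # index of the match, or nil if none
```

With the pattern compiled first, the match from the docs behaves the way I expected it to in the first place.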
I mentioned this to Matt, and he couldn’t make much sense of it either. He noticed that the type error is specific to strings - if you use a number it just returns false:
>> 123 =~ 'foo'
=> false
Seems a bit odd, really. Fixnum doesn’t implement =~, nor does Integer.
So I went doc spelunking. I found implementations for =~ in the following three important classes:
regexp =~ str: do a regex match, as you might expect
str =~ obj: call obj =~ str (i.e. swap the order of your operands). Not mentioned in the docs but clearly apparent in the source (and from experimentation): raise a TypeError if both arguments are strings. Without this check, matching one string against another would bounce between the two operands forever and very quickly run out of stack space.
obj =~ other_obj: return false
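A rough sketch of the first two rules in action. PrefixMatcher and string_match_error are hypothetical names invented for this example; the point is only to make the operand swap in String#=~ visible. I have left the Object fallback out of the code, since whether a plain object has =~ at all depends on which ruby you are running.

```ruby
# Hypothetical matcher class, not part of ruby. It only exists so we can
# see String#=~ hand the call over to its right-hand operand.
class PrefixMatcher
  def initialize(prefix)
    @prefix = prefix
  end

  # Invoked directly, and also by String#=~ when the right operand
  # is neither a Regexp nor a String.
  def =~(str)
    str.index(@prefix) == 0 ? 0 : nil
  end
end

# Hypothetical helper: returns the TypeError class if `lhs =~ rhs`
# raises a TypeError, and nil if the match ran without raising.
def string_match_error(lhs, rhs)
  lhs =~ rhs
  nil
rescue TypeError => e
  e.class
end

p(/2/ =~ "123")                         # regexp =~ str: a normal regex match
p("123" =~ /2/)                         # str =~ regexp: same match, operands flipped
p("hello" =~ PrefixMatcher.new("he"))   # str =~ obj: ends up in PrefixMatcher#=~
p(string_match_error("123", "2"))       # str =~ str: TypeError
```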
So the Regexp implementation is fine. The String implementation is a little odd. I guess it’s there to allow people to write matching statements either way, but it seems like a dangerous (and confusing) habit to condone.
But the Object implementation? Why??? What possible reason could one have for doing a match operation against two objects, neither of which implement any matching behaviour? This has the painful side effect of giving every single object I inspect a “=~” method which does nothing. No wonder Object.new() has over 120 methods on it *.
For comparison, python’s object only has 12 methods / attributes. And they’re all special names, so there’s no pollution of regular names going on.
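If you want to check the ruby side of that yourself, something like the following will do. The two helper names are made up for this post, and the numbers you get depend entirely on your ruby version and whatever libraries are loaded, so I am not going to promise any particular output.

```ruby
# Hypothetical helper: how many public methods does this object answer to,
# counting everything it inherits from Object and Kernel?
def inherited_method_count(obj)
  obj.methods.size
end

# Hypothetical helper: does this object respond to the match operator?
def has_match_operator?(obj)
  obj.respond_to?(:=~)
end

plain = Object.new
puts inherited_method_count(plain)   # varies by ruby version and loaded code
puts has_match_operator?(plain)      # depends on whether Object defines =~
puts has_match_operator?("a string") # String defines its own =~
```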
So there you go, two spoonfuls of broken in the one discovery!
(this is ruby 1.8.6, if that matters)
* I exaggerate: over 120 methods on an object is just what you get in a rails app. Vanilla ruby only has 41 by my count. But the “=~” method is still completely unnecessary, and it adds noise that inflates that number.