
Posts tagged: ruby

Running a child process in Ruby (properly)

(cross-posted on the Zendesk Engineering blog)

We use Ruby a lot at Zendesk, and mostly it works pretty well. But one thing that sucks is when it makes the wrong solution easy, and the right solution not just hard, but hard to even find.

Spawning a process is one such scenario. Want to spawn a child process to run some system command? Easy! Just pick the method that’s right for you:

  • `backticks`
  • %x[different backticks]
  • Kernel.system()
  • Kernel.spawn()
  • IO.popen()
  • Open3.capture2, Open3.capture2e, Open3.capture3, Open3.popen2, Open3.popen2e, Open3.popen3

… and that’s ignoring the more involved options, like pairing a Kernel#fork with a Kernel#exec, as well as the many different Open3.pipeline_* functions.

What are we doing here?

Often enough, you want to run a system command (i.e. something you might normally run from a terminal) from your Ruby code. You might be running a command just for its side effects (e.g. chmod a file), or you might want to use the output of the command in your code (e.g. tar -tf to list the contents of a tarball). Most of the above functions will work, but some of them are better than others.
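To make the "use the output of the command" case concrete, here is a minimal sketch built on Open3.capture2, passing the command as separate arguments so that no shell gets involved. The run_command helper name, and the decision to raise on a non-zero exit status, are assumptions made for this example rather than anything the list above prescribes.

```ruby
require 'open3'

# Hypothetical helper: run a command without a shell, return its stdout,
# and raise if the command exits with a non-zero status.
def run_command(*argv)
  output, status = Open3.capture2(*argv)
  raise "#{argv.first} failed with status #{status.exitstatus}" unless status.success?
  output
end

listing = run_command('echo', 'hello')
puts listing
```

Passing each argument separately means filenames with spaces or quotes never need escaping.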

Ruby's split() function makes me feel special (in a bad way)

Quick hand count: who knows what String.split() does?

Most developers probably do. Python? easy. Javascript? probably. But if you’re a ruby developer, chances are close to nil. I’m not trying to imply anything about the intelligence or skill of ruby developers, it’s just that the odds are stacked against you.


So, what does String.split() do?

In the simple case, it takes a separator string. It returns an array of substrings, split on the given string. Like so:

py> "one|two|three".split("|")
["one", "two", "three"]

Simple enough. As an extension, some languages allow you to pass in a num_splits option. In python, it splits only this many times, like so:

py> "one|two|three".split("|", 1)
["one", "two|three"]

Ruby is similar, although you have to add one to the second argument (it talks about number of returned components, rather than number of splits performed).
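For comparison with the python example, this is roughly what the off-by-one looks like in ruby (a small sketch, nothing more):

```ruby
# The second argument is the maximum number of pieces returned,
# so a limit of 2 performs a single split.
parts = "one|two|three".split("|", 2)
p parts  # => ["one", "two|three"]
```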

Javascript is a bit odd, in that it will ignore the rest of the string if you limit it:

js> "one|two|three".split("|", 2)
["one", "two"]

I don’t like the javascript way, but these are all valid interpretations of split. So far. And that’s pretty much all you have to know for python and javascript. But ruby? Pull up a seat.

Ruby's unicode treatment

I recently came across this enlightening post on the changes to strings and encodings in ruby 1.9. As a python lover who has only used ruby 1.8 so far, it’s interesting to see the different approaches to very similar problems in python 3 and ruby 1.9.

I may be biased, but ruby’s implementation sounds like it will lead to a lot of pain and bugs, while python’s implementation will lead to a little more pain as you are forced to learn about encodings, and a lot fewer bugs (as you are forced to learn about encodings). Let me explain why:

Four Common (and Broken) Ruby Operations

All of these lines, in ruby, should fail. All of them instead return nil:

@nonexistant_var
{}[:nonexistant_key]
[].first
{}.shift

I ran into all of these in the course of yesterday’s programming. None of them in a good way. And the last two were in published libraries, not even code under development.

All of these, of course, raise errors in python. I refer you to lines 10 and 11 of the zen of python:

Errors should never pass silently.

Unless explicitly silenced.

(an Option or Maybe type would be acceptable also, but that’s pretty uncommon to find in a dynamic language)
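For the hash and array cases, ruby does have a stricter spelling if you go looking for it. A small sketch using fetch (the variable names are just for illustration):

```ruby
hash = { present: 1 }
list = []

# Hash#fetch raises KeyError for a missing key instead of returning nil.
begin
  hash.fetch(:missing)
rescue KeyError => e
  hash_error = e.class
end

# Array#fetch raises IndexError for an out-of-range index.
begin
  list.fetch(0)
rescue IndexError => e
  list_error = e.class
end

# Both accept an explicit default for when a fallback really is wanted.
fallback = hash.fetch(:missing, 0)
```

It doesn't help with code you didn't write, of course, which is where the last two examples came from.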

Also inviting my fury: every single language, tool or function, ever, that makes you check the return code of a system (shell) command to see whether it was nonzero.
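To be fair to ruby, newer versions (2.6 and later, as far as I know) let you opt in to an exception instead. A minimal sketch, assuming a unix-like system where the false command exists:

```ruby
# Plain system() reports failure only through its return value.
quiet_result = system("false")   # false: the command ran but exited non-zero

# With exception: true, the same failure raises instead.
begin
  system("false", exception: true)
  raised = false
rescue RuntimeError
  raised = true
end
```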

How I Replaced Cucumber With 65 Lines of Python

Update:

I’ve since cleaned up the code here and published it as a tiny library: pea on github

Aside: why cucumber doesn’t work as well as everyone thinks it should

I’ve used cucumber at work for a reasonably large project, and I wasn’t impressed. Having one canonical language for stories sounds great, until you have enough arguments about how things should be phrased that you eventually come to the realisation that BAs don’t want to write their specifications as tests, and you don’t write your tests as specifications.

This is a test style assertion step:

Then the total of the items should be 42

...and this is the same step in a requirements style of language:

Then the total of the items should equal the sum of the number of items in each category

To a BA, the first example is a lie. The sum shouldn’t be 42, it should be the correct number! And to an automated program, the second statement is nigh on useless. Saying what something is supposed to be made from is just doing the same calculation twice - there’s nothing stopping you from doing it wrong both times! If you want to check that it’s getting the right answer, you need to tell it what the right answer is, not just tell it (again) how to make it.

So I’m not a huge fan of having cucumber scenarios be the single source of truth for requirements. If the programmers have their way it’s just a series of examples (also known as “tests”), and if the BAs have their way it’s just a series of feeble assertions that don’t necessarily check what they say they’re checking.

But it’s not all bad…

But on programmer-oriented projects, I can see them working quite well. For example, I’ve recently upgraded a large suite of specs to rspec 2, and made heavy use of the browsable cucumber scenarios on relishapp.com as actual, useful documentation.

So I decided to try cucumber on one of my own projects. Since I am obviously a python fiend outside of work, I wasn’t going to use cucumber. So out came the (very young) python port of cucumber, called lettuce (where did this salad theme even come from? o_O). I gave it a go, and of course it’s naturally a bit more awkward than ruby because python doesn’t have blocks. It’s also more than a little buggy, and lacking some useful features that cucumber has (which is to be expected of such a young project).

I started hacking on it to add or improve features, and then got sick of it. It really does seem a little ridiculous. We’re actually inventing a (trivial) language, and parsing it, and using little regex parsers in each of our steps, and mapping each of those regexes to little chunks of code. And all this makes it hard to find usages, hard to track duplication and dead code, and generally just awful to navigate and manage.

The punchline

So, you know what? I just transformed all my steps into valid python code instead. Each regex replaced with a function name, and each matching group an argument (python’s keyword-arguments help here). 65 lines of code later, I have a very similar result using plain-old python.

Here is a comparison. The old feature:

Feature: running indicate-task
	Running a basic, blocking process that
	consumes and produces output.

Scenario: running and cancelling a program
	When I run indicate-task -- cat
	And I enter "input"
	And I press ctrl-c
	And I wait for the task to complete

	Then there should be a "cat" indicator
	And it should have a menu description of "cat: running..."
	And the output should be: input
	And the error output should be empty
	And the return code should not be 0
	And it should display the task's output to the user
	And it should notify the user of the task's completion

And the new, normal, actual-python-code-that-works-just-fine-with-ctags-and-isn’t-built-with-dirty-regexes version:

from makeshift_cucumber import *
from base_test import BaseTest

class TestRunning(BaseTest):
	"""
	Feature: Running a basic, blocking process that
	consumes and produces output.
	"""

	def test_running_and_cancelling_a_program(self):
		When.I_run_indicate_task('--', 'cat')
		And.I_enter("input")
		And.I_press_ctrl_c()
		And.I_wait_for_the_task_to_complete()
		Then.there_should_be_an_indicator_named("cat")
		And.it_should_have_a_menu_description_of("cat: running...")
		And.the_output_should_be('input')
		And.the_error_output_should_be_empty()
		And.the_return_code_should_not_be(0)
		And.it_should_display_the_tasks_output_to_the_user()
		And.it_should_notify_the_user_of_the_tasks_completion()

No, you probably wouldn’t be able to get a businessman to write a scenario. But has that ever actually worked with cucumber either? I find it doubtful. The results are just as readable, and insanely simpler in terms of the complexity of the testing infrastructure. Plus, it’s just a normal test, the functions are just normal functions, and the arguments are just normal arguments.

And if you don’t want to give that to a BA, just show them the test output instead:

[image: terminal output]

I’ll try to clean this up into a proper library & formatter sometime, because I think the mess of code you end up with in cucumber is just too ridiculous for the benefits you get, and this sort of thing is much more developer-friendly while maintaining most of the readability benefits.

The -rubygems Flag

I was always slightly confused that despite rubygems not being part of the ruby language or interpreter, there is nonetheless a -rubygems option you can give to ruby to enable rubygems.

Today when I was delving through some stack traces, I noticed an odd looking filename at the root of it all. As I’m sure many before me have realised (a bunch of my workmates already knew about this), the -rubygems flag is not a real flag at all. It’s just a perverted case of the -r module syntax which tells ruby to require a file by name. Because when you install rubygems, it conveniently installs a file called ubygems.rb whose contents are simply require "rubygems". Very sneaky…

rvm: Manage Your Rubies With an Ill-Managed Manager

rvm is a tool for maintaining multiple versions of ruby, as well as maintaining project-specific sets of gem dependencies. When I first learnt about it this week it sounded like a very useful tool, although it’s unfortunate that gems are so awkward to manage that it should be necessary in the first place.

Yesterday my first task was to update rspec. Which in turn required an update to rubygems before it would install. But who manages rubygems? It could be rvm, or rubygems itself, or apt, or even maybe bundler.

I looked through the documentation, and the most appropriate answer seemed to be that rvm should manage rubygems. I quote from the documentation:

rvm action [interpreter] [flags] [options]

where update is an action, and one of the flags is --rubygems:

--rubygems    - with update, updates rubygems for selected ruby

So I diligently typed

rvm update --rubygems

And what did rvm do? It proceeded to attempt to update itself, instead of rubygems. If you want to upgrade rubygems, you’re supposed to type:

rvm --rubygems update

(note that this is incorrect according to the above documentation, but is how I eventually coerced it into upgrading rubygems (this bug has since been fixed))

The accidental upgrade might have been okay, if its upgrade process were anything but Completely Insane. It goes thusly:

  • download a file from an unsecured HTTP location
  • without verifying any sort of checksum, signature or even HTTP status code, pipe the output directly into a bash shell
  • this script clones a github repository, and proceeds to install the absolute latest revision, whatever that might be

Hilarity ensues. I got a bash syntax error, but evidently not early enough in the process to stop rvm from destroying itself, requiring me to delete everything related to it and install from scratch.

Security? ignored.

Sanity checking? skipped.

Dependencies? get them yourself.

Update management? The website says “make sure you run this command frequently”.

I don’t know that I want such a tool trying to manage my dependencies, thank you very much…

The most painful thing, of course, is that it’s yet another buggy, language-specific implementation of the principles that zero-install does so much better (and simpler). If you don’t have global state, suddenly it’s really not that hard to keep things from interfering with each other.

Oh, and did I mention how rvm integrates with your shell, so that when you cd into a project directory, it automatically sets up your ruby version and gems? Except that when you open a new shell in the same location, you have to cd out of your project directory and then back in or else you’ll see the system version of ruby and your gems, and things will be broken in very odd ways. Splendid.

group_sequential

So the other day I had a list of (html) elements, and I wanted to get an array representing lines of text. The only problem being that some of the elements are displayed inline - so I needed to join those together. But only when they appeared next to each other.

I would call the generic way of doing so group_sequential, where an array is chunked into sub-arrays, and sequential elements satisfying some predicate are included in the same sub-array. That way, my predicate could be :inline?, and I could join the text of each grouped element together to get the lines out.

For example, using even numbers for simplicity:

[1,2,3,4,6,8,5,4,4].group_sequential(&:even?)
=> [[1],[2],[3],[4,6,8],[5],[4,4]]

Here’s the ruby code I came up with:

	class Array
		def group_sequential
			result = []
			group = []
			finish_group = lambda do
				unless group.empty?
					result << group
					group = []
				end
			end

			self.each do |elem|
				if yield elem
					group << elem
				else
					finish_group.call
					result << [elem]
				end
			end
			finish_group.call
			result
		end
	end
	

Things are slightly less noisy in python, but assignment is subtly awkward without the nonlocal keyword (only available in python3):

	def group_sequential(predicate, sequence):
		result = []
		group = []
		def finish_group():
			if group:
				result.append(group)
			return []

		for item in sequence:
			if predicate(item):
				group.append(item)
			else:
				group = finish_group()
				result.append([item])
		group = finish_group()
		return result
	

This feels like something that should be doable in a much more concise way than I came up with above. Any ideas? (In either ruby or python)


Update:

My friend Iain has posted a number of interesting solutions over yonder, which got me thinking differently about it (specifically, reminding me of dropwhile and takewhile). I applied python’s itertools to the problem to get this rather satisfactory result in python:

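	from itertools import takewhile, tee, dropwhile
	def group_sequential(pred, sequence):
		# two independent iterators over the same items, advanced in step
		taker, dropper = tee(iter(sequence))
		while True:
			group = list(takewhile(pred, taker))
			if group: yield group
			try:
				# the first non-matching item gets a group of its own
				yield [next(dropwhile(pred, dropper))]
			except StopIteration:
				# sequence exhausted (python 3.7+ needs this made explicit)
				return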
	

(note: this is a generator which is fine for my purposes - you can always wrap it in a call to list() to force it into an actual list).

The same approach is acceptable when done in ruby, but a bit more verbose because of the need to explicitly check for the end of the sequence, and to collect the results array:

	class Array
		def group_sequential(&pred)
			sequence = self
			results = []
			while true
				group = sequence.take_while &pred
				results << group if group.size > 0
				sequence = sequence.drop_while &pred
				return results if sequence.empty?
				results << [sequence.shift]
			end
		end
	end
	

Update (the second):

While poking around itertools, I managed to miss groupby. I assumed it did the same thing as ruby’s Enumerable#group_by, which is to say not at all what I want (though it’s surely useful at other times). So here is presumably the most concise version I’ll find, for the sake of closure:

	from itertools import groupby
	def group_sequential(pred, sequence):
		return [list(group) for key, group in groupby(sequence, pred)]
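One caveat on the groupby version: because it keys on the predicate, it also merges adjacent items that fail the predicate (so [1, 3] comes out as a single group), which differs from the earlier versions. For the ruby side, newer rubies (2.3 and later) have Enumerable#chunk_while, which seems to give the original behaviour quite concisely. This is a sketch written as a standalone method rather than an Array monkey-patch, which is my own choice here:

```ruby
# Group runs of adjacent elements that satisfy the block;
# every other element ends up in a group of its own.
def group_sequential(items)
  items.chunk_while { |a, b| yield(a) && yield(b) }.to_a
end

p group_sequential([1, 2, 3, 4, 6, 8, 5, 4, 4], &:even?)
```

Two neighbours stay in the same chunk only when both satisfy the predicate, so non-matching elements are never merged.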
	

rspec immediate feedback formatter

I had cause to repurpose this immediate feedback formatter to work with the SpecDoc rspec formatter. So here’s a minimal version that just monkey-patches the SpecdocFormatter class to provide immediate feedback. As long as it’s included somewhere before rspec starts running, it should do its thing…

(view link)

Recursively Default Dictionaries

Today I was asked if I knew how to make a recursively default dictionary (although not in so many words). What that means is that it’s a dictionary (or hash) which is defaulted to an empty version of itself for every item access. That way, you can throw data into a multi-dimensional dictionary without regard for whether keys already exist, like so:

h["a"]["b"]["c"] = 5

Without having to first initialise h["a"] and h["a"]["b"].

A dictionary with a default value of an empty hash sprang to mind, but after trying it out I realised that this only works for one level. Recursion was evidently required.

So, here’s the python solution:

from collections import defaultdict
new_dict = lambda: defaultdict(new_dict)
h = defaultdict(new_dict)

And the ruby, which seems overly noisy:

new_hash = lambda { |hash, key| hash[key] = Hash.new &new_hash }
h = Hash.new(&new_hash)
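A slightly less noisy ruby variant, offered as a sketch: each nested hash can reuse its parent's default block via default_proc, which avoids the separately named lambda.

```ruby
# Each missing key is filled with a new hash that reuses the same default block.
h = Hash.new { |hash, key| hash[key] = Hash.new(&hash.default_proc) }
h["a"]["b"]["c"] = 5
```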

minor metaprogramming

Can anyone tell me why ruby’s instance_variable_set would possibly require the name of a variable to start with an “@”, rather than simply assuming it? It’s a ruddy instance variable, after all…

I can find no decent alternative to python’s setattr in ruby, which surprises me.
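The closest workaround I can sketch is a tiny wrapper that adds the “@” for you. Both the set_attr name and the Widget class are made up for this example:

```ruby
class Widget
  attr_accessor :colour
end

# Hypothetical setattr-style helper: adds the "@" so callers don't have to.
def set_attr(object, name, value)
  object.instance_variable_set("@#{name}", value)
end

widget = Widget.new
set_attr(widget, :colour, "red")
first_colour = widget.colour

# When a writer method exists, public_send is arguably closer to setattr,
# since it goes through the object's own setter.
widget.public_send(:colour=, "blue")
```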

Entirely too much investigation into ruby's match operator

(how could a title like that not excite you! ;P)

So yesterday I had this weird regex issue in ruby. I wanted to get a regular expression containing a given string, but didn’t want to have to manually escape all the special characters. Regexp.escape to the rescue! It escapes all regex metacharacters in any given string, and returns it as a regex. In fact, the docs assure me:

For any string, Regexp.escape(str)=~str will be true.

But not so much in practice?

>> str = "123"            
=> "123"
>> Regexp.escape(str)=~str
TypeError: type mismatch: String given


So, problem one: Regexp.escape is broken. It returns not a Regexp, but a string. Oddly enough it also seems to escape spaces and other innocuous characters, but at least you get the right result if you pump it through Regexp.compile(). However, that wasn’t my only discovery.
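For reference, this is the combination that does behave as hoped (a minimal sketch, with a made-up sample string):

```ruby
str = "1.5 (approx)"

escaped = Regexp.escape(str)    # a String with metacharacters backslash-escaped
pattern = Regexp.new(escaped)   # compile it to get an actual Regexp

literal_match = pattern =~ str             # 0: matches at the start
no_wildcard   = pattern =~ "1x5 (approx)"  # nil: "." no longer matches any character
```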

I mentioned this to Matt, and he couldn’t make much sense of it either. He noticed that the type error is specific to strings - if you use a number it just returns false:

>> 123 =~ 'foo'
=> false

Seems a bit odd, really. Fixnum doesn’t implement =~, nor does Integer.

So I went doc spelunking. I found implementations for =~ in the following three important classes:

Regexp: regexp =~ str: do a regex match, as you might expect

String: str =~ obj: Call obj =~ str (i.e. swap the order of your operands). Not mentioned in the docs but clearly apparent in the source (and experimentation): raise a TypeError if both arguments are strings. Without this, matching one string to another would very quickly run out of stack space.

Object: obj =~ other_obj: return false

So the Regexp implementation is fine. The String implementation is a little odd. I guess it’s there to allow people to write matching statements either way, but it seems like a dangerous (and confusing) habit to condone.

But the Object implementation? Why??? What possible reason could one have for doing a match operation against two objects, neither of which implements any matching behaviour? This has the painful side effect of giving every single object I inspect a “=~” method which does nothing. No wonder Object.new() has over 120 methods on it *.

For comparison, python’s object only has 12 methods / attributes. And they’re all special names, so there’s no pollution of regular names going on.

So there you go, two spoonfuls of broken in the one discovery!

(this is ruby 1.8.6, if that matters)

* I exaggerate, over 120 methods on object are just what you get in a rails app. Vanilla ruby only has 41 by my count. But it’s still completely unnecessary, and adds noise to inflate that number.

ruby dataflow library

Pretty cool, I worry that such things can’t easily be done so cleanly in python…

(view link)

ruby - longing for some discipline

More and more, I am wishing that there was some sort of strict mode I could enable in ruby to say “you know what? I’m careful with my code. Please don’t assume things behind my back”. And to be honest, this mode would pretty much be synonymous with “just do what python would do”

By default, python is strict. If you index a dict (hash) with a nonexistent key, you get a KeyError. If you don’t want to have to deal with that, you can use the get method and provide a default for when the key doesn’t exist. In ruby, if you want to be strict about anything, you generally have to write your own checks to guard against the core library’s forgivingness. Forgivingness sounds nice at first, but goes completely against the idea of failing fast, and frequently delays the manifestation of bugs, making them that much harder to actually track down.

Two examples that I came across within minutes of each other the other day:

Struct.new(:a,:b,:c).new('a','b')

that should NOT go without an exception

"a|b||c".split("|")
=> ["a", "b", "", "c"]

good…. so now:

"a|b||".split("|")
=> ["a", "b"]

argh! what have you done to my third field?
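Both cases can be made strict by hand, which rather proves the point about writing your own checks. A sketch, where the Triple struct and its error message are invented for the example:

```ruby
# A negative limit tells String#split to keep trailing empty fields.
fields = "a|b||".split("|", -1)

# Hypothetical strict Struct: refuse to build an instance with missing values.
Triple = Struct.new(:a, :b, :c) do
  def initialize(*values)
    expected = self.class.members.size
    unless values.size == expected
      raise ArgumentError, "expected #{expected} values, got #{values.size}"
    end
    super
  end
end
```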

Ruby class methods

Not a very exciting realisation, but an annoying one:

$ irb
>> class A
>>   def self.meth; puts "class method!"; end
>> end

>> A.meth()
class method!

>> A.new().meth()
NoMethodError: undefined method `meth' for #<A:0x5ad160>
	from (irb):11

ick…
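The workaround, such as it is, is to go through the class explicitly. A small sketch (returning the string rather than printing it, purely to keep the example easy to check):

```ruby
class A
  def self.meth
    "class method!"
  end
end

instance = A.new
result = instance.class.meth   # reach the class method via the instance's class
```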

(sadly enough, most of my posts tagged “ruby” would be equally well tagged as “things that suck in ruby”)

Ruby is friggin weird. And a little messed up.

(if you don’t read my blog for the geeky thrill of it, you may want to give this post a miss ;))

Follow my little IRB session, if you will:

>> nil or "val"
=> "val"

>> puts (nil or "val").inspect
"val"
=> nil

>> x = nil || "val"
=> "val"

>> x
=> "val"

>> y = nil or "val"
=> "val"

(wait for it…)

>> y
=> nil


Seriously, ruby. What the crap?


Okay, so I just figured out what’s going on here. “or” works both as a logic operator and a conditional statement. Just like you can do:

x = something_dangerous() rescue "x failed!"

and

puts "x is greater than 10" if x > 10

It would seem you can also do

x = some_value or puts "i guess the assignment didn't evaluate to true"

Meaning that in my example above:

y = nil or "val"

Ruby evaluates it as:

(y = nil) or ("val")

(i.e. in the second set of brackets, y is not actually assigned to anything)

Of course, || is solely a logic operator. Which is why it looks like you get different behaviour when you use || instead of or.
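Side by side, the three spellings look like this (a minimal sketch; ruby may warn about the first line, which is rather the point):

```ruby
y = nil or "val"      # parsed as (y = nil) or ("val")
z = (nil or "val")    # parentheses make the whole expression the assigned value
w = nil || "val"      # || binds tighter than =, so no parentheses are needed
```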

When I found out about ruby supporting both sets of logic operators (&&, ||, !) and (and, or, not), I thought it was dumb, but just a matter of preference which type you prefer.

When I found out that the symbol-based ones bind tighter than the keywords, I winced a little and noted to myself never to rely on that, because it’s neither readable nor obvious.

Now that I’ve stumbled upon this latest gem of knowledge, it just makes me cringe…

@ruby: You're Doing it Wrong.

Pay attention to the output types, kids:

>> { "key" => "val" }.reject { false }
=> {"key" => "val"}

looking good so far...

>> { "key" => "val" }.select { true }
=> [["key", "val"]]

eww... what?
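For what it’s worth, ruby 1.9 and later changed Hash#select to return a Hash, so the two methods now agree. A quick sketch of the newer behaviour, with an explicit conversion for when you really do want pairs:

```ruby
hash = { "key" => "val" }

selected = hash.select { true }   # a Hash on ruby 1.9 and later
rejected = hash.reject { false }  # a Hash, as before

pairs = selected.to_a             # explicit conversion when pairs are wanted
```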