GFX::Monk Home

Posts tagged: "python"

Systemd Socket Activation in Python

For those unaware: systemd is a replacement for the traditional unix init process. It is the first process to be brought up by the kernel, and is responsible for all the user-space tasks in booting a system. Conveniently, it also has a --user mode switch that allows you to use it yourself as a session-level init service. As someone who hates being root and loves repeatable configuration, I’m tremendously pleased to be able to offload as much of my session management as I can to something that’s built for the task (and doesn’t rely on hard-coded, root-owned paths).

I’ve heard a lot of complaining from sysadmins who seem to prefer a tangled mess of shell scripts in /etc/init.d/, but I’m a big fan of systemd for its user-friendly features - like restarting stuff when it dies, listing running services, showing me logging output, and being able to reliably kill daemons. There are other ways to accomplish all of these, but systemd is easy to use, and works very well in my experience.

Socket activation

Recently I was wondering how to implement systemd socket activation in python. The idea is much like inetd, in which you tell systemd which port to listen on, and it opens up the port for you. When a request comes in, it immediately spawns your server and hands over the open socket to seamlessly allow your program to service the request. That way, services can be started only as necessary, making for a quicker boot and less resource usage for infrequently accessed services.

The question is, of course, how am I supposed to grab a socket from systemd? It turns out it’s not hard, but there are some tricks. For my app, I am using the BaseHTTPServer module. Specifically, the threaded variant that you can construct using multiple inheritance:

class ThreadedHTTPServer(ThreadingMixIn, HTTPServer):
	pass

I couldn’t find much existing information on how to do this in python, so I thought I’d write up what I found. One useful resource was this tutorial for ruby, since python and ruby are pretty similar.

Nuts and bolts

Obviously, systemd needs to tell you that it has a socket open for you. The way it does this is by placing your process ID into $LISTEN_PID. So to tell if you should try and use an existing socket, you can check:

if os.environ.get('LISTEN_PID', None) == str(os.getpid()):
	# inherit the socket
else:
	# start the server normally

Given that you normally pass a host and port into the HTTPServer class’ constructor, how can you make it bind to a socket systemd gives you? It turns out to be fairly simple:

class SocketInheritingHTTPServer(ThreadedHTTPServer):
	"""A HttpServer subclass that takes over an inherited socket from systemd"""
	def __init__(self, address_info, handler, fd, bind_and_activate=True):
		ThreadedHTTPServer.__init__(self, address_info, handler, bind_and_activate=False)
		self.socket = socket.fromfd(fd, self.address_family, self.socket_type)
		if bind_and_activate:
			# NOTE: systemd provides ready-bound sockets, so we only need to activate:
			self.server_activate()

You construct this just like you would the normal ThreadedHTTPServer, but with an extra fd keyword argument. It passes bind_and_activate=False to prevent the parent class from binding the socket, overrides the instance’s self.socket, and then activates the server.

The final piece of the puzzle is the somewhat-arbitrary knowledge that systemd passes you in sockets beginning at file descriptor #3. So you can just pass in fd=3 to the SocketInheritingHTTPServer. If you have a server that has multiple ports configured in your .socket file, you can check $LISTEN_FDS

And that’s it! I’ve only just learnt this myself, so I may be missing some detail. But it seems to work just fine. If you want to see the full details, you can have a look at the commit I just made to edit-server, which includes a simple script to simulate what systemd does when it hands you an open socket, for quick testing. You can also have a look at the service files to see how the .socket and .service systemd units are set up.

IPython's new notebook feature

I’ve just finished watching the Pycon-2012 talk “IPython: Python at your fingertips”, and am really impressed by the new notebook feature, new in version 0.12.

I’ve used IPython over python’s standard console since I first learned about it, and think it’s the best REPL around for any language. So I didn’t think I’d learn too much from this talk, but it turns out IPython is even cooler than I thought!

I’m not sure how much I’d use it during development (although it has some neat advantages over the IPython console), but I can see this making for some killer interactive documentation - publish the pre-processed results on the web, with a “play with this example” button allowing the user to grab the script, interactively modify any part of the code they want and see the new results.

It could also be really useful for manual / visual inspection of test scenarios that are too hard, expensive or brittle to completely automate.

Update: OK, I’ve finally got it running1. I take it back, this is awesome for interactive development as well.

  1. I had to make and compile 0install feeds for tornado and ipython since ubuntu’s version was too old, if you want to run it yourself without having to do this you should be able to run 0launch --command=notebook2 http://gfxmonk.net/dist/0install/ipython.xml   

Why Piep

piep (pronounced “pipe”) is a new command line tool for processing text streams with a slightly modified python syntax, inspired by the main characters of a typical unix shell (grep, sed, cut, tr, etc). To pluck a random example. here’s how you might rename all files (but not directories) in the current folder to have a “.bak” extension (because you have a very strange and manual backup scheme, apparently):

$ ls -1 | piep 'not os.path.isdir(p) | sh("mv", p, p + ".bak")'

In this simple example we can see filtering, piping (note that the pipes between expressions are internal to piep’s single argument, and thus not interpreted by the shell), and shelling out to perform useful work.

Here’s another, to print out the size of files in the current directory that are greater than 1024 bytes:

$ ls -l | piep 'pp[1:] | p.splitre(" +", 7) | size=int(p[4]) | size > 1024 | p[7], "is", p[4], "bytes"'

Or, if hacking through the output of ls -l isn’t your thing (it’s most likely a terrible idea), you can do things the pythonic way:

$ ls -1 | piep --import 'stat' 'size=os.stat(p).st_size | size > 1024 | p, "is", size, "bytes"'

For a proper introduction, you should read the online documentation. But I wanted to address one specific point here, about the origins of piep.


Recently I came across pyp, The Pied Piper. It seemed like a great idea, but after I played with it for a little while I uncovered some unfortunate shortcomings, some of which are deal breakers. My list included:

  • stream-based operation: there’s a beta implementation with “turbo” (line-wise) mode, but it seems very limited. I believe it should be the norm, and wanted to see if I could do things in a way that was just as convenient, but with all the benefits of lazy stream-based processing.
  • Command execution: commands are made up by string concatenation, requiring manual effort to escape metacharacters including the humble space 1. Also, errors are silently ignored.
  • Purity of data: things are routinely strip()ed and empty strings are frequently dropped from computations. Breaking up a line into a list of data would (sometimes?) see each list merged back into the input stream, rather than maintained as a list.
  • stream confusion: second stream, file inputs, etc. Not really sure why there are so many special cases
  • a not-very-extensible extension mechanism, which is fairly manual and appears to preclude sharing or combining extensions
  • lots of unnecessary machinery that complicates the code: macros, history, –rerun, three file input types, etc. Some of this may be useful once you use the tool a lot, but it hindered my ability to add the features I wanted to pyp.

I initially tried my hand at modifying pyp to fix some of the things I didn’t like about it, but the last point there really got in my way. History is baked in, and doesn’t really work in the same manner for stream-based operations. The entire pp class had to be rewritten, which is actually what I started doing when I decided to turn it into a separate tool (since it then became difficult to integrate this new pp class with the rest of the system. Anyway, I hope this isn’t taken as an offence by the developers of pyp - I really like the ideas, so much so that I was compelled to write my ideal version of them.

  1. I humbly submit that concatenating strings is the worst possible way to generate shell commands, leading to countless dumb bugs that only rear their heads in certain situations (and often in cascading failures). Observe piep’s method on a filename containing spaces:

    $ ls -1 | piep 'sh("wc", "-c", p)'
    82685610 Getting the Most Out of Python Imports.mp4
    

    Compared to that of pyp (and countless other tools):

    $ ls -1 | pyp 'shell("wc -c " + p)'
    wc: Getting: No such file or directory
    wc: the: No such file or directory
    wc: Most: No such file or directory
    wc: Out: No such file or directory
    wc: of: No such file or directory
    wc: Python: No such file or directory
    wc: Imports.mp4: No such file or directory
    [[0]0 total]
    $ echo $?
    0
    

    It is unacceptable for a language with simple and convenient sequence types to instead rely on complex string escaping rules to prevent data from being misinterpreted. To be honest, this on its own may be reason enough to use piep over alternatives. 

Ruby's split() function makes me feel special (in a bad way)

Quick hand count: who knows what String.split() does?

Most developers probably do. Python? easy. Javascript? probably. But if you’re a ruby developer, chances are close to nil. I’m not trying to imply anything about the intelligence or skill of ruby developers, it’s just that the odds are stacked against you.


So, what does String.split() do?

In the simple case, it takes a separator string. It returns an array of substrings, split on the given string. Like so:

py> "one|two|three".split("|")
["one", "two", "three"]

Simple enough. As an extension, some languages allow you to pass in a num_splits option. In python, it splits only this many times, like so:

py> "one|two|three".split("|", 1)
["one", "two|three"]

Ruby is similar, although you have to add one to the second argument (it talks about number of returned components, rather than number of splits performed).

Javascript is a bit odd, in that it will ignore the rest of the string if you limit it:

js> "one|two|three".split("|", 2)
["one", "two"]

I don’t like the javascript way, but these are all valid interpretations of split. So far. And that’s pretty much all you have to know for python and javascript. But ruby? Pull up a seat.

Stereoscoper and the Depth of Awesomeness

A few days ago I got a shiny new toy: a 3d camera from thinkgeek (I’d link to it, but it seems to have disappeared from their catalogue). I’m a massive fan of 3d photos / video, so it’s pretty cool to have a device that allows me to take stereoscopic photo pairs simultaneously (you can do it manually with a static scene, but those get boring).

Sadly (although not surprisingly), the quality is not great. The limited resolution is not really an issue given how you’re likely to view them, but the pictures come out awkwardly stretched to half the expected horizontal resolution. They are also pre-combined in a single JPEG, they are the way around for cross-eyed viewing, and the colour balance is frequently off between the two sensors (which can be really jarring).

Seeing a lot of manual photo fixing in my future, I set out to automate it. And thus stereoscoper was born, as a way to bulk-convert stereo images to other formats. Aside from the obvious geometry changes (the horizontal resolution and image placement), I also learnt all about histogram matching in order to make the colour balance consistent across stereo pairs. And in order to make animated gifs that match up nicely, there’s even an interactive mode where you can fine-tune the alignment of the image pairs.

In the wiggly-animated spirit of 3ERD (note: some images there are NSFW), here’s some fun we had in the park with my new toy:

(click to toggle each animation. It’s off by default to save your brain from having a fit ;)

(stereo)

(stereo)

(stereo)

(stereo)

Update: Click the (stereo) link under each image for a cross-eyed viewing version.

Experiment: Using Zero-Install as a Plugin Manager

So for a little while now I’ve been wanting to try an experiment, using Zero Install as a Plugin Manager. Some background:

  • Zero install is designed to be used for distributing applications and libraries.
  • One of its greatest strengths is that it has no global state, and doesn’t require elevated privileges (i.e root) to run a program (and as the name implies, there is no installation to speak of).

Since there’s no global state, it seems possible to use this as a dependency manager within a program - also known as a plugin system.

Happily, it turns out it’s not so difficult. There are a few pieces required to set it up, but for the most part they are reusable - and much simpler to implement (and more flexible) than most home-rolled plugin systems.