GFX::Monk Home

Posts tagged: "linux" - page 2

Passing arrays as arguments in bash

I tend to avoid bash wherever possible for scripting, because it has dangerously bad defaults and will happily munge your data unless you take great care to wrap it up properly (particularly whitespace). But sometimes you have no choice, so you might as well know how to do it safely.

Here’s how to capture argv as a bash array, and pass it on to another command without breaking if some argument contains a space:

echo "${args[@]}"

You can also just pass “$@” directly, but the above syntax works for any array.

Don’t forget any of those quotes, or bash will silently ruin everything (until you have data with spaces, at which point it might loudy ruin everything).

Here’s how to convert a line-delimited string (e.g a list of files in the current directory) into an array and pass that on:

mapfile -t arr <<<"$(ls -1)"
echo "${arr[@]}"

Note that a sensible-looking:

ls -1 | mapfile -t args

will not work, as a builtin on the receiving end of a pipe gets run in a subshell.

If you don’t have mapfile (added in bash v4), you’ll have to resort to:

oldIFS="$IFS"; IFS=$'\n' read -d '' -r -a arr <<< "$(ls -1)"; IFS="$oldIFS"; unset oldIFS
echo "${arr[@]}";

I look forward to the day when I don’t have to know that.

pkcon, the cross-distro package manager CLI

I seem to keep switching between Ubuntu and Fedora every 6 months, for various reasons. I think switching is good for you in that it allows you to evaluate the benefits of each distro, and also encourages you “travel light” as it were, discouraging one-off hacks that you’ll have to figure out each and every time you change distro (or potentially upgrade). That’s not to say I don’t do a heck-tonne of customisation, but I do it in a structured, repeatable and (when possible) cross-distro manner.

To be honest, the main thing that irks me is having to remember to mentally switch between apt-get/dpkg and yum/rpm. I try to declare package dependencies for tools I commonly use or libraries I depend on in a cross-platform way using zero-install feeds (this really helps when switching, since I don’t have to remember the differing names of many packages), but inevitably I still need to use the command-line package managers fairly often.

Today I ran across fpm (“Effing Package Management”) again, which is a tool that a (likely frustrated) user has built to abstract away the uninteresting details of building packages for such-and-such package manager. It turns out that system packages are boring, and do pretty much the same thing in 99% percent of cases. The other 1% is probably a bad idea, or obscure enough of a feature that you’ll do it wrong1. I took a closer look to see if it also dealt with the uninteresting inconsistencies of using a package manager to search for an install packages, but no such luck.

Enter pkcon

Fortunately, a pretty decent tool is already available, and already installed in many modern distros. It’s called pkcon, and it’s part of the PackageKit project to provide a consistent API for the common operations across the plethora of linux package managers. PackageKit has a whole heap of supported backends, most likely more than you’ve even heard of. It covers the basic operations that are generally present in all package managers, it does not attempt to expose advanced features. Fedora’s default package GUI is built on it, and I believe ubuntu’s software center either is already or soon will be built on it it too. It’s also the mechanism by which zero-install can install required system packages as needed on most distros, which earns it a special place in my heart ;)

Using it

Anyway, what can you do with pkcon? Close to everything I’ve ever needed to. Technically speaking it’s largely a testing tool for exercising the PackageKit API, but (by no coincidence) the PackageKit API happens to be a straightforward set of the main things a user actually cares about. For the most part, the commands are exactly what you’d expect if you’ve used apt or yum, and in other cases they make more sense. Here’s a list of the stuff I use, with apt and yum equivalents for context:

install (apt-get install, yum install)

$ pkcon install [packagename]

remove (apt-get remove, yum remove)

$ pkcon remove [packagename]

check for updates (apt-get update, yum check-update)

$ pkcon refresh

upgrade all packages (apt-get upgrade, yum update)

$ pkcon update
$ pkcon search [name|details|group|file] [query]

file is pretty useful when you want to find “who owns that file on my hard drive”, but unfortunately you can’t use wildcards to find files you’re missing but don’t know the full location of (at least it’s not working for me). Still, good for satisfying poorly-documented build dependencies that are expecting certain files to exist.

what-provides (rpm -qf, dpkg -S)

$ pkcon what-provides /usr/bin/python

Seems to be the same as pkcon search file above. Much easier to remember than the rpm/dpkg flags.

get-details (apt-cache show, yum info)

$ pkcon get-details [packagename]

I mainly use this for getting the version of a package installed, or to check that a given package is what I want to install.

get-files (dpkg -L, rpm -ql)

$ pkcon get-files python

This one even works on packages you haven’t installed, unlike both dpkg and rpm.

User Beware

Unfortunately, pkcon is not without its rough edges. While the package management actions are well-tested, the command line itself is a bit flaky. For example, both --help-all and refresh --force give parse errors on Fedora 17, despite being recommended in the help text. More importantly the --filter=installed flag does nothing on my system. Without that filter, you will need to disambiguate (by picking from a numbered list) whenever you perform an action on a package that is installed but also has an available update, or which is available in both 32 and 64 bit versions.

It also requires DBus as a dependency (since PackageKit is implemented as a DBus API), which may make it a poor option for administering servers. But you probably shouldn’t be doing most of this manually on a server anyway.

  1. Even relatively common features like debian’s postrm scripts are frequently used incorrectly to painful effect. I’ve seen multiple packages that forever have to workaround dumb bugs in the control scripts of earlier version of the package, because the author didn’t fully understand how they worked (full disclosure: I wrote one such package ;)). The reason you need to work around this forever is that you never know if the previous version a user had installed was one of the buggy versions (since it’s the outgoing version whose postrm script gets run). 

My new bash script prelude

For a while my preferred “bash script prelude” (the stuff you write in every bash script, before the actual content) has been a modest:

set -eux

Those settings break down to:

  • -e: fail when any subcommand fails (the number of scripts that proceed to do completely incorrect things after an earlier command failed is criminal). You can always silence errors by using || or an if statement, but (much like exceptions in a high-level language) silence should not be the default.
  • -u: fail when an unknown variable is referenced - mostly guards against silly typos, but why wouldn’t you? It’s sometimes not what you want with variables that may be optionally set by the user, so sometimes I’ll move the set -u down in the script after I’ve dealt with setting those to their defaults if they are not present.
  • -x: trace all commands, after expansion. E.g with a line like: x=123; echo $x, Your output will be:

    • x=123
    • echo 123 123

This gets very noisy, but frequently I find it’s better to have more output than less - when a script breaks, often you can tell why just by looking at the last command trace (especially when combined with the -e option). And it’s easier to ignore excess output than it is to guess what went wrong.

The new hotness

Today, I discovered / created a much longer (and uglier) prelude, but I think it’s going to be worth it. Here ‘tis:

set -eu
set -o pipefail
export PS4='+ ${FUNCNAME[0]:+${FUNCNAME[0]}():}line ${LINENO}: '
syslogname="$(basename "$0")[$$]"
exec 3<> >(logger -t "$syslogname")
echo "Tracing to syslog as $syslogname"
unset syslogname
debug() { echo "$@" >&3; }
set -x

A mouthful, indeed. Lets go though it line-by-line:

Systemd Socket Activation in Python

For those unaware: systemd is a replacement for the traditional unix init process. It is the first process to be brought up by the kernel, and is responsible for all the user-space tasks in booting a system. Conveniently, it also has a --user mode switch that allows you to use it yourself as a session-level init service. As someone who hates being root and loves repeatable configuration, I’m tremendously pleased to be able to offload as much of my session management as I can to something that’s built for the task (and doesn’t rely on hard-coded, root-owned paths).

I’ve heard a lot of complaining from sysadmins who seem to prefer a tangled mess of shell scripts in /etc/init.d/, but I’m a big fan of systemd for its user-friendly features - like restarting stuff when it dies, listing running services, showing me logging output, and being able to reliably kill daemons. There are other ways to accomplish all of these, but systemd is easy to use, and works very well in my experience.

Socket activation

Recently I was wondering how to implement systemd socket activation in python. The idea is much like inetd, in which you tell systemd which port to listen on, and it opens up the port for you. When a request comes in, it immediately spawns your server and hands over the open socket to seamlessly allow your program to service the request. That way, services can be started only as necessary, making for a quicker boot and less resource usage for infrequently accessed services.

The question is, of course, how am I supposed to grab a socket from systemd? It turns out it’s not hard, but there are some tricks. For my app, I am using the BaseHTTPServer module. Specifically, the threaded variant that you can construct using multiple inheritance:

class ThreadedHTTPServer(ThreadingMixIn, HTTPServer):

I couldn’t find much existing information on how to do this in python, so I thought I’d write up what I found. One useful resource was this tutorial for ruby, since python and ruby are pretty similar.

Nuts and bolts

Obviously, systemd needs to tell you that it has a socket open for you. The way it does this is by placing your process ID into $LISTEN_PID. So to tell if you should try and use an existing socket, you can check:

if os.environ.get('LISTEN_PID', None) == str(os.getpid()):
	# inherit the socket
	# start the server normally

Given that you normally pass a host and port into the HTTPServer class’ constructor, how can you make it bind to a socket systemd gives you? It turns out to be fairly simple:

class SocketInheritingHTTPServer(ThreadedHTTPServer):
	"""A HttpServer subclass that takes over an inherited socket from systemd"""
	def __init__(self, address_info, handler, fd, bind_and_activate=True):
		ThreadedHTTPServer.__init__(self, address_info, handler, bind_and_activate=False)
		self.socket = socket.fromfd(fd, self.address_family, self.socket_type)
		if bind_and_activate:
			# NOTE: systemd provides ready-bound sockets, so we only need to activate:

You construct this just like you would the normal ThreadedHTTPServer, but with an extra fd keyword argument. It passes bind_and_activate=False to prevent the parent class from binding the socket, overrides the instance’s self.socket, and then activates the server.

The final piece of the puzzle is the somewhat-arbitrary knowledge that systemd passes you in sockets beginning at file descriptor #3. So you can just pass in fd=3 to the SocketInheritingHTTPServer. If you have a server that has multiple ports configured in your .socket file, you can check $LISTEN_FDS

And that’s it! I’ve only just learnt this myself, so I may be missing some detail. But it seems to work just fine. If you want to see the full details, you can have a look at the commit I just made to edit-server, which includes a simple script to simulate what systemd does when it hands you an open socket, for quick testing. You can also have a look at the service files to see how the .socket and .service systemd units are set up.

Shellshape arrives on

shellshape logo

It’s been a long time coming, but shellshape (my tiling window manager extension for gnome-shell) is finally available via Get it while it’s hot!

You’ll need the bleeding edge (3.4.1) version of gnome-shell, as it’s the first version that allows normal extensions to register new keybindings. If you’re stuck on 3.4 you can still use the 0launch method described on the shellshape homepage.

(view link)

Why Piep

piep (pronounced “pipe”) is a new command line tool for processing text streams with a slightly modified python syntax, inspired by the main characters of a typical unix shell (grep, sed, cut, tr, etc). To pluck a random example. here’s how you might rename all files (but not directories) in the current folder to have a “.bak” extension (because you have a very strange and manual backup scheme, apparently):

$ ls -1 | piep 'not os.path.isdir(p) | sh("mv", p, p + ".bak")'

In this simple example we can see filtering, piping (note that the pipes between expressions are internal to piep’s single argument, and thus not interpreted by the shell), and shelling out to perform useful work.

Here’s another, to print out the size of files in the current directory that are greater than 1024 bytes:

$ ls -l | piep 'pp[1:] | p.splitre(" +", 7) | size=int(p[4]) | size > 1024 | p[7], "is", p[4], "bytes"'

Or, if hacking through the output of ls -l isn’t your thing (it’s most likely a terrible idea), you can do things the pythonic way:

$ ls -1 | piep --import 'stat' 'size=os.stat(p).st_size | size > 1024 | p, "is", size, "bytes"'

For a proper introduction, you should read the online documentation. But I wanted to address one specific point here, about the origins of piep.

Recently I came across pyp, The Pied Piper. It seemed like a great idea, but after I played with it for a little while I uncovered some unfortunate shortcomings, some of which are deal breakers. My list included:

  • stream-based operation: there’s a beta implementation with “turbo” (line-wise) mode, but it seems very limited. I believe it should be the norm, and wanted to see if I could do things in a way that was just as convenient, but with all the benefits of lazy stream-based processing.
  • Command execution: commands are made up by string concatenation, requiring manual effort to escape metacharacters including the humble space 1. Also, errors are silently ignored.
  • Purity of data: things are routinely strip()ed and empty strings are frequently dropped from computations. Breaking up a line into a list of data would (sometimes?) see each list merged back into the input stream, rather than maintained as a list.
  • stream confusion: second stream, file inputs, etc. Not really sure why there are so many special cases
  • a not-very-extensible extension mechanism, which is fairly manual and appears to preclude sharing or combining extensions
  • lots of unnecessary machinery that complicates the code: macros, history, –rerun, three file input types, etc. Some of this may be useful once you use the tool a lot, but it hindered my ability to add the features I wanted to pyp.

I initially tried my hand at modifying pyp to fix some of the things I didn’t like about it, but the last point there really got in my way. History is baked in, and doesn’t really work in the same manner for stream-based operations. The entire pp class had to be rewritten, which is actually what I started doing when I decided to turn it into a separate tool (since it then became difficult to integrate this new pp class with the rest of the system. Anyway, I hope this isn’t taken as an offence by the developers of pyp - I really like the ideas, so much so that I was compelled to write my ideal version of them.

  1. I humbly submit that concatenating strings is the worst possible way to generate shell commands, leading to countless dumb bugs that only rear their heads in certain situations (and often in cascading failures). Observe piep’s method on a filename containing spaces:

    $ ls -1 | piep 'sh("wc", "-c", p)'
    82685610 Getting the Most Out of Python Imports.mp4

    Compared to that of pyp (and countless other tools):

    $ ls -1 | pyp 'shell("wc -c " + p)'
    wc: Getting: No such file or directory
    wc: the: No such file or directory
    wc: Most: No such file or directory
    wc: Out: No such file or directory
    wc: of: No such file or directory
    wc: Python: No such file or directory
    wc: Imports.mp4: No such file or directory
    [[0]0 total]
    $ echo $?

    It is unacceptable for a language with simple and convenient sequence types to instead rely on complex string escaping rules to prevent data from being misinterpreted. To be honest, this on its own may be reason enough to use piep over alternatives.