GFX::Monk Home


Why Piep

piep (pronounced “pipe”) is a new command line tool for processing text streams with a slightly modified python syntax, inspired by the main characters of a typical unix shell (grep, sed, cut, tr, etc). To pluck a random example, here’s how you might rename all files (but not directories) in the current folder to have a “.bak” extension (because you have a very strange and manual backup scheme, apparently):

$ ls -1 | piep 'not os.path.isdir(p) | sh("mv", p, p + ".bak")'

In this simple example we can see filtering, piping (note that the pipes between expressions are internal to piep’s single argument, and thus not interpreted by the shell), and shelling out to perform useful work.
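For comparison, here’s roughly the same job written out in plain Python (a sketch of my own; `backup_files` is an invented name, and `subprocess` stands in for piep’s `sh()`):

```python
import os
import subprocess

def backup_files(directory):
    # Rename every regular file (but not directory) to have a .bak
    # suffix -- the plain-Python spelling of the piep one-liner above.
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if not os.path.isdir(path):
            subprocess.run(["mv", path, path + ".bak"], check=True)
```

The one-liner wins mostly on ceremony: the filtering and the shelling-out are the same, but piep handles the iteration and plumbing for you.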

Here’s another, to print out the size of files in the current directory that are greater than 1024 bytes:

$ ls -l | piep 'pp[1:] | p.splitre(" +", 7) | size=int(p[4]) | size > 1024 | p[7], "is", p[4], "bytes"'

Or, if hacking through the output of ls -l isn’t your thing (it’s most likely a terrible idea), you can do things the pythonic way:

$ ls -1 | piep --import 'stat' 'size=os.stat(p).st_size | size > 1024 | p, "is", size, "bytes"'

For a proper introduction, you should read the online documentation. But I wanted to address one specific point here, about the origins of piep.


Recently I came across pyp, The Pied Piper. It seemed like a great idea, but after I played with it for a little while I uncovered some unfortunate shortcomings, some of which are deal breakers. My list included:

  • Stream-based operation: there’s a beta implementation with a “turbo” (line-wise) mode, but it seems very limited. I believe it should be the norm, and wanted to see if I could do things in a way that was just as convenient, but with all the benefits of lazy stream-based processing.
  • Command execution: commands are built up by string concatenation, requiring manual effort to escape metacharacters including the humble space 1. Also, errors are silently ignored.
  • Purity of data: things are routinely strip()ed and empty strings are frequently dropped from computations. Breaking up a line into a list of data would (sometimes?) see each list merged back into the input stream, rather than maintained as a list.
  • Stream confusion: second streams, file inputs, etc. I’m not really sure why there are so many special cases.
  • A not-very-extensible extension mechanism, which is fairly manual and appears to preclude sharing or combining extensions.
  • Lots of unnecessary machinery that complicates the code: macros, history, --rerun, three file input types, etc. Some of this may be useful once you use the tool a lot, but it hindered my ability to add the features I wanted to pyp.
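The lazy, stream-based processing argued for in the first point is essentially what Python generators give you for free; a minimal sketch (my own, not piep’s internals):

```python
import io

def numbered(lines):
    # Each line is processed as it arrives; nothing is buffered up
    # front, so huge (or infinite) streams cost constant memory.
    for i, line in enumerate(lines):
        yield "%d: %s" % (i, line.rstrip("\n"))

# Consuming the generator pulls lines through one at a time:
out = list(numbered(io.StringIO("first\nsecond\n")))
assert out == ["0: first", "1: second"]
```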

I initially tried my hand at modifying pyp to fix some of the things I didn’t like about it, but the last point there really got in my way. History is baked in, and doesn’t really work in the same manner for stream-based operations. The entire pp class had to be rewritten, which is what I started doing when I decided to turn it into a separate tool (since it became difficult to integrate this new pp class with the rest of the system). Anyway, I hope this isn’t taken as an offence by the developers of pyp - I really like the ideas, so much so that I was compelled to write my ideal version of them.

  1. I humbly submit that concatenating strings is the worst possible way to generate shell commands, leading to countless dumb bugs that only rear their heads in certain situations (and often in cascading failures). Observe piep’s method on a filename containing spaces:

    $ ls -1 | piep 'sh("wc", "-c", p)'
    82685610 Getting the Most Out of Python Imports.mp4
    

    Compared to that of pyp (and countless other tools):

    $ ls -1 | pyp 'shell("wc -c " + p)'
    wc: Getting: No such file or directory
    wc: the: No such file or directory
    wc: Most: No such file or directory
    wc: Out: No such file or directory
    wc: of: No such file or directory
    wc: Python: No such file or directory
    wc: Imports.mp4: No such file or directory
    [[0]0 total]
    $ echo $?
    0
    

    It is unacceptable for a language with simple and convenient sequence types to instead rely on complex string escaping rules to prevent data from being misinterpreted. To be honest, this on its own may be reason enough to use piep over alternatives. 
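    The same list-based approach is available in plain Python via the subprocess module: each list element arrives as a separate argv entry, with no shell in between to reinterpret spaces. A small sketch (using printf as a stand-in command):

    ```python
    import subprocess

    # Arguments passed as a list reach the command verbatim;
    # a filename containing spaces stays a single argument.
    filename = "Getting the Most Out of Python Imports.mp4"
    result = subprocess.run(
        ["printf", "%s", filename],
        capture_output=True, text=True, check=True)
    assert result.stdout == filename
    ```

    Note also check=True: unlike pyp’s silent exit status 0 above, a failing command raises an exception rather than being ignored.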

Looking for a good javascript mocking library

Lately I’ve been looking for a good mocking library for node.js. It’s not easy.

Here are some (I would have said obvious) features that seem to be missing in most of the libraries I’ve seen:

  1. create an anonymous mock object on which to add expected methods (no need to provide a template object)
  2. create a (named) mock function (i.e. a directly callable mock)
  3. argument matchers (at least eq(), given how terrible javascript equality is)
  4. stub a single method of a real object for the duration of the test
  5. verify all expectations and revert all replaced methods (see #4) with a single call (to be called from a common tearDown())
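For reference, points 4 and 5 are roughly what Python’s unittest.mock gives you via patch.object; a sketch of the behaviour I’m after, in Python terms (the Greeter class is just an illustration):

```python
from unittest.mock import patch

class Greeter(object):
    def greet(self):
        return "hello"

g = Greeter()

# Point 4: stub a single method of a real object, scoped to a block.
with patch.object(g, "greet", return_value="stubbed"):
    assert g.greet() == "stubbed"

# Point 5: the original method is reverted automatically afterwards.
assert g.greet() == "hello"
```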

I don’t want it to be tied to my test runner, I’m quite happy with mocha.

I prefer the rspec style of setting expectations on mocks before the code is run and having them verified at the end of the test, but it’s not a requirement.

I need it to run in node.js, but would like it to work in the browser (even if I have to use some sort of CommonJS shim).

Here are the libraries I tried to use or looked at, and reasons they will not suffice:

Honestly, I looked at most of the unit testing modules listed on the node.js wiki that sounded like they did mocking.

jasmine’s mocking support seems somewhat reasonable (I’ve used it before), but unfortunately it seems to be tied to the jasmine test runner, which is not acceptable for async tests.

I’m happy to be shown wrong about my conclusions here, or to be pointed to any mocking library that succeeds in most (or at least more) of my requirements. If all else fails I may consider porting my mocktest python library to javascript as best as the language will allow, but it’s probably a lot of effort - surely someone has written a good javascript mocking library somewhere amongst all this? What do other folks use?

Cool projects I learnt about at lca2012

I went to linux.conf.au this week, and learnt about some pretty awesome tech (as well as hearing some entertaining, inspiring, touching and terrifying talks from the likes of Paul Fenwick, Bruce Perens, Karen Sandler and Jacob Appelbaum).

So here’s a quick dump of cool projects I learnt about, with links where I could find them.

browser id:

Openid is not so great, due to:

  • usability / confusion (requiring a url as an id)
  • reliability (if your provider goes down, you can’t log in)
  • lock-in (hard for the user to migrate providers)
  • privacy (your provider knows every url you log into, every time)

Browserid serves one simple purpose: to prove you own an email address. It’s distributed. The browser (not a third-party server) is the login intermediary.

As an example: logging in to gmail generates a short-lived (on the order of hours or maybe days, I guess) cryptographically signed statement that you own that email, which your browser stores. Other sites just need to grab gmail’s public key, and then they can themselves verify that the assertion you sent them proves you own (or can log in to) you@example.com.
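The shape of that idea can be sketched in a few lines. To be clear, this is not the real BrowserID protocol (which uses public-key signatures, so verifiers never hold a signing secret); it’s a toy HMAC version with made-up names, just to show the signed, expiring assertion:

```python
import hashlib
import hmac
import json
import time

# Stand-in for the provider's signing key (the real protocol is
# public-key, so only the provider could produce signatures).
SECRET = b"provider-signing-key"

def issue_assertion(email, ttl=3600):
    # The provider signs a short-lived claim that you own this email.
    payload = json.dumps({"email": email, "exp": time.time() + ttl})
    sig = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return payload, sig

def verify_assertion(payload, sig):
    # A relying site checks the signature and the expiry; it never
    # talks to the provider at login time.
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    claim = json.loads(payload)
    return hmac.compare_digest(sig, expected) and claim["exp"] > time.time()

payload, sig = issue_assertion("you@example.com")
assert verify_assertion(payload, sig)
assert not verify_assertion(payload.replace("you", "me"), sig)
```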

Demo site: myfavoritebeer.org

Not awesome yet: there’s no browser or email provider support. But it’s developed by Mozilla, and ships JavaScript scaffolding so it already works in all browsers. Certs are stored on browserid.org for now, until browsers implement native support (so it’s just as bad as a server intermediary, but only as a stopgap).

mediastreams processing api:

Very cool demos, which are online (although they require a dev build of firefox).

css calc()

Computation in CSS. Very cool. There are implementations in IE and Firefox so far, and work on WebKit is in progress. This is not just a convenience like Sass and friends - it allows for previously impossible mixing of units, like calc(100% - 10px).

News on arbitrary metadata in git:

Apparently many people feel this is important for VCS interop (git-svn, fastimport, etc), and there are proposals for adding support (but it’s difficult to figure out how to do right in a way that won’t prohibit other useful things in the future).

openstack

Uses orchestra by ubuntu for deployments.

gerrit for code review. Uses Jenkins plugins, and pushes state notifications to launchpad. openstack wrote a patch submission tool, git-review, which rebases & pushes patchsets to gerrit.

testing ctdb

Autocluster is a tool for testing clustered samba (and other things, presumably) on virtual KVM-based clusters. Uses guestfish to manage volumes inside the KVM guests.

dbench for automated performance tests. Can describe any kind of IO workload.

Assessing the potential impact of package updates

Tools for spelunking package relationships (on debian at least):

  • Recursive depends, reverse depends, apt-cache, germinate.

misc projects / pointers

sozi is an inkscape plugin for chopping up vectors into slide presentations.

fossil scm is a VCS that tracks docs, wikis, bugs, etc in with the source code. Cool idea, apparently some bad implementation decisions though.

safe-rm: replaces rm and has a blacklist of folders you probably don’t really want to remove, such as /usr/. No more bumblebeeing (can’t find the link, but there was once a bug in the bumblebee uninstall script that removed all of /usr as root).

handbag makes android accessory dev easy(er)

libvirt-sandbox: new library and command-line tools for app-level sandboxing with LXC and/or KVM. Should be coming in Fedora 17.

Visualisation: gapminder.org

freedombox: 100% free tiny personal server project. Uses open-source & federated social software like diaspora and buddycloud.

instamorph: malleable plastic that’s solid at room temperature, for making ad-hoc hooks, connectors, docks, etc. The aussie equivalent is apparently called “Polymorph”.

Mentally ignoring files in git

For unfortunate reasons, at work we have intellij module files that are:

  1. checked into git (because they are useful and hard to recreate)
  2. changing all the time for no good reason (because intellij is a bit like that)

For the most part, the changes are just noise because paths, versions, etc. differ slightly across dev machines. Sometimes the differences are completely meaningless (unordered elements in a file being rearranged). But sometimes we do want to check in new versions of these files, say when we add a new module.

Since they are checked in, we can’t just add them to .gitignore. But we can still mentally ignore them, with this rather ridiculous hack:

$ git status | sed -E -e 's/\x1b\[[0-9][12]m(.*\.iml)/'"`tput setaf 3`"'\1/'

This changes the normally red (or green, or blue) lines in the git status output to yellow instead when they refer to an intellij .iml file. It’s completely hacky, but it’s better than breaking the build because you forgot to commit a file amongst all the noise.

You can put this in a script you call instead of git status - typing git status takes too long anyway, so I just call my script g.

You’ll also need to make sure you have enabled colour “always” in git status output, otherwise you’ll get no colours at all:

$ git config --global color.status always

If anyone has a better way of ignoring files while having them checked in, I’d love to hear it. Do other VCSs allow you to deal with this any better?

Obligate.js

I’ve been doing some browser-side javascript lately, and getting frustrated at the mess that is browser-side modules. So here’s a tiny library that, given a tree containing javascript files, will give you a single javascript file containing all of the modules, as well as a commonJS require method to use in browser-side code. No more accidental globals: your variables stay local, and you just add your module’s public interface as properties on the module-local exports object.

Of course, it also includes a tool to grab all the javascript code of your own and that of your recursive dependencies, as specified in a Zero Install feed.

(view link)

Ruby's split() function makes me feel special (in a bad way)

Quick hand count: who knows what String.split() does?

Most developers probably do. Python? easy. Javascript? probably. But if you’re a ruby developer, chances are close to nil. I’m not trying to imply anything about the intelligence or skill of ruby developers, it’s just that the odds are stacked against you.


So, what does String.split() do?

In the simple case, it takes a separator string. It returns an array of substrings, split on the given string. Like so:

py> "one|two|three".split("|")
["one", "two", "three"]

Simple enough. As an extension, some languages let you pass a second argument limiting the splits. In python (where it’s called maxsplit), it performs at most that many splits, like so:

py> "one|two|three".split("|", 1)
["one", "two|three"]

Ruby is similar, although you have to add one to the second argument (it counts the number of returned components, rather than the number of splits performed).
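To make the off-by-one concrete, here’s the python side of the correspondence (Ruby’s split("|", 2) produces the same result as python’s split("|", 1)):

```python
# Python: the second argument is the maximum number of splits performed.
parts = "one|two|three".split("|", 1)
assert parts == ["one", "two|three"]
# Ruby reaches the same result with "one|two|three".split("|", 2),
# because its limit counts returned components, not splits.
```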

Javascript is a bit odd, in that it will ignore the rest of the string if you limit it:

js> "one|two|three".split("|", 2)
["one", "two"]

I don’t like the javascript way, but these are all valid interpretations of split. So far. And that’s pretty much all you have to know for python and javascript. But ruby? Pull up a seat.