piep
Bringing the power of python to stream editing
piep (pronounced “pipe”) is a command line utility in the spirit of awk, sed, grep, tr, cut, etc. Those tools work really well, but you have to use them a lot to keep the wildly varying syntax and options for each of them fresh in your head. If you already know python syntax, you should find piep much more natural to use.
It’s released under the GPLv3 licence (see the LICENCE file).
Quickstart
piep usually takes a single argument, the pipeline. This is a series of python expressions, separated by pipes. The important variables to know about are p (the current line), pp (the entire input) and sometimes i (the index of the current line). The result of each expression becomes p in the next part of the pipeline.
The pipeline is run over the program’s input (stdin, or the file provided with -i/--input).
Here are a few examples to give you an idea:
$ echo -e "Here are\nsome\nwords for you." | piep 'p.split() | len(p)'
2
1
3
$ echo -e "1\n2\n3\n4\n5\n6" | piep 'int(p) | p % 2 == 0 | p , "is even!"'
2 is even!
4 is even!
6 is even!
Things to note:
- The argument to piep needs to be surrounded with quotes (otherwise your shell would try to interpret the spaces, pipes, brackets etc). Single quotes are best, to prevent any interference from the shell.
- p is not always a string. In the first example we broke apart each line into a list using split, and then that list became the next value of p. It could instead be written as:
  $ piep 'len(p.split())'
  But that gets messy when we get into complicated pipelines (and makes for lots of brackets).
- If the output of a pipeline is a list or tuple, it will be joined together and printed. The default join string is “ ” (a single space), but this can be changed with --join.
- If the result of any linewise expression is a boolean or None, it acts as a filter for that line (like grep).
- If the result of any linewise expression is a callable object, it will be passed the current value of p to form the new value of p. This makes it easy to chain functions by just mentioning them, e.g.:
  $ echo -e '1\n2\n3' | piep 'int | [p, p + 1] | pretty'
  (here int is treated as int(p) and pretty is treated as pretty(p)). If you need to assign a function to the value of p without having it invoked, you can do so explicitly: p = str.
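The linewise rules above can be modelled in a few lines of plain Python. This is only a rough sketch of the described behaviour, not piep's actual implementation: `run_pipeline` and its `stages` argument (a list of functions taking `p`) are hypothetical names invented for illustration.

```python
def run_pipeline(lines, stages, join=" "):
    """Toy model of the linewise rules; each stage is a function of p."""
    out = []
    for line in lines:
        p = line
        keep = True
        for stage in stages:
            result = stage(p)
            if callable(result):
                # A callable result is applied to the current value of p.
                result = result(p)
            if result is None or result is False:
                # None / False filters the line out (like grep).
                keep = False
                break
            if result is True:
                # True keeps the line and leaves p unchanged.
                continue
            p = result
        if keep:
            if isinstance(p, (list, tuple)):
                # Lists and tuples are joined before printing.
                p = join.join(str(item) for item in p)
            out.append(str(p))
    return out


# Mirrors: piep 'int(p) | p % 2 == 0 | p , "is even!"'
evens = run_pipeline(
    ["1", "2", "3", "4", "5", "6"],
    [lambda p: int(p), lambda p: p % 2 == 0, lambda p: (p, "is even!")],
)
print("\n".join(evens))
```

The identity checks (`is True`, `is False`) matter in this sketch: a numeric result such as `0` should become the new `p` rather than being mistaken for a filter.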
File-mode expressions
Most of the expressions you’ll use are linewise (those using only p and i). If you use pp, the operation happens on the entire stream. Note that the stream is read in lazily and cannot be “rewound”, so it should be considered to be an iterator rather than a list. However, it does support some of the same operations:
# `head`
$ piep 'pp[:10]'
# `tail`
$ piep 'pp[-10:]'
# remove leading and trailing lines, then uppercase the rest:
$ piep 'pp[1:-1] | p.upper()'
Warning
Slice syntax is supported, but is destructive and will mutate the pp iterator, so complex expressions involving slicing or indexing may have surprising results. I’m interested in improving this, but for now if you try anything too fancy with pp, it may not work as expected.
On the plus side, even slice operations are as lazy as they can be - if your slice only needs to read the first 10 lines in the input, that’s all that will be read. This is extremely useful for testing out commands by limiting them to the first few lines of a big file.
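To see why a lazy slice is both cheap and destructive, here is a small sketch using `itertools.islice` on an ordinary Python generator. It is an analogy for how pp behaves, not a description of piep's internals; `numbered_lines` is a hypothetical stand-in for a large input.

```python
from itertools import islice


def numbered_lines():
    """Hypothetical stand-in for a very large (here: endless) input."""
    n = 0
    while True:
        n += 1
        yield "line %d" % n


stream = numbered_lines()

# Lazy: only three items are ever read, even though the input never ends.
head = list(islice(stream, 3))

# Destructive: the stream has moved on and cannot be rewound.
after = next(stream)

print(head)
print(after)
```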
Tip
If you need to treat pp as a regular (non-destructively-updating) list, you can force it by starting your pipeline with list(pp) | ... . That way, pp will be eagerly read in and treated as a list instead of a stream. Obviously, this will have adverse effects on memory usage for large input files.
Note
You’ll get an error if you try to use both file-level objects (like pp) and line-level objects (like p) in a single expression. You can still use a mixture of file and line-level expressions, just as long as they are separated by pipes.
Additional file inputs
If you use the -f/--file option, you get additional inputs. You can pass this option multiple times, and each file will be read in as a lazy stream with the same functionality as pp. These files are available as the files list. There aren’t many use cases for this yet, but one is iterating over pairs of items (one from stdin, one from a file) in concert:
$ piep --file=input2 'pp.zip(files[0]) | "STDIN:%s\tINPUT:%s" % p'
If you only want to use one additional file, you can use the convenient alias ff instead of files[0] to reference it.
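For comparison, the pairing in the example above can be written in plain Python with `itertools.zip_longest`, which the zip method is documented as being equivalent to. This is a sketch only: the two `io.StringIO` objects are hypothetical stand-ins for stdin and the input2 file.

```python
import io
from itertools import zip_longest

stdin_lines = io.StringIO("a\nb\nc\n")  # hypothetical stand-in for stdin
extra_file = io.StringIO("1\n2\n")      # hypothetical stand-in for input2

# Pair lines from both inputs; the shorter input is padded with None.
pairs = zip_longest(
    (line.rstrip("\n") for line in stdin_lines),
    (line.rstrip("\n") for line in extra_file),
)
formatted = ["STDIN:%s\tINPUT:%s" % pair for pair in pairs]
print("\n".join(formatted))
```

Because the shorter input is padded, the last formatted line shows None for the exhausted file.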
Running shell commands
piep has a simple way of running commands on your input: the sh function. It takes multiple arguments, and each becomes a single argument to the underlying command. This means you do not need to quote spaces or other special shell metacharacters, so there will be no painful surprises there.
$ echo -e "setup.py\nMakefile" | piep 'sh("wc", "-l", p)'
The output of sh is whatever the command prints.
If you wish to run a command without using the output, you can use the spawn function instead. This acts just like sh, except the output is ignored and the expression returns True (which will maintain the existing value of p in a pipeline):
$ ls -1 | piep 'spawn("touch", p) | "Touched: " + p'
If you still want to see the command output printed without it becoming part of the pipeline, you can pass stdout=None to suppress the default redirection.
If a command fails (when using either sh or spawn), an exception will be raised telling you so:
$ echo -e "setup.py\nMakefile" | piep 'sh("false")'
Command 'false' returned non-zero exit status 1
$ echo $?
1
If you wish to suppress this behaviour, you can do so explicitly:
$ echo -e "setup.py\nMakefile" | piep 'sh("false", check=False) + "line!"'
line!
line!
Or (for sh only) by coercing it to a boolean - it is assumed that if you use a command as a boolean, you will be managing failures yourself:
$ echo -e "echo ok\nfalse" | piep 'p.split() | sh(*p) or "(failed)"'
ok
(failed)
If you absolutely must use shell syntax, you can pass the keyword argument shell=True.
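The behaviour described in this section can be approximated with the standard subprocess module. The sketch below is not piep's own code: it re-creates sh and spawn as hypothetical stand-alone functions, assumes trailing newlines are stripped from the output (which the examples above suggest but do not state), and leaves out the boolean-coercion feature.

```python
import subprocess
import sys


def sh(*args, check=True, **kwargs):
    """Run a command without a shell and return its stdout as text."""
    # Capture stdout by default; the caller can pass stdout=None instead.
    kwargs.setdefault("stdout", subprocess.PIPE)
    proc = subprocess.run(list(args), text=True, **kwargs)
    if check and proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, list(args))
    # Assumption: trailing newlines are dropped from the captured output.
    return (proc.stdout or "").rstrip("\n")


def spawn(*args, **kwargs):
    """Run a command for its side effects, ignore output, return True."""
    sh(*args, **kwargs)
    return True


# Each Python argument becomes exactly one command argument, so a value
# containing spaces needs no quoting.
argc = sh(sys.executable, "-c", "import sys; print(len(sys.argv))", "two words")
print(argc)

# A failing command raises unless check=False is passed.
quiet = sh(sys.executable, "-c", "import sys; sys.exit(1)", check=False)
print(repr(quiet))
```

The examples use `sys.executable` as the command only so that the sketch runs on any platform; any program name works the same way.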
Utility methods
There are three places where utility methods live in piep: globals, line methods (methods of p) and stream methods (methods of pp):
Methods available on p (an input line)
class piep.line.Line
- basename(**k): alias for os.path.basename(self)
- dirname(**k): alias for os.path.dirname(self)
- filename(**k): alias for os.path.basename(self)
- matches(pattern, group=0, flags=0): return True or False depending on whether the given regex can be found anywhere in the line
- splitcolon(**k): split on “:”
- splitcomma(**k): split on “,”
- splitext(**k): alias for os.path.splitext(self)
- splitline(**k): alias for str.splitlines()
- splitpath(**k): split on path separator (“/” for unix, “\” for windows)
- splitslash(**k): split on “/”
- splittab(**k): split on “\t”
- stripext(**k): remove filetype extension if present, including “.”
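Most of the line methods are thin aliases, so their results can be previewed with the standard library. The sketch below uses posixpath (the unix flavour of os.path) so the results do not depend on the host OS. The sample path is hypothetical, and the stripext and matches equivalents are assumptions based on the one-line descriptions above.

```python
import posixpath  # unix flavour of os.path, for OS-independent results
import re

p = "/srv/logs/app.tar.gz"  # hypothetical input line

basename = posixpath.basename(p)   # like p.basename() / p.filename()
dirname = posixpath.dirname(p)     # like p.dirname()
root, ext = posixpath.splitext(p)  # like p.splitext()
stripped = root                    # assumed equivalent of p.stripext()
parts = p.split("/")               # like p.splitslash()

# Assumed equivalent of p.matches(r"\.gz$"): a search anywhere in the line.
is_gzip = re.search(r"\.gz$", p) is not None

print(basename, dirname, ext, stripped, parts, is_gzip)
```

Note that only the last extension is treated as the file type, so `app.tar.gz` splits into `app.tar` and `.gz`.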
Methods available on pp (the input stream)
class piep.sequence.BaseList
Contains the common methods for piep.List and piep.Stream.
- divide(pred, keep_header=True): Divide this stream at lines where pred returns true. If keep_header is set to False, lines matching pred will not be included in the results. Each group is returned as a List of items.
- filter(f=None): alias for itertools.ifilter(self, f)
- flatten(): Combine a sequence of strings (containing newlines) into a sequence of lines.
  >>> list(Stream(["a\nb\nc","d\ne"]).flatten())
  ['a', 'b', 'c', 'd', 'e']
- join(s): alias for s.join(self)
- merge(): Combine a sequence of iterables into one sequence.
  >>> list(Stream([[1],[2,3],[4,5]]).merge())
  [1, 2, 3, 4, 5]
- reverse(): Return a reversed version of this stream (note: reads entire stream into memory). Alias for reversed(self)
- sort(uniq=False): Return a sorted version of this stream (note: reads entire stream into memory). Alias for sorted(self). When uniq=True, duplicates are removed from the result.
- sortby(fn=None, key=None, attr=None, method=None): Return a sorted version of this stream (note: reads entire stream into memory). One (and only one) of the argument types should be provided as the sort key:
  - fn will sort using the return value of calling fn with each item: fn(item)
  - key will sort using the given key of each element: item[key]
  - attr will sort using the given attribute of each element: item.attr
  - method will sort using the result of calling the given method (with no arguments) on each element: item.method()
- uniq(stable=False): Remove duplicates. Note: if stable is not given (or is False), the return value will be in arbitrary order. If stable is True, the order will be maintained and only the first occurrence of a duplicate line will be kept. This is not the default because it’s much slower. For sorted unique output, try sort(uniq=True)
- zip(*others): Combine this stream with another, yielding sequential pairs from each stream. When one sequence is shorter than the other, it’s padded with None elements. Basically, itertools.zip_longest(self, *others)
  >>> list(Stream([1,2,3,4]).zip(['one','two','three']))
  [(1, 'one'), (2, 'two'), (3, 'three'), (4, None)]
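The four sortby argument types and the stable flavour of uniq map naturally onto the operator module and an ordered dict. The functions below are a hypothetical plain-Python sketch of that mapping, written from the descriptions above rather than from piep's source.

```python
from operator import attrgetter, itemgetter, methodcaller


def sortby(items, fn=None, key=None, attr=None, method=None):
    """Sort by exactly one of: fn(item), item[key], item.attr, item.method()."""
    options = (("fn", fn), ("key", key), ("attr", attr), ("method", method))
    given = [(name, value) for name, value in options if value is not None]
    if len(given) != 1:
        raise ValueError("provide exactly one of fn, key, attr, method")
    name, value = given[0]
    builders = {
        "fn": lambda f: f,       # use the function as the sort key directly
        "key": itemgetter,       # item[key]
        "attr": attrgetter,      # item.attr
        "method": methodcaller,  # item.method()
    }
    return sorted(items, key=builders[name](value))


def uniq_stable(items):
    """Drop duplicates, keeping the first occurrence and original order."""
    # dicts preserve insertion order (Python 3.7+).
    return list(dict.fromkeys(items))


print(sortby(["bb", "a", "ccc"], fn=len))
print(sortby([(2, "x"), (1, "y")], key=0))
print(sortby(["b", "A"], method="lower"))
print(uniq_stable(["b", "a", "b", "c", "a"]))
```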
Global functions / variables
The contents of piep.builtins are mixed in to the global scope, so all of the following are available unqualified:
Standard modules:
- piep.builtins.re
- piep.builtins.os
- piep.builtins.sys
Aliases:
- piep.builtins.path: Alias for the os.path module
- piep.builtins.Line: Alias for the piep.Line class (a subclass of str containing all the same methods as p does)
- piep.builtins.List: Alias for the piep.List class (a subclass of list containing all the same methods as pp does from piep.BaseList)
- piep.builtins.devnull: A readable and writable file pointing to the null device (/dev/null)
Globally-accessible functions:
- piep.builtins.ignore(*a): Ignores all arguments and returns True. Occasionally useful to evaluate an expression for its side-effects without making use of its value.
- piep.builtins.len(obj): like the builtin len, but works (destructively) on iterators.
- piep.builtins.pretty(obj): return a console-coloured pretty printed version of obj (like repr(obj), but coloured)
- piep.builtins.sh(*args, **kwargs): Invoke a shell program and return its output. *args will be the program name + arguments, and **kwargs will be passed through to subprocess.Popen. One additional keyword argument is supported - the check keyword argument (used to suppress an exception when the command fails). For more info, see Running shell commands
- piep.builtins.spawn(*a, **k): acts exactly like sh, except that it just returns True. For more info, see Running shell commands
- piep.builtins.str(obj): like the builtin str, but returns an instance of the piep.Line subclass.
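As an illustration of what “works (destructively) on iterators” means for len, here is a minimal sketch. `iter_len` is a hypothetical name (chosen to avoid shadowing the builtin) and is not piep's implementation.

```python
def iter_len(obj):
    """Like len(), but falls back to counting items for plain iterators."""
    try:
        return len(obj)
    except TypeError:
        # Counting consumes the iterator: nothing is left afterwards.
        return sum(1 for _ in obj)


stream = iter(["a", "b", "c"])
count = iter_len(stream)
leftover = list(stream)

print(count)
print(leftover)
```

After counting, the iterator is exhausted, which is why the result of such a call is usually the last thing you want from that stream.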
Re-aligning input
When an expression based on one input line generates multiple lines (or a sequence), future expressions will use that multi-line string or sequence as the new value of p. If you want to roll up a sequence back into pp, use pp.merge(). To flatten a multi-line string, use pp.flatten().
Take this example:
$ echo -e "2\n4" | piep 'int(p) | range(0, p) | repr(p)'
[0, 1]
[0, 1, 2, 3]
If you wanted each number to come on its own line (for formatting’s sake, or for further processing), you can use merge:
$ echo -e "2\n4" | piep 'int(p) | range(0, p) | pp.merge()'
0
1
0
1
2
3
The same can be done for multi-line strings, with flatten:
$ echo "/bin" | piep 'sh("ls", p) | pp.flatten() | pp[:5] | "#", p'
# bash
# bunzip2
# busybox
# bzcat
# bzcmp
Without the flatten, you would instead see output like:
$ echo "/bin" | piep 'sh("ls", p) | pp[:5] | "#", p'
# bash
bunzip2
busybox
bzcat
bzcmp
bzdiff
bzegrep
bzexe
bzfgrep
bzgrep
bzip2
( ... )
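The difference between merge and flatten can also be sketched in plain Python with `itertools.chain`. These stand-alone functions are hypothetical illustrations of the documented behaviour, not piep's implementation.

```python
from itertools import chain


def merge(stream):
    """Combine a sequence of iterables into one sequence."""
    return chain.from_iterable(stream)


def flatten(stream):
    """Combine a sequence of multi-line strings into a sequence of lines."""
    return chain.from_iterable(text.splitlines() for text in stream)


merged = list(merge([range(0, 2), range(0, 4)]))
flattened = list(flatten(["a\nb\nc", "d\ne"]))

print(merged)
print(flattened)
```

Both stay lazy until consumed, in keeping with how the rest of the stream operations are described.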
History / Variable Assignments
It can be useful to reference an earlier result in the pipeline. The only non-expression allowed is a single assignment, which will capture the value of the line at that point in the pipeline. For example:
$ echo -e "a.py\nb.py\nc.py" | piep 'orig = p | p.extonly() | orig, "is a", p, "file"'
a.py is a py file
b.py is a py file
c.py is a py file
Note that you could accomplish the same by capturing some variant of p without changing it, like so:
$ echo -e "a.py\nb.py\nc.py" | piep 'ext = p.extonly() | p, "is a", ext, "file"'
Note that any file-mode expressions (those mentioning pp) will cause previously-bound variables to go out of scope, since it would be very hard to correlate these values (and I don’t really see a use for this). Typically, you’ll want to modify pp before you start the line-wise expressions so it shouldn’t often be a problem in practice.
Extensibility
piep is extensible - it’s just python. You can use the -m/--import flag to make modules available, or pass more complicated expressions to --eval. Future work will allow you to write simple plugins that extend piep.
Changes
- 0.9:
  - python3 support
- 0.8:
  - bug fixes, particularly parsing edge cases (thanks Matt Giuca)
  - add spawn(), devnull and ignore() builtins
  - auto-invocation of pipeline expressions that return functions - e.g. str now evaluates to str(p)
- 0.7:
  - added Line.reverse()
  - add a bunch of shell coercion operators
  - add --no-input (self-constructing pipeline) mode
  - add --print0 mode to separate output records with the null byte
- 0.6:
  - cleaned up & documented --file functionality
  - fix incorrect repr for shell results
  - add the ability to explicitly check the result of shell commands, even when they are coerced into a boolean
  - add pp.sort, pp.sortby, pp.uniq