Dealing with non-local control flow in deferred CoffeeScript
…a continuation (hah!) of defer: Taming asynchronous javascript with coffeescript.
In my last post, I outlined the ideas behind defer and its current state. One of the things I mentioned is that it is not yet decided how return statements should be handled in asynchronous code. I’d like to explore a few ideas for that here. If you haven’t read the previous post, I suggest that you do.
The problem
With real continuations, a continuation object represents the rest of a program’s execution. With defer, the compiler can only generate a continuation that includes the rest of the current function. This works just as well, but only as long as the function calls a callback with its result, instead of returning it. Such a rule may be acceptable (it’s still better than writing your own continuations), but it’s certainly worth exploring other possibilities.
For reference, here is the sample synchronous function all the examples will be representing:
get_document: (id) ->
  document: db.get(id)
  return document
Here is the same function when db.get is asynchronous, using the currently-implemented defer feature:
get_document: (id, callback) ->
  document: defer db.get(id)
  return callback(document)
The return here is unnecessary, but included for completeness. To return early from a function, you would have to write return callback(result) to ensure both that the result is passed on and that the function ceases execution.
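To make the early-return rule concrete, here is a rough JavaScript sketch of what such a function amounts to once the continuation is written by hand. The stub db object, its contents, and the id == null guard are assumptions made purely for illustration; this is not the compiler's actual output.

```javascript
// Hypothetical stand-in for an asynchronous database (not a real API).
const db = {
  docs: { 1: { title: "first" } },
  get(id, callback) {
    // Deliver the result on a later tick, as a real async call would.
    setTimeout(() => callback(db.docs[id]), 0);
  },
};

function get_document(id, callback) {
  if (id == null) {
    // Early exit: `return callback(...)` passes the result on AND stops here.
    return callback(null);
  }
  // Everything after `defer db.get(id)` becomes the continuation.
  db.get(id, function (document) {
    return callback(document);
  });
}

const results = [];
get_document(null, (doc) => results.push(doc)); // early return path
get_document(1, (doc) => results.push(doc));    // deferred path
```

Without the return in front of callback(null), execution would fall through to db.get and the callback would end up being invoked twice.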
In converting the synchronous case to the asynchronous one, two things were required:
- a callback argument was added to the end of the argument list
- the callback was invoked at every return site, alongside the return itself
This will be the base against which we compare the alternatives. The aim is to require as few changes as possible while maintaining flexibility and obviousness (since an async function must be called differently from a sync one, the programmer should know at a glance which is which).
proposal 1: infer everything
This was the initial approach. Basically, the programmer would not be required to add anything - merely use the defer keyword. If a function contains defer, then it becomes asynchronous. In order to support this, an extra (hidden) callback argument is added, and all return statements call this callback explicitly. For example, the following:
get_document: (id) ->
  document: defer db.get(id)
  return document
is almost identical to the sync code, with the addition of the defer keyword. The code generated would be similar to the following in the current implementation:
get_document: (id, _a) ->
  document: defer db.get(id)
  return _a(document)
where _a is a generated name known only to the CoffeeScript compiler.
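Taken one step further, down to plain JavaScript, the result might look roughly like the sketch below. The stub db object is an assumption, and the real compiler output would certainly differ in its details; the point is only that the emitted function takes two arguments even though the CoffeeScript signature lists one.

```javascript
// Hypothetical async store standing in for `db` (an assumption, not a real API).
const db = {
  get(id, callback) {
    setTimeout(() => callback({ id: id }), 0);
  },
};

// Source as the programmer would write it under proposal 1:
//   get_document: (id) ->
//     document: defer db.get(id)
//     return document
//
// Roughly what could be emitted: `_a` is the hidden callback argument.
function get_document(id, _a) {
  db.get(id, function (document) {
    return _a(document); // every `return` is rewritten to call `_a`
  });
}

// Callers have to know about the extra argument.
const seen = [];
get_document(42, (doc) => seen.push(doc));
```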
advantages:
- you cannot possibly forget to return into your callback - the compiler manages it for you
- minimal changes are required to convert a function into asynchronous code - in fact, there is no change aside from the (necessary) defer statement
- since the compiler knows which argument is the callback, it could insert a check at run-time to ensure that a callback was provided, and raise a useful error message while the stack is still active, so that you can see exactly what call failed to provide a callback. This lessens the severity of the first disadvantage below, as it will be caught the first time the code is run
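The run-time check from the last point could be as small as the guard in this sketch. The wording of the error message and the stub db object are assumptions; nothing like this exists in the current implementation.

```javascript
// Hypothetical async store standing in for `db` (an assumption).
const db = {
  get(id, callback) {
    setTimeout(() => callback({ id: id }), 0);
  },
};

// Possible compiler output for proposal 1 with a guard prepended.
function get_document(id, _a) {
  // Fail right away, while the offending caller is still on the stack.
  if (typeof _a !== "function") {
    throw new TypeError("get_document uses defer and must be given a callback");
  }
  db.get(id, function (document) {
    return _a(document);
  });
}

let message = null;
try {
  get_document(1); // the caller forgot the hidden callback
} catch (err) {
  message = err.message;
}
console.log(message);
```

Because the guard runs before any asynchronous work is started, the stack trace points at the call that omitted the callback rather than at some later timer or event handler.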
disadvantages:
- it’s hard to tell at a glance how many arguments the function takes - the function signature lists only one, and the only way of telling is to scan for the defer keyword anywhere inside the function body
- the programmer cannot specify which argument should be the callback
- the programmer cannot save the callback object for later use, since its name is known only to the compiler
- accepting multiple callback arguments (one for success, one for error) is not possible
proposal 2: designated callback
Instead of automatically adding a callback, how about annotating the argument list? That way a caller can easily see that a callback argument is required, but a programmer does not need to alter return statements. For example, you could use the return modifier like so:
get_document: (id, return callback) ->
  document: defer db.get(id)
  return document
I think this strikes a good balance - to convert from sync to async, one simply uses defer (which is necessary), and adds a special argument to the function signature. The compiler is then responsible for converting all return statements into calls to the callback local variable.
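As a rough illustration of "all return statements", the sketch below adds a second return to the example (the return null unless document line is my addition, as is the stub db object) and shows the JavaScript each one could turn into. It is a guess at the shape of the output, not something the compiler produces today.

```javascript
// Hypothetical async store standing in for `db` (an assumption).
const db = {
  docs: { 1: { title: "first" } },
  get(id, callback) {
    setTimeout(() => callback(db.docs[id]), 0);
  },
};

// Source under proposal 2 (second return added for illustration):
//   get_document: (id, return callback) ->
//     document: defer db.get(id)
//     return null unless document
//     return document
//
// Each `return x` becomes `return callback(x)`; the callback keeps the
// name the programmer gave it in the signature.
function get_document(id, callback) {
  db.get(id, function (document) {
    if (!document) return callback(null);
    return callback(document);
  });
}

const results = [];
get_document(1, (doc) => results.push(doc)); // found
get_document(2, (doc) => results.push(doc)); // missing
```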
advantages:
- minimal changes to convert a sync function (aside from the defer, only the function signature)
- function signature accurately represents required arguments
- the programmer cannot forget to return into the given callback, as the compiler will enforce it
disadvantages:
- for the most part, callback will never actually be mentioned in the function body. That could be a little confusing
- multiple callback parameters (error, ok) are still not supported
proposal 3: proposal two again, with an error callback
In order to support error and success callbacks, a modification could be made. Consider the following:
get_document: (id, throw error_callback, return callback) ->
  document: defer db.get(id)
  throw new Exception("things are looking bad") unless document.validate()
  return document
This code would indicate that callback is the callback that should be used for returning values, but that error_callback should be used for exceptions raised within that function body. That is, it will act just like the following code:
get_document: (id, error_callback, callback) ->
  document: defer db.get(id)
  return error_callback(new Exception("things are looking bad")) unless document.validate()
  return callback(document)
It’s important to note that this will only work for exceptions raised in the function body, as opposed to exceptions that bubble up from calls made in the body. Since this is a compiler-level transform, it has no knowledge of runtime exceptions - only exceptions that are thrown by the local syntax.
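The difference between the two kinds of exception is easier to see in a JavaScript sketch. Everything here is an assumption for illustration: the stub db object, the escaped list it uses to record exceptions that get past the continuation, and the use of JavaScript's built-in Error where the examples above say Exception.

```javascript
// Hypothetical async store (an assumption). Exceptions that escape the
// continuation are recorded in `escaped` so the demo can inspect them.
const escaped = [];
const db = {
  get(id, callback) {
    setTimeout(() => {
      const document = {
        id: id,
        validate() {
          // An exception raised by a callee, not by get_document's own body.
          if (id === 0) throw new Error("validator crashed");
          return id > 0;
        },
      };
      try {
        callback(document);
      } catch (err) {
        escaped.push(err.message);
      }
    }, 0);
  },
};

// Roughly what proposal 3 could compile to.
function get_document(id, error_callback, callback) {
  db.get(id, function (document) {
    if (!document.validate()) {
      // A `throw` written in this function body is rewritten into this call.
      return error_callback(new Error("things are looking bad"));
    }
    return callback(document);
  });
}

const ok = [];
const failed = [];
const on_error = (err) => failed.push(err.message);
const on_ok = (doc) => ok.push(doc.id);

get_document(7, on_error, on_ok);  // valid: goes to the return callback
get_document(-1, on_error, on_ok); // local throw: goes to error_callback
get_document(0, on_error, on_ok);  // validate() itself throws: reaches neither
```

In the third call neither callback is invoked; the exception simply propagates to whoever invoked the continuation, which is exactly the limitation described above.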
If this were to be used, the omission of a throw callback in the argument list could cause exceptions to be returned into the return callback. I’m doubtful that this is a good idea, though - exceptions should probably just be thrown immediately, up to the browser, when no error callback is defined.
advantages:
- minimal changes to convert a sync function (aside from the defer, only the function signature)
- function signature accurately represents required arguments
- the programmer cannot forget to return into the given callback, as the compiler will enforce it
- exceptions can actually be used (locally) if the optional throw callback argument is used
- the error_callback can be (manually) provided to defer calls that themselves take error callbacks, in order to cascade errors
disadvantages:
- for the most part, callback will never actually be mentioned in the function body. That could be a little confusing
proposal 4: A less intrusive alternative: “return into”
My final idea is less intrusive, and simply ensures that all returns are made into callbacks. For example, return value into callback could be synonymous with return callback(value). It’s not shorter, but it allows the compiler to check that all returns go via a callback function (of some sort). Any function that used defer and a return statement that didn’t have the into modifier would fail to compile - alerting the programmer to the issue, but without assuming which callback should be used.
The example would then become:
get_document: (id, callback) ->
  document: defer db.get(id)
  return document into callback
which is not much better than the current situation, except that the compiler can now tell you when you forget to return into a callback.
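To give a feel for the kind of check involved, here is a toy version in JavaScript. It is only a sketch: check_returns is a made-up helper, it treats the whole snippet as a single function body, and it uses regular expressions where a real compiler would inspect the syntax tree.

```javascript
// Toy stand-in for the compile-time check (not CoffeeScript compiler code).
function check_returns(source) {
  // A body without `defer` is synchronous: nothing to enforce.
  if (!/\bdefer\b/.test(source)) return [];
  const problems = [];
  source.split("\n").forEach((line, index) => {
    const is_return = /^\s*return\b/.test(line);
    const has_into = /\binto\s+\w+/.test(line);
    if (is_return && !has_into) {
      problems.push("line " + (index + 1) + ": return without 'into'");
    }
  });
  return problems;
}

const good = [
  "get_document: (id, callback) ->",
  "  document: defer db.get(id)",
  "  return document into callback",
].join("\n");

// The same function with the `into` modifier forgotten.
const bad = good.replace(" into callback", "");

console.log(check_returns(good)); // no problems reported
console.log(check_returns(bad));  // reports the bare return on line 3
```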
advantages:
- least restrictive and most flexible approach
- protects against some programmer errors
disadvantages:
- there’s still a lot of typing to convert between a synchronous and asynchronous function
- the compiler isn’t helping as much as it could
- introduces a new keyword, into
Conclusion
All things considered, I favour proposal 3. I think it affords sufficient flexibility while still being as helpful to the programmer as possible. It’s also the clearest - all changes required to mark a function as asynchronous are entirely on the function signature line, which is the best possible place to put such information.
I’m interested to hear the thoughts of others, as well as any proposals I haven’t thought of. Please let me know in the comments, or add your voice to the CoffeeScript issue discussion.