Optimizing Affine Handlers

Abilities and their cost

Abilities in Unison are a convenient way to structure a lot of programs. They allow programming with a sort of domain-specific language, and decoupling the exact behavior of the domain-specific operations from the implementation. Also, they don't require rewriting functions in a different 'style' when switching between 'pure' code and domain-specific/effectful code.

However, a disadvantage they have is overhead. For an ability, say

►

ability Tell a where
  tell : a ->{Tell a} ()

when you call the tell operation in your code, it is an inherently dynamic event, because the exact behavior depends on which handler for the Tell ability is active at that point in the execution. Further, the handler looks something like

►

Tell.ignore : Request {g, Tell a} x -> x
Tell.ignore = cases
  { r }           -> r
  { tell _ -> k } -> handle k() with Tell.ignore

In the tell case, the handler gets access to the continuation of the tell request, and in general it can do whatever it wants with that continuation. This means that in general, we need to actually capture that continuation. This involves building a value with some stack contents, moving back along the stack as we do so. Then, the handler calls the continuation, which writes what we copied back.

In this example, it's pretty obvious that this is a lot of wasted work. The handler doesn't really do anything. So, the stack is getting copied only to be overwritten exactly. We are essentially implementing a plain function call by hand, using continuation operations.

Even worse, the cost of this is proportional to the size of the stack segment from where the tell is called to where the handler was installed. So, if Tell.ignore was the first handler you used, and you have several other handlers after it, as well as some non-tail function calls, this might be a very expensive no-op. It'd be nice if we could do better.

Mitigating the expense

There is literature on what sorts of ability handlers are able to avoid stack copying. With sufficient work, it's possible to optimize each individual effect case, so that only that case needs to be in a 'good' form to benefit. However, making that work in the most situations requires some additional tricky facilities in the runtime.

So, instead, I've been working on a more modest optimization. It revolves around recognizing what you might call "affine" handlers. To fall into this category, the handlers must have all branches in one of two forms

The continuation is not used at all.
The continuation is used exactly once, in the tail position of a (tail) handle block that reinstalls the same handler.

Tell.ignore above is an example of the second form. The key thing about these sorts of handlers is that they never need to capture continuations. In the first case, we just throw the continuation away. In the second case, we put it back right where it was, so we can implement it as a somewhat special function.

The only tricky part is that ability requests within the handler are supposed to refer to the dynamic context the handler occurs in. Normally this is accomplished by unwinding the stack while copying it, but we want to leave it in place. So, instead, we remember the context of the affine handler when we install it. As long as there are only affine handlers preceding it, there is no way to change the dynamic context of the handler, so just remembering the initial value works.

This does mean that we can only use the optimized version of an affine handler as long as only other affine handlers have been used so far. However, affine handlers are actually quite common. Every handler that is basically implementing some kind of state or exception effect is affine.

The main counterexamples are handlers that implement some kind of repetition or non-determinism. For instance, handlers for the Each ability would typically not be affine. However, I think it is common to mainly use Each handlers at the 'lowest' level, to repeat operations using some other effects, and in that case the outer affine handlers would be in a position to be optimized.

Details

The optimization of affine handlers has recently been merged to the unison interpreter implementation. It happens when loading code, so it should even be applied to code from previous versions that has been saved somewhere.

Initially it was pretty simple and limited, but over the past couple weeks it's been fleshed out quite a bit by making sure that most of the (pretty gnarly) handlers we use in the cloud computing implementation are properly recognized. So, for instance, you don't need to write precisely in the style of Tell.ignore. You can use the following sort of 'nice' handler structure

►

ignore2 : '{g, Tell a} x ->{g} x
ignore2 th =
  handle th()
  with cases
    { r }           -> r
    { tell _ -> k } -> ignore2 do k()

Also, if we define the following somewhat bizarre ability

►

ability Rec where
  rec : {Rec} ('{Rec} r -> r)

then the following handler is also recognized

►

recurse : '{Rec} r -> r
recurse th =
  handle th()
  with cases
    { r }        -> r
    { rec -> k } -> recurse do k recurse

Also, there can be a fair amount of complication in the structure. So, for instance, if we have the ability

►

ability Counter where
  inc : {Counter} Nat

the following handler is affine

►

complicated : Nat -> '{Counter} r -> r
complicated n th =
  use Nat +
  m = n + 1
  handle th()
  with cases
    { inc -> k } ->
      match m with
        2 -> complicated m do k (m + 35)
        _ -> complicated m do k m
    { r }        -> r

The main blocker to the optimization is resuming the continuation in a separate function, which can sometimes be deceptive based on the surface syntax. The example I came across in auditing our cloud code looks like the following

►

badPrint : '{Tell Text} r ->{IO} r
badPrint th =
  handle th()
  with cases
    { r }           -> r
    { tell t -> k } ->
      handle printLine t
      with cases
        { Exception.raise _ -> _ } -> badPrint do k()
        { _ }                      -> badPrint do k()

It's true that k is called here exactly once in a tail position. However, this only happens indirectly in the Exception handler for the printLine call. This actually involves some indirection internally, so it cannot be recognized. Instead, to get the optimization to apply, the calls to k should be factored out of the exception handler, so that they occur syntactically within the same handler, like so

►

goodPrint : '{Tell Text} r ->{IO} r
goodPrint th =
  handle th()
  with cases
    { r }           -> r
    { tell t -> k } ->
      handle printLine t
      with cases
        { Exception.raise _ -> _ } -> ()
        { _ }                      -> ()
      goodPrint do k()

It's fine to have the exception handler return a value, and match on that value to determine how to call k. It is mainly the additional handler that introduces too much indirection.

Also, mutually-recursive handlers will just not be optimized right now. In the future we may be able to support them, but for the time being, only self-recursive handlers are possible to optimize.

This optimization is now in the 0.5.42 release of unison. Hopefully everyone is able to get a nice performance boost when they upgrade.