Continuation-Passing Style: A Preliminary View
“You can have it all. You just can't have it all at once.”
— Oprah Winfrey
Prologue
Out of curiosity, I sat in on a course on programming languages, which offered me some interesting views, especially on functional programming languages. Among all the topics covered, I found continuation-passing style the most intriguing. One of the reasons is that I had actually used this style before without knowing it, and now I finally have a more systematic, theoretical view of the concept. And now that many imperative languages, like C++, also support functional programming, the idea is not limited to classic functional programming languages.
Many thanks to Prof. Olivier Danvy, whose lectures are truly inspiring.
You may need some basic knowledge of lambda calculus to continue. For that, you can refer to my other post Lambda Calculus: A Preliminary View.
Continuation-Passing Style
Monad
I do not want this to become a long detour, but I need one short stop at monads. If you want a beginner-friendly explanation, you can check out A Monad Guide for Beginners.
To put it simply, a monad is a value with context information. And with monads, we can perform sequential operations that carry that contextual information along.
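To make this concrete, here is a minimal sketch in JavaScript of a Maybe-like monad (the names `Just`, `Nothing`, `bind`, and `safeDiv` are mine, for illustration only): the context is whether a value is present at all, and `bind` sequences operations while threading that context.

```javascript
// A value with context: either a present value (Just) or absence (Nothing).
const Just = (value) => ({ ok: true, value });
const Nothing = { ok: false };

// bind sequences operations, propagating the "might be absent" context.
const bind = (m, f) => (m.ok ? f(m.value) : Nothing);

// A partial operation wrapped in the monad.
const safeDiv = (x, y) => (y === 0 ? Nothing : Just(x / y));

// Sequential operations; each step sees the unwrapped result of the last.
const ok = bind(safeDiv(10, 2), (a) => bind(safeDiv(a, 5), (b) => Just(b + 1)));
const bad = bind(safeDiv(10, 0), (a) => Just(a + 1)); // short-circuits to Nothing
```

Note how the failure case never has to be checked by hand in the middle of the chain; the context does the bookkeeping.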
Continuation
Continuation-passing style, or CPS, is a programming style where control flow is made explicit by passing a continuation function. This contrasts with direct style, where control flow is implicit in the order of expressions.
Let us start with a direct-style example:
```javascript
function add(x, y) {
  return x + y;
}

add(1, 2); // the caller decides what happens with the result
```
If we make control flow more explicit, we can write a monadic-style sequence with let ... in. Here, each let names an intermediate value, and the expression after in is the continuation, that is, what happens next.
```
let x = 1 in
let y = 2 in
let z = x + y in
z
```
For assignment, this is straightforward. We bind a value and continue. For function application, instead of returning a value and letting the caller decide what to do, we pass that value directly to a continuation. In this way, control flow becomes explicit.
```javascript
function addc(x, y, cc) {
  cc(x + y);
}

addc(1, 2, (sum) => console.log(sum)); // "what happens next" is passed in
```
More Examples
CPS is especially common in functional programming, where function composition and explicit control are central. In non-functional languages, we still see CPS-like patterns in asynchronous APIs.
A common example is Promise chaining. Instead of writing direct-style await, we pass success and failure continuations.
```javascript
fetch("/api/endpoint")
  .then((response) => response.json())
  .then((data) => console.log(data))    // success continuation
  .catch((error) => console.error(error)); // failure continuation
```
So the core idea is simple: in addition to direct-style control flow, we can make control transfer explicit via continuations.
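One thing explicit continuations buy us is non-local control transfer. As a small sketch (the function names are mine), here is a CPS product over a list that invokes an `abort` continuation as soon as it sees a zero, discarding every pending multiplication:

```javascript
// productCPS multiplies the elements of xs in CPS.
// `cc` is the normal continuation; `abort` is an escape continuation.
function productCPS(xs, abort, cc) {
  if (xs.length === 0) return cc(1);
  if (xs[0] === 0) return abort(0); // control transfer: skip cc entirely
  return productCPS(xs.slice(1), abort, (rest) => cc(xs[0] * rest));
}

const full = productCPS([2, 3, 4], (z) => z, (p) => p);  // 24
const early = productCPS([2, 0, 4], (z) => z, (p) => p); // 0
```

In direct style, the early exit would need an exception or a sentinel value; in CPS it is just a choice of which continuation to call.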
CPS Transformation
In compilers for functional languages, CPS is an important intermediate representation because it gives a uniform way to manage control flow and function application. Developers, however, usually write source programs in a more direct or monadic style. Converting those programs into CPS is called CPS transformation.
Prof. Olivier Danvy spent a whole lecture on CPS transformation; in fact, he mentioned the idea many times during the semester. And he was very patient in answering my stupid questions after class. This chapter serves as a self-reflection on the basic idea of CPS transformation.
He has many papers about CPS, and one closely related one would be A New One-Pass Transformation into Monadic Normal Form.
To avoid unnecessary complexity, here we only consider a simplified call-by-value $\lambda$-calculus:
$$
e ::= x \mid \lambda x. e \mid e \space @ \space e
$$
Here, $@$ denotes infix application. The target language is split into computations and values. A computation $c$ can be a value $v$ (as return), an application $v\ v$, or a named application in $\text{let…in}$.
$$
\begin{aligned}
c ::=\ & v \mid \text{let}\ x = v\ v\ \text{in}\ c \mid v\ v \newline
v ::=\ & x \mid \lambda x. c
\end{aligned}
$$
Two-Pass Transformation
Somehow I could not type some math symbols in Markdown, so the notation I use here differs slightly from that in the paper.
A naive strategy is to first introduce a binding for every intermediate result, then simplify. In the first pass, we transform source terms directly into a let-based, CPS-like form:
$$
\begin{aligned}
\mathcal{T}[x] & \rightarrow x \newline
\mathcal{T}[\lambda x. e] & \rightarrow \lambda x. \mathcal{T}[e] \newline
\mathcal{T}[e_0 \ @ \ e_1] & \rightarrow \text{let } w_0 = \mathcal{T}[e_0] \text{ in } \text{let } w_1 = \mathcal{T}[e_1] \text{ in } w_0 \ @ \ w_1
\end{aligned}
$$
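As a sketch of this first pass, here is a small JavaScript transformer over a toy AST (the constructor names `Var`, `Lam`, `App`, `Let` and the `fresh` name generator are mine; this only illustrates the three rules above):

```javascript
// Toy AST constructors (names are mine, for illustration only).
const Var = (name) => ({ tag: "Var", name });
const Lam = (param, body) => ({ tag: "Lam", param, body });
const App = (fn, arg) => ({ tag: "App", fn, arg });
const Let = (name, bound, body) => ({ tag: "Let", name, bound, body });

// Fresh-name generator for the w_i binders.
let counter = 0;
const fresh = () => `w${counter++}`;

// First pass: naively name both subterms of every application.
function transform(e) {
  switch (e.tag) {
    case "Var":
      return e;                               // T[x] -> x
    case "Lam":
      return Lam(e.param, transform(e.body)); // T[λx.e] -> λx. T[e]
    case "App": {                             // T[e0 @ e1] -> let w0 = T[e0] in let w1 = T[e1] in w0 @ w1
      const w0 = fresh();
      const w1 = fresh();
      return Let(w0, transform(e.fn),
             Let(w1, transform(e.arg),
             App(Var(w0), Var(w1))));
    }
  }
}

// λx. f (g x)
const result = transform(Lam("x", App(Var("f"), App(Var("g"), Var("x")))));
```

Running it on $\lambda x. f\ (g\ x)$ reproduces the nested bindings of the worked example, including the redundant $\text{let}\ w_0 = f$.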
Let’s then see an example.
$$
\begin{aligned}
\mathcal{T}[\lambda x. f(g(x))] & \rightarrow \mathcal{T}[\lambda x. f \ @ \ (g \ @ \ x)] \newline
& \rightarrow \lambda x.\, \mathcal{T}[f \ @ \ (g \ @ \ x)] \newline
& \rightarrow \lambda x.\, \text{let}\ w_0 = f\ \text{in}\ \text{let}\ w_1 = \mathcal{T}[g \ @ \ x]\ \text{in}\ w_0 \ @ \ w_1 \newline
& \rightarrow \lambda x.\, \text{let}\ w_0 = f\ \text{in}\ \text{let}\ w_1 = \newline
&\quad\left(\text{let}\ w_0’ = g\ \text{in}\ \text{let} \ w_1’ = x\ \text{in}\ w_0’ \ @ \ w_1’\right)\ \text{in}\ w_0 \ @ \ w_1
\end{aligned}
$$
It works, doesn’t it? But as you may have noticed, the transformed code contains redundant bindings such as $\text{let}\ w_0 = f\ \text{in …}\ w_0$, which could be simplified to just $f$. As a result, we have to perform a second pass to simplify the output into monadic normal form.
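For completeness, here is a sketch of what such a second pass could look like on the same toy AST (constructor and function names are mine): it inlines bindings whose right-hand side is already a value and re-associates nested lets, which is enough to clean up the naive first pass.

```javascript
// Toy AST constructors (names are mine, for illustration only).
const Var = (name) => ({ tag: "Var", name });
const Lam = (param, body) => ({ tag: "Lam", param, body });
const App = (fn, arg) => ({ tag: "App", fn, arg });
const Let = (name, bound, body) => ({ tag: "Let", name, bound, body });

// Capture-naive substitution: fine here because the first pass
// generates a distinct fresh name for every binder it introduces.
function subst(e, name, v) {
  switch (e.tag) {
    case "Var": return e.name === name ? v : e;
    case "Lam": return e.param === name ? e : Lam(e.param, subst(e.body, name, v));
    case "App": return App(subst(e.fn, name, v), subst(e.arg, name, v));
    case "Let": return Let(e.name, subst(e.bound, name, v),
                           e.name === name ? e.body : subst(e.body, name, v));
  }
}

// Second pass: inline trivial bindings and re-associate nested lets.
function simplify(e) {
  switch (e.tag) {
    case "Var": return e;
    case "Lam": return Lam(e.param, simplify(e.body));
    case "App": return App(simplify(e.fn), simplify(e.arg));
    case "Let": {
      const bound = simplify(e.bound);
      if (bound.tag === "Var" || bound.tag === "Lam")
        // let w = v in c  ==>  c[w := v]
        return simplify(subst(e.body, e.name, bound));
      if (bound.tag === "Let")
        // let x = (let y = b in c) in body ==> let y = b in let x = c in body
        return simplify(Let(bound.name, bound.bound, Let(e.name, bound.body, e.body)));
      return Let(e.name, bound, simplify(e.body));
    }
  }
}

// The first-pass output for λx. f (g x), with its redundant bindings:
const twoPass = Lam("x",
  Let("w0", Var("f"),
  Let("w1", Let("w2", Var("g"), Let("w3", Var("x"), App(Var("w2"), Var("w3")))),
  App(Var("w0"), Var("w1")))));
const result = simplify(twoPass); // λx. let w1 = g @ x in f @ w1
```

This is only a sketch of the idea; a production-quality simplifier would need to handle variable capture and termination more carefully.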
One-Pass Transformation
Why do we have redundancy in the two-pass transformation? In other words, can we simplify $\text{let…in}$ away during the transformation itself?
Let us take a closer look. The redundancy comes from $w_0$ and $w_1$. What makes them special is that they appear in tail position. Informally, tail position means the expression result is returned directly to the surrounding context, so no additional binding is needed.
Based on this observation, we can update the transformer to treat tail positions specially and obtain monadic normal form in one pass. We use $\mathcal{T}_t$ for tail positions, and $\mathcal{T}$ elsewhere.
$$
\begin{aligned}
& \mathcal{T}_t: e \rightarrow c \newline
\mathcal{T}_t[x] & \rightarrow x \newline
\mathcal{T}_t[\lambda x. e] & \rightarrow \lambda x. \mathcal{T}_t[e] \newline
\mathcal{T}_t[e_0 \ @ \ e_1] & \rightarrow \mathcal{T}[e_0] \ \lambda v_0. \mathcal{T}[e_1] \ \lambda v_1. v_0 \ @ \ v_1 \newline
\newline
& \mathcal{T}: e \rightarrow (v \rightarrow c) \rightarrow c \newline
\mathcal{T}[x] \ \kappa & \rightarrow \kappa \ @ \ x \newline
\mathcal{T}[\lambda x. e] \ \kappa & \rightarrow \kappa \ @ \ \lambda x. \mathcal{T}_t[e] \newline
\mathcal{T}[e_0 \ @ \ e_1] \ \kappa & \rightarrow \mathcal{T}[e_0] \ \lambda v_0. \mathcal{T}[e_1] \ \lambda v_1. \text{let } w = v_0 \ @ \ v_1 \ \text{in}\ \kappa \ @ \ w
\end{aligned}
$$
The key is to identify tail positions, which is $e$ in $\lambda x.e$. For non-tail positions, instead of introducing $\text{let…in}$ blindly, we pass an explicit continuation $\kappa$ that specifies what happens next. This is why $\mathcal{T}_t$ and $\mathcal{T}$ have different types, with $\kappa : v \rightarrow c$.
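The two transformers translate quite directly into code. Here is a JavaScript sketch on the same toy AST as before (constructor names are mine), where the functional accumulator $\kappa$ becomes an ordinary JavaScript function `k`:

```javascript
// Toy AST constructors (names are mine, for illustration only).
const Var = (name) => ({ tag: "Var", name });
const Lam = (param, body) => ({ tag: "Lam", param, body });
const App = (fn, arg) => ({ tag: "App", fn, arg });
const Let = (name, bound, body) => ({ tag: "Let", name, bound, body });

let counter = 0;
const fresh = () => `w${counter++}`;

// T_t : e -> c, for terms in tail position.
function transformTail(e) {
  switch (e.tag) {
    case "Var":
      return e;                                   // T_t[x] -> x
    case "Lam":
      return Lam(e.param, transformTail(e.body)); // T_t[λx.e] -> λx. T_t[e]
    case "App":                                   // T_t[e0 @ e1]
      return transform(e.fn, (v0) =>
             transform(e.arg, (v1) => App(v0, v1)));
  }
}

// T : e -> (v -> c) -> c, with the functional accumulator k (κ).
function transform(e, k) {
  switch (e.tag) {
    case "Var":
      return k(e);                                   // T[x]κ -> κ @ x
    case "Lam":
      return k(Lam(e.param, transformTail(e.body))); // T[λx.e]κ -> κ @ λx. T_t[e]
    case "App":                                      // T[e0 @ e1]κ -> ... let w = v0 @ v1 in κ @ w
      return transform(e.fn, (v0) =>
             transform(e.arg, (v1) => {
               const w = fresh();
               return Let(w, App(v0, v1), k(Var(w)));
             }));
  }
}

// λx. f (g x)  ==>  λx. let w0 = g @ x in f @ w0, in one pass
const result = transformTail(Lam("x", App(Var("f"), App(Var("g"), Var("x")))));
```

Notice that no simplification step is needed: variables and lambdas are handed straight to `k`, so no redundant `let` is ever built.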
In the paper, Prof. Olivier Danvy refers to $\kappa$ as a functional accumulator rather than a continuation, since it is not applied tail-recursively. Still, I find it easier to understand when read as a continuation.
Now let us see how the one-pass transformer produces monadic normal form directly. We use the same example and start with $\mathcal{T}_t$, since the whole term is in tail position.
$$
\begin{aligned}
\mathcal{T}_t[\lambda x. f(g(x))] & \rightarrow \mathcal{T}_t[\lambda x. f \ @ \ (g \ @ \ x)] \newline
& \rightarrow \lambda x.\, \mathcal{T}_t[f \ @ \ (g \ @ \ x)] \newline
& \rightarrow \mathcal{T}[f]\ (\lambda v_0. \mathcal{T}[g\ @\ x]\ \lambda v_1. v_0\ @\ v_1) \newline
& \rightarrow (\lambda v_0. \mathcal{T}[g\ @\ x]\ \lambda v_1. v_0\ @\ v_1)\ @\ f \newline
& \rightarrow \mathcal{T}[g\ @\ x]\ (\lambda v_1. f\ @\ v_1) \newline
& \rightarrow (\mathcal{T}[g]\ \lambda v_0’. \mathcal{T}[x]\ \lambda v_1’. \text{let}\ w=v_0’\ @\ v_1’\ \text{in}\ \kappa\ @\ w)\ (\lambda v_1. f\ @\ v_1) \newline
& \rightarrow (\mathcal{T}[x]\ \lambda v_1’. \text{let}\ w=g\ @\ v_1’\ \text{in}\ \kappa\ @\ w)\ (\lambda v_1. f\ @\ v_1) \newline
& \rightarrow (\text{let}\ w = g\ @\ x\ \text{in}\ \kappa \ @ \ w)\ (\lambda v_1. f\ @\ v_1) \newline
& \rightarrow \text{let}\ w = g\ @\ x\ \text{in}\ (\lambda v_1.f\ @\ v_1)\ @\ w \newline
& \rightarrow \text{let}\ w = g\ @\ x\ \text{in}\ f\ @\ w
\end{aligned}
$$
This result is much cleaner: the redundant $\text{let…in}$ bindings disappear because we are aware of the syntactic structure of the source term and can treat tail positions specially.
According to the paper, we can refine the view further by distinguishing static and dynamic constructs. Here, overlines mark static constructs, which the transformer introduces and reduces away at transformation time, while underlines mark dynamic constructs, which remain in the residual term.
$$
\begin{aligned}
& \mathcal{T}_t: e \rightarrow c \newline
\mathcal{T}_t[x] & \rightarrow \underline{x} \newline
\mathcal{T}_t[\lambda x. e] & \rightarrow \underline{\lambda} x. \mathcal{T}_t[e] \newline
\mathcal{T}_t[e_0 \ @ \ e_1] & \rightarrow \mathcal{T}[e_0] \ \overline{\lambda} v_0. \mathcal{T}[e_1] \ \overline{\lambda} v_1. v_0 \ \underline{@} \ v_1 \newline
\newline
& \mathcal{T}: e \rightarrow (v \rightarrow c) \rightarrow c \newline
\mathcal{T}[x] \ \kappa & \rightarrow \kappa \ \overline{@} \ x \newline
\mathcal{T}[\lambda x. e] \ \kappa & \rightarrow \kappa \ \overline{@} \ \underline{\lambda} x. \mathcal{T}_t[e] \newline
\mathcal{T}[e_0 \ @ \ e_1] \ \kappa & \rightarrow \mathcal{T}[e_0] \ \overline{\lambda} v_0. \mathcal{T}[e_1] \ \overline{\lambda} v_1. \underline{\text{let}}\ w = v_0 \ \underline{@} \ v_1 \ \underline{\text{in}}\ \kappa \ \overline{@} \ w
\end{aligned}
$$
So, $\underline{\lambda}$ marks a lambda that remains in the output term (here it comes from the source term), while $\overline{\lambda}$ marks a lambda introduced by the transformer and reduced away during transformation. The same reading applies to $\underline{@}$ and $\overline{@}$, where the static applications $\overline{@}$ come from the uses of the continuation $\kappa$.
Epilogue
So, this is it, a brief review of the CPS transformation. It still strikes me how well such a simple idea works. Anyway, see you. ᓚᘏᗢ


