*Be ahead of your time, but only a little.*

*– Mason Cooley*

Do you understand monads? I don’t, so I thought I’d explain them to you.

Then, once you’ve got it, I’ll re-explain Lucid. No Haskell, optional category theory, gluten free.

A few decades ago Ed Ashcroft and I invented this crazy-ass language called Lucid.

The idea was that a program was a set of equations describing a set of variables changing over time. The original idea was to make formal program verification easier by reducing programming to mathematics: programs are just sets of equations. However, the project expanded quickly until we realized that we’d invented a functional dataflow language.

Except this supposedly functional language had a lot of strange features. For a start, it was purely first order (though this has since been partially addressed). Weirder still, variables denoted infinite sequences of data objects, and data operations like addition operated pointwise.

Even weirder, we discovered that there were two separate notions of function. The most general, that we simply called “functions” were arbitrary (but computable) transformations of streams. The more specialized, that we called “mappings”, transformed streams point by point, using rules that varied according to the time index. In the same way, there were two notions of subcomputation, one general and one point by point. We called the constructs “produce” and “compute” respectively.

(In terms of lambda calculus notation this corresponds to two lambdas and two “let” constructs. In the general versions variables are bound to arbitrary streams, in the specialized ones to individual data objects.)

Wadler characterizes the difference between the two lambdas and the two lets as *call by value* versus *call by name.* I think this is accurate. We called the call by value functions (mappings) “synchronic” because the value of f(X) at time t depends only on t and the value of X at time t.

However at the time this distinction was considered “messy and weird” and we dropped the call by value *where* clause (the equivalent of let).

Little did we know that these strange phenomena were the result of using monads. How could we, since they hadn’t yet been introduced to computer science.

**My take on monads**

There are plenty of tutorials on monads, Phil Wadler’s *Essence of Functional Programming* being one of the best, but I don’t find any of them to be particularly accessible. One reason is they require familiarity with Haskell types and category theory, which I’d rather avoid. The other is that they assume the motivation is to bring back sequentiality and side effects to functional programming. No thanks. Ed and I invented Lucid to allow programming without sequentiality. If I want to do sequential programming, I’ll write in C.

So I worked out an approach to monads which avoids Haskell and (most) category theory. I also drop the assumption that we want to program imperative I/O with side effects.

My approach instead uses conventional mathematical concepts and notation, and hopefully is more accessible.

The basic idea of a monad is that for every data domain D we have a derived domain D* which includes D and in some sense generalizes the elements. One example to keep in mind is having D* be the collection of all *sets* of elements of D.

**The monad laws**

This brings us to the first component of a monad, namely an embedding (which I’ll also call *) from D to D*. This ensures that every individual data object in D can be ‘found’ in D*. For example, if d is 3 and we’re dealing with the set monad, d* is {3}. The existence of this map guarantees that in some sense, D is a subdomain of D*.

The second property is that * on domains is basically a closure operator: D** is the same as D*. Not literally: with the set monad, D** is the domain of sets of sets of elements of D, not the same as D*. What the second property guarantees is a collapse operator V that maps D** to D*. In the case of sets, it’s the union operator that turns a set of sets into the union of its elements. Thus {{2,3},{4,5,6},{2,6,7}}V is {2,3,4,5,6,7}.

Note that e.g. 3** ({{3}}) collapses to 3* ({3}). This must always be the case.
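The embedding and collapse for the set monad take only a few lines of Python. This is a minimal sketch; the names `embed` and `collapse`, and the use of frozensets (so that sets can contain sets), are my own encoding, nothing standard:

```python
def embed(d):
    # d* : wrap an element as the one-element set {d}
    return frozenset([d])

def collapse(ss):
    # V : flatten a set of sets into the union of its members
    return frozenset(x for s in ss for x in s)

# {{2,3},{4,5,6},{2,6,7}}V is {2,3,4,5,6,7}
assert collapse(frozenset([frozenset([2, 3]),
                           frozenset([4, 5, 6]),
                           frozenset([2, 6, 7])])) == frozenset([2, 3, 4, 5, 6, 7])

# 3** ({{3}}) collapses to 3* ({3})
assert collapse(embed(embed(3))) == embed(3)
```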

Next, it’s necessary that collapse be associative. D*** can be seen in two ways: as (D**)* and as (D*)**. The result is two ways to collapse it to D*: collapse the inner level first (mapping V over D***, taking it to D**), or collapse the outer level first (applying V to (D*)**, also taking it to D**), then collapse again to D*. These have to give the same result.

For example, with sets an element of D*** is a set of sets of sets. It collapses (both ways) to the set of all elements ‘appearing’ anywhere.

The third property is that for any domains D and E and any function f from D to E, there is a function f* from D* to E*. We can think of f*(s) as map(f,s). These functions must compose: if f:D->E and g:E->F, then (f*|g*) = (f|g)*, (“|” being reverse composition – unix piping).

For the set monad, f* applies f elementwise to the elements of its argument. If s is {2,3,5} and f is the square function, then f*(s) = {4, 9, 25}.

Furthermore * must respect the embeddings of D into D* and E into E*: f*(d*) = f(d)*.

Finally, * on functions has to play nice with collapse. If s’ is in D**, there is an alternate way to calculate f**(s’)V: collapse s’ first to s’V, then take f*(s’V). This must yield the same result. In symbols, f**(s’)V = f*(s’V).

In terms of sets, f** takes a set of sets and applies f elementwise to the ‘lower’ sets. Collapsing the result is the same as collapsing the set of sets and applying f elementwise. This rule (warning: category theory!) says that collapse is a *natural transformation* from the functor ** to the functor *.
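Continuing the frozenset sketch, here is * on functions, plus checks of the last two laws. Again the names (`star` for * on functions) are my own:

```python
def embed(d):
    # d* : the one-element set {d}
    return frozenset([d])

def collapse(ss):
    # V : union of the member sets
    return frozenset(x for s in ss for x in s)

def star(f):
    # f* : apply f elementwise
    return lambda s: frozenset(f(x) for x in s)

square = lambda x: x * x
assert star(square)(frozenset([2, 3, 5])) == frozenset([4, 9, 25])

# f* respects the embedding: f*(d*) = f(d)*
assert star(square)(embed(3)) == embed(square(3))

# f* plays nice with collapse: f**(s')V = f*(s'V)
s2 = frozenset([frozenset([2, 3]), frozenset([3, 5])])
assert collapse(star(star(square))(s2)) == star(square)(collapse(s2))
```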

**The Lucid monad**

This brings us to our favorite monad, the *stream* monad. A stream is an infinite sequence

*d_{0}, d_{1}, d_{2}, d_{3}, …*

of data objects. In the stream monad D* is the set of all streams over D. Given d in D, d* is the constant stream

*d, d, d, d, …*

Given f from D to E, f* from D* to E* works pointwise: if s in D* is

*d_{0}, d_{1}, d_{2}, d_{3}, …*

then f*(s) is

*f(d_{0}), f(d_{1}), f(d_{2}), f(d_{3}), …*

Finally the collapse operator from D** to D* takes the diagonal; if s’ in D** is

*<a_{0}, a_{1}, a_{2}, a_{3}, …>, <b_{0}, b_{1}, b_{2}, b_{3}, …>, <c_{0}, c_{1}, c_{2}, c_{3}, …>, …*

its collapse s is

*a_{0}, b_{1}, c_{2}, d_{3}, …*

The monad laws are easily checked; the stream monad is no more complex than the set monad. In particular, the two collapses of D*** to D* both give the diagonal of a three-dimensional cube.
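The stream monad can be sketched in Python too, modelling a stream as a function from a time index to a value. This encoding, and the names, are mine, and are not Lucid syntax:

```python
def embed(d):
    # d* : the constant stream d, d, d, d, ...
    return lambda t: d

def star(f):
    # f* : apply f to the stream pointwise
    return lambda s: (lambda t: f(s(t)))

def collapse(ss):
    # V : the diagonal of a stream of streams
    return lambda t: ss(t)(t)

assert [embed(7)(t) for t in range(3)] == [7, 7, 7]

nat = lambda t: t                       # the stream 0, 1, 2, 3, ...
squares = star(lambda x: x * x)(nat)
assert [squares(t) for t in range(4)] == [0, 1, 4, 9]

# row i of a stream of streams is i, i+1, i+2, ...; its diagonal is 0, 2, 4, ...
rows = lambda i: (lambda t: i + t)
assert [collapse(rows)(t) for t in range(4)] == [0, 2, 4, 6]
```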

The nonstandard interpretation of the λ calculus over the stream monad is exactly Lucid, complete with the controversial nested loop construct (which I’ll go into another time).

**Programming based on the set monad**

The set monad, like the stream monad, determines a nonstandard interpretation of the lambda calculus. In this interpretation the elements of D* are not streams but *nondeterministic* elements of D. Every element of D* is a set of possibilities. Thus if s in D* is {1,3,5}, it’s a value which is either 1, 3, or 5.

Given an element d (say, 3) of D, d* is {3}, meaning a value which is certainly 3. If s is as above and f is the square function, f*(s) is the square of a possible value of s – {1,9,25}. In other words, to get a value of f*(s), you get a value of s and apply f to it.

If s’ is in D**, it is a nondeterministic nondeterministic element of D. Collapsing to D* means taking one of these nondeterministic elements then taking one of its values. This means s’ collapses to the union of its members.

It’s not hard to check the monad laws and to verify, for example, that the sum of two nondeterministic D values takes on all possible pairs of values of the summands.
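The sum of nondeterministic values can be checked directly: lifting a two-argument operation over sets yields all pairwise results. A small sketch, with `lift2` being my own name for the lifting:

```python
def lift2(op):
    # lift a binary operation to nondeterministic (set-valued) arguments:
    # the result takes on op(a, b) for every pair of possibilities
    return lambda s, t: frozenset(op(a, b) for a in s for b in t)

add = lift2(lambda a, b: a + b)

# a value that is 1 or 3, plus a value that is 10 or 20,
# is a value that is 11, 13, 21, or 23
assert add(frozenset([1, 3]), frozenset([10, 20])) == frozenset([11, 13, 21, 23])
```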

So what is the analog of synchronic functions that work pointwise? These are functions that examine only one possibility at a time of their arguments. Thus a function that takes a value of its argument and returns one of its square roots qualifies. On the other hand a function that takes two values and returns one in between does not.

To make the language interesting we need additional primitives, like fby, that are not of the form f*. For sets we can define fby(s,t) as the union of s and t. This is McCarthy’s amb function.

**Programming based on the weighted set monad**

One variation on the set monad is to have every element of the set tagged with a number between 0 and 1. Then D* is the collection of all sets of tagged elements whose tags add up to 1. We can think of the elements of D* as random variables over D (it simplifies things if we take only finite sets).

Given d, d* is the set whose only element is d, tagged with 1. Collapsing is union except the inner and outer tags are multiplied. The function f*(s) takes an element d of s and returns f(d) with the same probability. The sum of two operands is the set of pairwise sums, with each sum tagged with the product of the tags of the summands. Also, we may have to consolidate: if different sums produce the same result, we add the separate tags.

The analogy to synchronic is similar to that of the set monad, with a one-sample rule. An example is the square root function, with 50/50 probability for each root.

For an analogy of fby we can define fby(s,t) as 50% s and 50% t.

**The sequence monad**

Another possible monad has D* be (finite) sequences of elements of D. We can think of these as iterators, which produce the sequence values in order upon repeated demands.

Elements of D** are sequences of sequences and are collapsed by concatenating the sequences in order. We can think of them as iterators that produce iterators. Collapsing means generating the iterators in turn then running them individually. The sum of two iterators produces all pairwise sums in lexicographic order.

The iterator d* produces d, just once.

f*(s) is the result of applying f to every element of sequence s. Operationally, it means running s and applying f to everything produced.

Synchronic means that the nth output of g(s) depends only on n and the nth output of s. A counter is synchronic.

We can define fby(s,t) to be the first element of s consed to t. And as an iterator it runs s once, emits the value, but thereafter becomes t.
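With Python lists standing in for finite sequences, the whole sequence monad, fby included, fits in a few lines (a sketch; the names are my own):

```python
def embed(d):
    # d* : the iterator that produces d, just once
    return [d]

def collapse(ss):
    # V : concatenate the inner sequences in order
    return [x for s in ss for x in s]

def star(f):
    # f* : apply f to everything the sequence produces
    return lambda s: [f(x) for x in s]

def fby(s, t):
    # the first element of s consed onto t
    return s[:1] + t

assert collapse([[1, 2], [3], [4, 5]]) == [1, 2, 3, 4, 5]
assert star(lambda x: x * x)([2, 3, 5]) == [4, 9, 25]
assert fby([9, 8, 7], [1, 2, 3]) == [9, 1, 2, 3]
```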

**The time tagged monad**

In this monad an element of D* is an element of D tagged with a natural number, which we can think of as an arrival time (clock tick).

The value d* is d tagged with 0. Collapsing D** means turning double tagged elements of D into single tagged elements of D. We take the maximum of the two tags.

The value of f*(s) is the result of applying f to the data part of s and leaving the tag unchanged. We wait for s to arrive, then apply f without any further delay.

The sum of two tagged values is the sum of their values tagged with the maximum of the two tags. We wait for the operands to arrive, and as soon as the last one shows up, add them without further delay.

A call by value function operates on the data value of its argument and possibly increases the delay.

A non f* primitive is race(x,y), which returns the argument with the smallest tag – the first to arrive.
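The time-tagged monad can be sketched with (value, tick) pairs; collapse takes the max of the tags, lifted operations wait for the later operand, and race picks the earlier one. The encoding and names are mine:

```python
def embed(d):
    # d* : d arriving at tick 0
    return (d, 0)

def collapse(dd):
    # ((d, inner), outer) -> (d, max of the two tags)
    (d, inner), outer = dd
    return (d, max(inner, outer))

def star(f):
    # f* : apply f to the data part without further delay
    return lambda s: (f(s[0]), s[1])

def lift2(op):
    # wait for both operands: the result arrives when the later one does
    return lambda s, t: (op(s[0], t[0]), max(s[1], t[1]))

def race(x, y):
    # a non-f* primitive: the first argument to arrive wins
    return x if x[1] <= y[1] else y

add = lift2(lambda a, b: a + b)
assert add((3, 2), (4, 5)) == (7, 5)       # sum arrives at tick 5
assert race((3, 2), (4, 5)) == (3, 2)      # the tick-2 value gets there first
assert collapse(((9, 4), 1)) == (9, 4)
```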

**Maybe**

This is a well known monad and features in most tutorials. D* is a copy of D with a single error object added. For example, N* is a possibly erroneous natural number. Maybe… a natural number.

Given any d in D, d* is the image of d in D*. If f maps D to E, f* maps the image of each d in D* to the image of f(d) in E*, and maps *error* to *error.* The collapse on D** maps the image of D* onto itself, and both errors onto the single error in D*. The sum of two elements of D* is their normal sum if both are in the image of D, otherwise *error.*

A call by value function must map *error* to *error.*

An example non f* function returns *true* if its argument is error, *false* otherwise.
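A minimal Python sketch of Maybe, using a sentinel object for the added error element (the encoding, and names like `lift2`, are my own):

```python
ERROR = object()   # the single error object added to D

def star(f):
    # f* : apply f to ordinary values, pass the error through
    return lambda s: ERROR if s is ERROR else f(s)

def lift2(op):
    # a call by value operation: ERROR if either operand is erroneous
    return lambda s, t: ERROR if (s is ERROR or t is ERROR) else op(s, t)

def is_error(s):
    # a non-f* primitive: inspects the tag, not the data
    return s is ERROR

add = lift2(lambda a, b: a + b)
assert star(lambda x: x * x)(5) == 25
assert star(lambda x: x * x)(ERROR) is ERROR
assert add(2, 3) == 5
assert add(2, ERROR) is ERROR
assert is_error(ERROR) and not is_error(7)
```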

**The IO monad**

So what is the IO monad, the most famous of them all? For many people, the whole point of monads?

This is far from clear, and I’ve been trying for some time to pin it down. Wadler in *Essence* defines an output monad in which an element of D* is an element of D together with a string. Apparently the idea is that the string is the output that was generated while producing the element of D.

The element d* is d together with the empty string. The function f* operates on the data component and carries over the string. The collapse from D** to D* concatenates the two strings associated with a value in D**. The sum of two elements of D* is the sum of their values together with the concatenation of the two output strings. A call by value function transforms the data and adds some output.
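This output monad also sketches easily with (value, string) pairs. One caveat: in the collapse I put the outer string first, on the assumption that the outer computation’s output came first; the names and that ordering choice are mine:

```python
def embed(d):
    # d* : d with no output so far
    return (d, "")

def star(f):
    # f* : transform the data, carry the output along unchanged
    return lambda s: (f(s[0]), s[1])

def collapse(dd):
    # ((d, inner), outer) -> (d, outer + inner): concatenate the outputs,
    # assuming the outer output was generated first
    (d, inner), outer = dd
    return (d, outer + inner)

def lift2(op):
    # combine the data, concatenate the outputs
    return lambda s, t: (op(s[0], t[0]), s[1] + t[1])

add = lift2(lambda a, b: a + b)
assert add((2, "a"), (3, "b")) == (5, "ab")
assert collapse(((1, "inner"), "outer")) == (1, "outerinner")
assert star(lambda x: x * x)((4, "log")) == (16, "log")
```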

This is my best shot. What I don’t understand is where the side effect comes in. The other monads I’ve discussed don’t produce side effects. I’ve asked people who ought to know and got different answers. The most common explanation is that the Haskell people have booby-trapped part of the implementation and connected it directly to the output. Who knows?

I guess after all I’m just not comprehending monads …

In a future post I’ll expand on the Lucid approach to I/O, which uses neither side effects nor the IO monad.
