I woke up in my alien-bin after a good night’s hypno. On my way down to breakfast I noticed the sign in the elevator: maximum capacity six atoms. Greek elevators are really small.

I loaded my plaque with bacon and eggs. I think having a good breakfast is very semantic.

As an episkeptic [visitor] from Canada I appreciated the mild weather. Yesterday was a bit cryo, but today the weather was zesty – no need for a coat. They say tomorrow will be really thermos.

My collaborator Good-Looking [Aristophanes] picked me up in his autokinetic at nine the hour. We dropped some clothes at the clean-ateria and went by the ethnic bench [national bank] to get some cash. Then we went straight to the all-knowledge [university] where he slaves. He had to didact a course so I worked on my article. I don’t slave as hard as he does and I even warned him he writes so many articles he’ll get arthritis [arthros=article]. Ha ha.

Ari/nes said he’d be a few leptos late because he had to pass by the workateria [lab]. Then we threw an eye at some ideas for liturgical systems and programming glosses [languages]. Time for lunch? Let’s fuge [leave]. Just a couple of deuteroleptos I said, finishing my article.

What would you like, asked the servitor. I’d had bacon for breakfast so I didn’t fancy more creatures [kreata=meat]. A nice country salad would be fine. For dessert we split a bougatsa as big as my pode. It was good but poly glucose.

After we phaged [ate] my friend had to didact again – C++. I’m glad I don’t have to math [learn] that gloss. Meanwhile I went to freshen up in the anthropological washroom. I noticed one of the stalls in the gynaecological washroom was out of liturgy – glad I’m an andron. Someone put up a sign saying the aitia [cause] was apolitical [uncivilized] people throwing paper in the toilet. The sign said they should gyre to the spielia [caves].

My friend is a family man and had to head off – he has a nymph [wife], two paedia [kids] and a moron [baby] at home. He dropped me off at the alien-bin and I decided to see a movie.

What kind of movie? Astro Polemics: the Dynamo Ex-hypnos [Star Wars: the Force Awakens] was playing but as I phobed it was sold out. Anathema! [Darn!] Luckily there were other epilogs [choices]. Normally I like epistemic fantasy but espionage movies are fun too. Will the catascope [spy] be caught? Or police thrillers. See the epitheorist [inspector] chase the criminal. Not to mention war movies. Follow the career of the young axiomatic [officer] who becomes a strategist on a white hippo. Or biography – the would-be pop star with the powerful pneumons who finally has a hit tragedy [song] and nike’s a grammy.

In the end I decided I’d rather see a Gouestern, so I agora’d a ticket to the Hateful Okto. A lot of haema spilled! And necro’s everywhere! Freaky!

Later my friend and his family joined me for dinner. It was good and the logarithm was reasonable – we didn’t spend a lot of cabbage.

As we pulled out of the restaurant’s idiotic [private] parking lot I realized that soon I would have to take the ironroad to the aerodrome. And hear the traditional Greek farewell greeting – “call a taxi” [kalo taxidi].

Whenever you want to do something, there’s always someone who says there’s something else you have to do first.

Call the thing you want to do A, the thing you’ve been told you have to do first B. They’re telling you you have to do B before A.

I remember when I was a precocious kid (grade 8 I think) I wanted to find out what relativity was about. I got a book from the Penticton public library but no luck: it started with frames of reference with a lot of algebra. (In math class we were doing percentage). I taught myself algebra but it still didn’t make much sense. Worse, for general relativity (A) you needed tensors (B) for which you needed calculus (B’) for which you need algebra (B”). It seemed it would take years before I was ‘ready’ to understand relativity and so I gave up.

(In fact it is possible to explain relativity without mathematical machinery but I’ll leave this till another time.)

That was not an accident; our whole educational system is based on B before A. Here’s a typical scenario: start of September, a new PhD student comes to see me, eager to start research. But … there’s these grad courses she has to take, and has to do well in. Best not to start any projects. So I end up saying the equivalent of “welcome to the program, see you in June”. Depressing.

Suppose she’s really keen on data mining. There’s even a course on data mining. But she can’t take it – prerequisites. She has to take, say, the advanced database course first.

And this is someone who already has an undergraduate degree. For that matter the whole undergraduate program is one big B. So is school. For that matter the whole educational system is a giant B that you have to do before A – before becoming an active and productive member of society.

My dad had a B-before-A experience that became a standing family joke. This was a long time ago but we still laugh about it. Pierre Berton, a very popular Canadian author at the time, had just written a book – I believe it was *The Comfortable Pew*, a mild critique of the establishment churches. My dad wanted to take a look at it so he dropped by the (same) Penticton public library and asked for it.

The librarian was indignant. “You don’t just walk in off the street” she said, “and ask for a book like that”.

Hers was the voice of B before A. You don’t just pick up a book on relativity and understand it. You don’t just step into my office and start doing research. You don’t just walk in off the corridor and take data mining. You don’t just turn 13 and expect to contribute to society.

My daughter had a B-before-A experience that is still painful to recall.

When she was still little (maybe 8) she decided she wanted to play the drums, and we agreed we should encourage her. We asked around and some people said, hold on, first she should learn music theory and maybe a bit of piano. Hmm. You don’t just pick up some drumsticks and start banging away …

Luckily, I asked a colleague who is also a professional drummer and he said, let her start with drums, learning theory as needed. Sounded good.

In fact my colleague’s advice was a special case of an alternative to the B-before-A anti-pattern. That is, first you do a bit of A until you need some B. You do just enough B, then more A, till you need more B, and so on. Call it the a-b-a-b pattern.

So I took my daughter down to the music store and luckily for us there was an actual child size drum kit on display. Furthermore you could lease it (after all who knows how long the passion for drumming would last). I was ready to sign up when I remembered we were about to take a trip to California. “Sweetheart” I said, “when we get back from our trip (B) we’ll get the drums (A)”. Big frown.

You know what happened. When we got back, the kit was gone. As we were walking back to the car, my little girl burst into tears.

Father of the year!


I seem to be always discovering fundamental Laws of the Universe, especially about teaching. I’d like to share some of them with you. They are each called “Wadge’s Law” … by me. Maybe the name will catch on. Here they are.

**Wadge’s Law (of traffic)**

No matter how late you go through an orange light, the guy/gal behind you follows you through.

As far as I know this is not just a heuristic, it’s 100% true all the time. I’ve never witnessed an exception, and don’t know anyone who has.

**Wadge’s Law (of Vampire Courses)**

Whenever you’re giving a course, there’s always another course with a much heavier workload that is draining the life out of your students.

The students think that the Vampire Course, because it involves so much work, is more important, and its A’s and B’s are worth more. When in the clutches of the VC, they skip class or show up late, miss assignments, arrive pale and exhausted, fall asleep, drool, etc

**Wadge’s Law (of Meetings).**

*Before every formal meeting there’s a smaller, more exclusive, less formal meeting where all the important decisions are made.*

This is based on decades of experience in academia and friends’ experience in industry and government. Sometimes there’s an even smaller, more exclusive, less formal pre pre meeting where all the decisions of the pre meeting are made. Maybe even a pre pre pre meeting … until you reach some guy deciding everything in the shower.

**Wadge’s Law (of teaching)**

Just when you think you’re captivating the students, entrancing them, enthralling them, you turn around and see at least one of them half unconscious from boredom.

Head askew, eyes glazed and half shut, body starting to slump. And if you thought you were really soaring, there will be drooling.

**Wadge’s Law (of citation)**

No matter how unusual your last name, no matter how impressive your publication list, when you look up your citations there’s someone with the same name who totally outperforms you.

For me it’s a geologist named Geoff Wadge (absolutely true). He writes about volcanoes. A lot. More than I do about anything.

I have a friend with an unusual name, let’s call him Sam Wassereimer. I told Sam about this law and he says “I’m a counterexample”. Sam has published a few books and on Amazon he has no equal among the Wassereimers. I got suspicious, went to Google Scholar, and sure enough there’s a WD Wassereimer who totally outperforms Sam (a lot of journal publications). Says Sam, “but WD is dead!” Says I, “even dead he outperforms you!”

A break from Laws about academia.

**Wadge’s Law (of departure)**

From the time you get up and start leaving the house, till the time you actually drive away, an inexplicable five minutes has passed.

More generally, a five minute surcharge applies in other scenarios. Suppose you live in an apartment and it takes seven minutes to take the elevator down to the garage and exit it. If you leave at 1:30, you will drive away at 1:42.

**Wadge’s Law (of invigilation)**

No matter how easy the exam, and how much time the students have, there’s always at least one who stays till the bitter end.

Again, no known exceptions. I’ve seen the bitter enders play with their pencils to kill time.

My daughter, who also teaches, pointed out a companion Law. Since she uses Wadge as her professional name, this is

**Wadge’s Second Law of Invigilation**

No matter how hard the exam, and how little time the students have, there’s always at least one who leaves ridiculously early.

The next one derives from a long career teaching Computer Science:

**Wadge’s Law (of computer courses)**

Every Computer Science Course has a tendency to degenerate into a programming course.

For example, in an AI course the students write a chess player, in an embedded systems course they write realtime software, in a hardware course they write simulators … and so on.

This rule is not absolute; the instructor can resist the tendency. But it takes effort.

This rule is so profound I’ll devote a future posting to it. For example, what does a Physics course degenerate into? A psychology course? For that matter, a programming course? Think about it.

Finally, the most profoundest Law I’ve discovered so far.

**Wadge’s Law (of B before A)**

Whenever you want to do something, there’s always someone who says there’s something else you have to do first.

You’ll have to wait for yet another post to learn the many instances and examples of this Law. (Actually, I just gave you one).

Well I personally think this post has been excellent and can’t wait for feedback.

Oops!

A formal power series is a (usually) infinite polynomial in x. For example

*1 + x + x^2 + x^3 + x^4 + …*

This is an expression, not a number. If we give a value to x, we may get a number, or the evaluation may run away (diverge) on us (if |x|≥1).

They’re called *formal* power series because we don’t normally try to evaluate them, we just manipulate them formally and symbolically.

For example, if we multiply the above series by *1-x*, we get … *1*. So *1-x* is the multiplicative inverse of that series, even though the series diverges for most values of *x*. These are purely formal manipulations.

So where’s the fun? Well recently my colleagues and I produced a Lucid interpreter. It’s written in Python and handles the full language described in the Lucid book, plus it supports an extra space dimension. I’ll tell you about it in another post and make it available through github.

Anyway pyLucid plus formal power series spells fun. Clearly a formal power series is determined by the infinite sequence of its coefficients, and such a sequence can be represented in a straightforward way as a space vector. Thus the power series 1 is *1 sby 0*, the series *x* is *0 sby 1 sby 0*, the series *x^2* is *0 sby 0 sby 1 sby 0*, and so on.

More precisely 1 sby 0 is the vector

1, 0, 0, 0, …

which represents

1 + 0x + 0x^2+ 0x^3 + …

0 sby 1 sby 0 is the vector

0, 1, 0, 0, 0, …

which represents the series

0 + 1x + 0x^2+ 0x^3 + …

0 sby 0 sby 1 sby 0 represents

0 + 0x + 1x^2+ 0x^3 + …

and 1 is the vector

1, 1, 1, 1, …

which represents the original series above.

What about operations on power series? Addition is just that: if pyLucid vectors P and Q represent two power series, P+Q represents their sum, since addition is coefficientwise. Multiplication is more complex.

(I’m going to use ASCII notation instead of proper sub and super scripting. I tried but the cockamamy new wordpress editor ate my html).

Anyway we can break down P to p0+xP’ where P’ is p1+p2x+p3x^2+… . Similarly Q is q0+xQ’ and multiplying them we get

*p0q0 + (p0Q’ + q0P’)x + P’Q’x^2*

This defines multiplication of P and Q in terms of multiplication of P’ and Q’. This is recursion but P’ and Q’ are in no sense simpler than P and Q. However we can use it to write pyLucid code because it produces two coefficients before it recurses. It doesn’t deadlock.

The pyLucid definition of product of two series represented as space vectors is

    pprod(p,q) = r
      where
        p0 = init p; q0 = init q;
        pp = succ p; qp = succ q;
        r0 = p0*q0;
        r = (r0 sby (p0*qp + q0*pp)) + (0 sby 0 sby pprod(pp,qp));
      end;


We can test *pprod* by multiplying 1 (our original power series) by itself, and we get

*1 + 2x + 3x^2 + 4x^3 + …*

which is correct. In other words, *pprod(1,1)* is

*1, 2, 3, 4, …*
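For readers who want to check this outside pyLucid, here is a hypothetical Python analogue (not part of pyLucid) that computes product coefficients with the direct convolution formula rather than the recursive decomposition:

```python
def pprod(p, q, n):
    # First n coefficients of the product of two formal power series.
    # p, q: functions from index k to the k-th coefficient.
    # The k-th product coefficient is the convolution sum p_i * q_{k-i}.
    return [sum(p(i) * q(k - i) for i in range(k + 1)) for k in range(n)]

ones = lambda k: 1  # the series 1 + x + x^2 + ...
print(pprod(ones, ones, 6))  # [1, 2, 3, 4, 5, 6]
```

The convolution view and the recursive decomposition compute the same numbers; the Lucid version is recursive only because streams are demanded one coefficient at a time.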

Division is even trickier; I’ll spare you the explanation. The code is

    pdiv(q,w) = t
      where
        q0 = init q; w0 = init w;
        r = q0/w0;
        v = succ(q - r*w);
        t = r sby pdiv(v,w);
      end;

We can test it by setting *one = 1 sby 0* and calculating *pdiv(one,1)* which gives coefficients

1, -1, 0, 0, 0, …

which is correct since the result is *1-x.*
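The same long-division recurrence can be sketched in Python (a hypothetical helper, mirroring the pyLucid definition: take r = q0/w0, subtract r*w, shift left, repeat):

```python
def pdiv(q, w, n):
    # First n coefficients of q/w; w must have a nonzero constant term.
    # q, w: coefficient functions; truncate to enough terms for n steps.
    qc = [q(k) for k in range(2 * n)]
    wc = [w(k) for k in range(2 * n)]
    out = []
    for _ in range(n):
        r = qc[0] / wc[0]
        out.append(r)
        # v = succ(q - r*w): subtract r*w, then drop the (now zero) head
        qc = [a - r * b for a, b in zip(qc[1:], wc[1:])]
    return out

one = lambda k: 1 if k == 0 else 0   # the series 1
ones = lambda k: 1                   # the series 1 + x + x^2 + ...
print(pdiv(one, ones, 5))  # [1.0, -1.0, 0.0, 0.0, 0.0]
```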

Formal power series can be integrated and differentiated (formally) and that’s where things get interesting. The derivative of

*p0 + p1x + p2x^2 + p3x^3 + …*

is

*p1 + 2p2x + 3p3x^2 + …*

and this is easy to code:

*pderiv(g) = h where gp = succ g; k = 1 sby k+1; h = gp*k; end;*

Integration is even simpler, the integral of

*p0 + p1x + p2x^2 + p3x^3 + …*

is

*c + p0x + p1x^2/2 + p2x^3/3 + …*

(c is the constant of integration.)

The code is

*pinteg(c,s) = d where i = 1 sby i+1; d = c sby s/i; end;*
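In Python terms (again a hypothetical sketch, not pyLucid itself), *pinteg* just prepends the constant of integration and divides the k-th input coefficient by k+1:

```python
def pinteg(c, s, n):
    # d = c sby s/i where i = 1 sby i+1:
    # output coefficient 0 is c; output coefficient k+1 is s[k]/(k+1).
    return [c] + [s[k] / (k + 1) for k in range(n - 1)]

# integral of 1 + 2x + 3x^2 + 4x^3 is x + x^2 + x^3 + x^4
print(pinteg(0, [1, 2, 3, 4], 5))  # [0, 1.0, 1.0, 1.0, 1.0]
```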

Simple enough … but now let’s think about the power series corresponding to the exponential function e^x. This function is its own derivative and therefore also its own integral. More precisely, e^x is one plus the integral of e^x. In code, we get

*ex = pinteg(1,ex)*

and this works! Even though it’s recursive, we get all the coefficients:

1.00000 1.00000 0.50000 0.16667 0.04167 0.00833 0.00139 0.00020 …

Let’s compute e! By evaluating ex at x=1. Evaluating doesn’t always work but sometimes we can get approximations. The code

*peval(p,a) = v where v = init p +(0 sby a * peval(succ p, a)); end;*

gives us the sequence of partial sums, which sometimes converges. Sure enough, *peval(ex,1)* yields

1.00000 2.00000 2.50000 2.66667 2.70833 2.71667 2.71806 2.71825 2.71828 …
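You can reproduce this outside pyLucid with a short Python sketch (hypothetical names; the loop computes the same sequence of partial sums that the recursive pyLucid *peval* produces one per timestep):

```python
import math

def peval(coeffs, a):
    # Sequence of partial sums of sum_k coeffs[k] * a^k.
    total, sums = 0.0, []
    for k, c in enumerate(coeffs):
        total += c * a ** k
        sums.append(total)
    return sums

ex = [1 / math.factorial(k) for k in range(12)]  # coefficients of e^x
print(peval(ex, 1.0)[-1])  # 2.71828...
```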

In the same way, sin(x) and cos(x) are each other’s derivatives/integrals (up to sign) and the pyLucid code

*sinx = pinteg(0,cosx);*

*cosx = pinteg(1,-sinx);*

works. For example, the coefficients of *sinx* are

0.00000 1.00000 0.00000 -0.16667 0.00000 0.00833 0.00000 -0.00020 …

Now for the fireworks! Arctan(x) is really interesting and its derivative is 1/(1+x^2). We can integrate this in code:

*x2 = 0 sby 0 sby 1 sby 0; one = 1 sby 0; atanx = pinteg(0,pdiv(one,one+x2));*

This does the job, giving the coefficients of arctan(x) as

0.00000 1.00000 0.00000 -0.33333 0.00000 0.20000 0.00000 -0.14286 …

Now it so happens that the arctan of 1/2 plus the arctan of 1/3 is pi/4. So the expression

*4*(peval(atanx,1/2)+peval(atanx,1/3))*

is what we want, and sure enough we get

0.00000 3.33333 3.33333 3.11728 3.11728 3.14558 3.14558 3.14085 3.14085 3.14174 3.14174 3.14156 3.14156 3.14160 3.14160 3.14159 …

The amazing thing about all this is that pi emerges from a relatively simple program (given in full below) in which only small integers appear.
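Here is a hedged Python reconstruction of the whole pi computation (hypothetical helper names; the arctan coefficients are written out directly instead of being produced by *pinteg* and *pdiv*, but the evaluation is the same):

```python
import math

def atan_coeffs(n):
    # arctan(x) = x - x^3/3 + x^5/5 - x^7/7 + ...
    return [0.0 if k % 2 == 0 else (-1) ** (k // 2) / k for k in range(n)]

def peval(coeffs, a):
    # Value of the truncated series at x = a.
    return sum(c * a ** k for k, c in enumerate(coeffs))

c = atan_coeffs(40)
# arctan(1/2) + arctan(1/3) = pi/4
approx_pi = 4 * (peval(c, 1 / 2) + peval(c, 1 / 3))
print(approx_pi)  # 3.14159...
```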

Enough fun for now. I’ll leave you with the complete program and a promise to tell you more about the interpreter and to make it publicly available.

    4*(peval(atanx,1/2)+peval(atanx,1/3))
    where
      x2 = 0 sby 0 sby 1 sby 0;
      atanx = pinteg(0,pdiv(one,one+x2));
      one = 1 sby 0;
      peval(p,a) = v where v = init p + (0 sby a * peval(succ p, a)); end;
      pdiv(q,w) = t where q0 = init q; w0 = init w; r = q0/w0; v = succ(q - r*w); t = r sby pdiv(v,w); end;
      pinteg(c,s) = d where i = 1 sby i+1; d = c sby s/i; end;
      columns = 16; rows = 1; numformat = '%7.5f';
    end


In this model there is a first or initial time point, and every time point has a unique successor. Imperative iterations normally terminate, so we should have only finitely many time points. Lucid avoids the complications of finite time domains by making everything at least notionally infinite, so that the domain of time points is the natural numbers with the usual order.

In temporal logic logicians have studied a huge variety of time domains. What do they mean in terms of iteration and how do we write iterative programs over nonstandard time domains?

One simple generalization is to drop the requirement that there be an initial time point and specify instead that every time point also has a unique predecessor. This gives us the integers as the time domain. We won’t get into philosophical problems about infinite negative time and iterations that have formally been going on forever.

We can adapt Lucid to this notion of time by making our streams have the integers as their domain. For primitives we can retain the usual **first**, **next**, and **fby**. It’s obvious how the first two work, and that **next** has a dual **prev**. But **fby** needs some thought. We soon see that *X* **fby** *Y* must be the standard part of *Y* shifted right, preceded by the 0 point of *X*, preceded by the nonstandard part of *X*. In other words

*… x_{-2}, x_{-1}, x_{0}, y_{0}, y_{1}, y_{2}, …*

with *x_{0}* at the zero point.

After playing around a bit we discover we need another operator that works like **fby** except that it puts *y_{0}* at the zero point. Since it’s sort of a backwards **fby**, we call it **ybf**. For example, suppose we want *Fib* to be the Fibonacci stream

*1, 1, 2, 3, 5, 8, 13, …*

Then

*Fib = 1* **ybf** *(0* **ybf prev prev** *Fib +* **prev** *Fib)*

This definition goes two steps into the past to define the current value of *Fib* in terms of the two previous values. We can also go only one step by writing

*Fib = 0* **ybf** *(1* **fby prev** *Fib + Fib)*

and here we’re defining the next value of *Fib* in terms of the current and previous values.
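A tiny Python model (an illustrative sketch, not pyLucid) makes the second definition concrete: zeros at all negative times, 1 at time 0, and each later value the sum of the two before it (one of which may lie in the past):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(t):
    # Models Fib = 0 ybf (1 fby (prev Fib + Fib)) over integer time.
    if t < 0:
        return 0     # the 0 ybf ... part: zeros stretching into the past
    if t == 0:
        return 1     # the 1 fby ... part: 1 at the zero point
    return fib(t - 2) + fib(t - 1)   # prev Fib + Fib, advanced one step

print([fib(t) for t in range(8)])  # [1, 1, 2, 3, 5, 8, 13, 21]
```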

Either of these is preferable to the confusing

*Fib = 1* **fby** *1* **fby** *Fib +* **next** *Fib*

which is correct but hard to grasp because it defines **next next** *Fib* in terms of **next** *Fib* and the current *Fib.*

What would this mean for imperative programming? What would a **while** loop look like that has pre-initialization? Let alone one that has already been going on, forever? No idea.

What if we want to define a stream by two recurrence relations, one forward and one reverse? The simplest example is a counter, given in standard Lucid as

*I = 0* **fby*** I+1*

This clearly gives the wrong answer in the new interpretation; I is

*… 0, 0, 0, 0, 1, 2, 3, 4, …*

So let’s define the left hand part by

*J = J-1* **ybf*** 0*

and this defines *J* as

*… -3, -2, -1, 0, 0, 0, 0, …*

then we can put them together as *J* **fby*** I* giving

*… -3, -2, -1, 0, 1, 2, 3, …*

Interestingly, *J* **ybf*** I* gives the same result.

But this is all pretty clumsy. Can we do better? Yes, it turns out

*K = K-1* **ybf*** (0* **fby*** K+1)*

does the job. What if we parenthesize it the other way? What if

*K = (K-1* **ybf*** 0)* **fby*** K+1*

It turns out we get the same result. This is not a coincidence. There is a general rule that

*A* **ybf** *(M* **fby** *B) = (A* **ybf** *M)* **fby** *B*

for any *A, B,* and *M*. Both expressions denote the generalized stream

*… a_{-3}, a_{-2}, a_{-1}, a_{0}, m_{0}, b_{0}, b_{1}, b_{2}, b_{3}, …*

This is a very pleasing identity (trivial to verify) and justifies us omitting parentheses in definitions such as

*K = K-1* **ybf*** 0* **fby*** K+1*

which combines two recurrence relations, one backwards, one forward.
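The identity really is trivial to verify mechanically. Here is a minimal Python model of bi-infinite streams as functions from integers, with **fby** and **ybf** defined exactly as described above (this is an illustrative sketch, not the pyLucid implementation):

```python
def fby(x, y):
    # (x fby y): x's values up to and including time 0,
    # then y shifted right, so y(0) lands at time 1.
    return lambda t: x(t) if t <= 0 else y(t - 1)

def ybf(x, y):
    # (x ybf y): like fby except y(0) lands at time 0;
    # x supplies only the strictly negative times.
    return lambda t: y(t) if t >= 0 else x(t + 1)

a = lambda t: ('a', t)
m = lambda t: ('m', t)
b = lambda t: ('b', t)

lhs = ybf(a, fby(m, b))
rhs = fby(ybf(a, m), b)
print(all(lhs(t) == rhs(t) for t in range(-10, 11)))  # True
```

Both sides put m(0) at time 0, a-values before it, and b-values after it, as in the displayed stream.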

Now let’s do an example involving both space and time. Obviously we’ll allow negative space coordinates. Instead of thinking up new names for the operators we’ll simply add .s to the time operators, e.g. **prev.s**.

The example is a simple-minded numerical analysis treatment of heat flow. We have an infinite (in both directions) iron bar at temperature 1 in the middle tapering off linearly to 0. First we define a distribution that goes negative, then take the **max** with 0 to get the desired initial heat distribution Q:

*P = P-0.01* **ybf.s** *1* **fby.s** *P-0.01*

*Q =* **max***(P,0)*

Now we define an iteration in which (I know this is simple minded) H starts with Q and at each step each value of H is replaced by the average of the neighboring values.

*H = Q* **fby** *(***prev.s** *H +* **next.s** *H)/2*

Imagine the headache it would be to do this with only nonnegative indices. We’d have to shift it all over, calculate how much shift, etc.
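For the curious, the same iteration can be sketched in Python with a dict keyed by (possibly negative) space coordinates — a finite-window approximation of my own devising, not pyLucid; values outside the window are essentially 0 anyway, so treating them as 0 is harmless for a few steps:

```python
def initial(s):
    # tent distribution: 1 at the centre, tapering linearly to 0 at |s| = 100
    return max(1 - 0.01 * abs(s), 0.0)

# H starts as Q; each step replaces every value with the
# average of its two spatial neighbours.
h = {s: initial(s) for s in range(-150, 151)}
for _ in range(10):
    h = {s: (h.get(s - 1, 0.0) + h.get(s + 1, 0.0)) / 2 for s in h}

print(h[0] < 1.0, h[-5] == h[5])  # True True
```

The centre cools while the distribution stays symmetric — and negative indices cost nothing.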

What about conventional imperative languages – do they have arrays with negative indices? A few older ones do – Fortran, Pascal, and Ada let you declare an arbitrary lower bound – but modern mainstream languages generally don’t. Of course in Python (for example) you can write *A*[-3] but this is just for counting from the end of the array. Odd that.

Next post: branching time.


*2, 3, 5, 7, 9, eod, eod, eod, …*

The input and output conventions are adjusted to interpret *eod* as termination. If the above stream is the output, the implementation will ‘print’ the first five values and terminate normally. If a user inputs the first five values, then terminates the input stream, this is not treated as an error. Instead, the ‘missing’ values are evaluated to *eod* if requested.

What makes it interesting is that when a (strict) data operation is evaluated, if any (or all) of the operands are *eod*, the result is *eod.* (Non strict operations like *if-then-else-fi* need special rules). Thus termination propagates through expressions, which is almost always what you want. A continuously running filter which computes, say, a running average of its input will terminate normally if its input is terminated. There is no need to repeatedly test for end of input.

Furthermore, *eod* allows us to write expressions and filters for problems that require constructs like *while* or *for*.

A simple example is the *last* filter. Suppose we define S

*S = 0 fby S+I*

to be a running sum of the stream I. Let’s say I is finite and we want the sum of its elements. Obviously, this is the last value of S; so we write

*Sum = last S*

And how is last defined? Easy

*last X = X asa iseod next X*

Here *iseod* is a special operator that can examine *eod* without being turned to *eod*. It returns true if its argument is *eod*, false otherwise.
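A quick Python model (an illustrative sketch with *eod* as a string sentinel, not the actual implementation) shows why *iseod* is all you need to define *last*:

```python
EOD = "eod"  # end-of-data, modelled here as a unique sentinel value

def iseod(x):
    # iseod may examine eod without itself being turned to eod
    return x == EOD

def last(xs):
    # last X = X asa iseod next X: the value whose successor is eod
    prev = None
    for x in xs:
        if iseod(x):
            return prev
        prev = x
    return prev

S = [0, 2, 5, 10, EOD, EOD]  # a running sum that has terminated
print(last(S))  # 10
```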

I was browsing Hacker News the other day and learned about the “rainfall problem”, a coding exercise used as a solve-at-the-whiteboard interview question. You have a series (finite, of course) of numbers and you must calculate the average of the positive numbers that appear before the sentinel value -999. Let’s solve it in Lucid with *eod*.

The first step is to remove the ad hoc sentinel value and replace it with *eod*. Let *R* be the original data stream; we define *T*, the finite stream of temperatures, as

*T = R until R eq -999*

Here *X until P* is like the stream *X* except that once *P* is true, the output is *eod*. The operator *until* (which normally would be built in) has a simple definition:

*X until P = if sofar not P then X else eod fi*

where *sofar Q* is true at a timepoint iff *Q* has been true up to then. We can define *sofar* as

*sofar Q = R where R = true fby R and Q end*

Now that we have the temperatures as a proper finite stream we can define the stream *P* of the positive temperatures as

*P = T whenever T>0*

(For this to work our implementation of *whenever* must handle *eod* correctly. This will be the case, for example, if we base it on the recursive definition

*X whenever P = if first P then first X fby (next X whenever next P) else next X whenever next P fi*

which gives sensible results if *X* and/or *P *are finite.)

Finally we define the stream *A* of averages as

*A = S/N where S = first P fby S+next P; N = 1 fby N+1 end*

and the number we want is the last one

*answer = last A fby eod*

Note that if we count *until*, *sofar*, and *last* as being built-in, we don’t use *iseod*.
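The same pipeline translates naturally to Python if we let generator exhaustion play the role of *eod* (a rough analogy of my own, not a pyLucid feature — a terminated generator simply stops yielding, just as a terminated stream yields *eod*):

```python
def until(xs, pred):
    # X until P: pass values through until pred first holds,
    # then end the stream (generator exhaustion stands in for eod).
    for x in xs:
        if pred(x):
            return
        yield x

def rainfall(readings):
    # average of the positive values appearing before the -999 sentinel
    vals = [x for x in until(readings, lambda x: x == -999) if x > 0]
    return sum(vals) / len(vals)

print(rainfall([3, -2, 5, 4, -999, 7]))  # 4.0
```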

What happens when there is more than the time dimension? What do we do? If there is also a space dimension then we can add another special value, *eos* (end of space). The value *eos* propagates like *eod* when combined with ordinary data. And we add an extra rule: when *eos* combines with *eod* the result is *eod*; *eod* trumps *eos*. With this arrangement we have a simple output convention. If *X* is the 2D stream being output, we evaluate *X* at timepoint 0 and at successive spacepoints till we encounter *eos*. Then we move to the next line, increase the timepoint to 1, and output successive spacepoints till we again encounter *eos*. We then increase the timepoint to 2, output successive spacepoints etc.

If at any stage we encounter *eod*, we terminate normally. We could call this the ‘typewriter’ output convention. There is a corresponding input convention that requires an end-of-line input as well as an end-of-data input.

And what about three dimensions? For example, video in which frames vary in the time dimension and a frame varies in a horizontal (*h*) dimension and a vertical (*v*). We can generalize the typewriter convention using *eoh* (end of horizontal), *eof* (end of frame), and *eod*.

What’s the general situation, when there’s lots of dimensions? W. Du suggested a family of special objects, indexed by a *set* of dimensions. If *eod(S1)* and *eod(S2) *are the objects corresponding to the sets *S1* and *S2* of dimensions, then the result of combining them (say, adding them) is *eod(S1∪S2)*. Thus the bigger the index set, the more overpowering is the object. The value *eos* is revealed to be *eod({s})* and what we call simply eod is *eod({s,t})*. In the video context, *eoh* is *eod({h})*, *eof* is *eod({h,v})* and *eod* is *eod({h,v,t})*.

EOD


“But wait!” I hear you say. “It’s nice to write elegant equations defining the infinite array of all primes but what about the everyday working world? What about the array of midterm marks of my class? It’s finite! How do you work with that in Lucid2D?”

Not obvious. You can create an infinite array whose first elements are the midterm marks followed by infinitely many -99 values. Then set a variable N equal to the number of valid marks and (say) average the first N values of the array. But in what form do you read it in in the first place? We need a general input protocol that, obviously, doesn’t give -99 a special status.

This becomes more pressing if we’re inputting a finite number of finite arrays. In which case we need a whole stream of N’s.

As I mentioned, when we first tried to “add arrays” to Lucid we tried to find a simple algebra of finite vectors, matrices, etc. but never succeeded. The algebra would be especially complex if you want ‘ragged’ arrays in which rows have different lengths (as mentioned above). It turned out that infinite arrays are mathematically simpler than finite ones.

Still, a language that can’t easily average a finite list of midterm grades is not much use. Eventually, we came up with a better solution.

The idea is to introduce a special value that works like -99 above but isn’t an actual number. The new object is *eod*, which stands for “end of data”. The object *eod* is not a number (or a string or a boolean etc). It’s the value of a stream that has terminated. I think of it as the value read in (in UNIX) after you hit control-D.

The input convention is clear. A finite stream gets entered as an infinite one padded with *eod’s*. The output convention is also simple. When outputting a stream X, you (as usual) demand the values of X in order and output them. When the value returned is *eod*, you don’t produce anything, rather you terminate normally.

Lucid is based on an algebra: a set of values together with a set of operations on these values. If you extend the set of values, you have to extend the operations and say how they work if any of the operands are the new values.

Fortunately that’s simple for *eod*. The basic rule is that any data operation (like “+”) that is strict (needs all its arguments) returns *eod* if *any* of its arguments are *eod*. In other words

*eod+1 = 4+eod = eod+eod = eod*

For an operation like *if-then-else* that doesn’t need all its arguments, only the first argument (the test) is sensitive on its own to *eod*:

*if eod then 3 else 4 fi = eod*

Otherwise it simply chooses between alternatives as usual:

*if true then 3 else eod fi = 3*

*if false then 3 else eod fi = eod*

*if false then eod else eod fi = eod*

Similarly

*false and eod = false*

*true and eod = eod*

*eod and eod = eod*

As for the space and time Lucid operations, they aren’t affected. For example, the value of *next X* at time *t* is still the value of *X* at time* t+1* – whether or not this value is *eod*. So if we’ve written an eductive interpreter, the only part that needs changing is the part that evaluates data operations.

For strict operations, the rule (above) is simple. For* if-then-else-fi* we first evaluate the test; if it is *eod*, we return *eod*. Otherwise we evaluate and return the chosen alternative. Similarly, to evaluate an *and* expression we evaluate the first operand and return *eod* if that value is *eod*. Otherwise, we evaluate the second operand and return this value.

To conclude let’s consider the midterm grades problem. Suppose that we input the stream (a stream, rather than an array) of grades as G, padded with *eod’s* as above. Let’s say G is

*56, 79, 66, 85, eod, eod, eod, …*

(it’s a small class).

First we define a running sum

*S = first G fby S + next G*

so that S is

*56, 135, 201, 286, eod, eod, eod, …*

Notice that S is also ‘really’ a finite stream of the same length. Yet its definition doesn’t take finiteness into account. When the next value of S is computed by increasing it by the next value of G, and this next value is *eod*, the resulting value of S is also *eod*.

Now we define a counter and form the running average A:

*N = 1 fby N+1*

*A = S/N*

so that A is

*56, 67.5, 67, 71.5, eod, eod, eod, …*

(because *eod/5* etc is *eod*).
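The whole computation so far fits in a few lines of Python (an illustrative sketch with *eod* as a string sentinel and a hypothetical `strict` wrapper capturing the propagation rule):

```python
EOD = "eod"

def strict(op):
    # any strict data operation returns eod if either operand is eod
    return lambda a, b: EOD if EOD in (a, b) else op(a, b)

add = strict(lambda a, b: a + b)
div = strict(lambda a, b: a / b)

G = [56, 79, 66, 85, EOD, EOD]

# S = first G fby S + next G   (running sum; eod propagates)
S = [G[0]]
for g in G[1:]:
    S.append(add(S[-1], g))

# N = 1 fby N+1;  A = S/N      (running average)
A = [div(s, n) for n, s in enumerate(S, start=1)]
print(A)  # [56.0, 67.5, 67.0, 71.5, 'eod', 'eod']
```

Note that neither the sum nor the average definition mentions finiteness; *eod* flows through on its own.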

All we need is the last value of A; the value when the next value is *eod*. At the moment we can’t do this because we can’t test for *eod*. The comparison *X eq eod* returns *eod*, because *eq* is a strict data operation.

We need a primitive that can gaze on *eod* without being turned to *eod*. We call it *iseod* and *iseod X* returns *true* if X is *eod*, *false* otherwise. Then we can define *last* as

*last(X) = X asa iseod next X*

and what we want is *last(A)*. However, that’s all we want, so if we ask for

*last(A) fby eod*

we get one number and a normal termination, as desired. More eloquently, if

*just(Y) = first Y fby eod*

our output is *just(last(A))*.
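The whole grades example can be simulated directly, representing streams as Python lists padded with an *eod* sentinel (the helper names here are my own):

```python
EOD = object()  # sentinel standing for eod

def add(x, y):
    return EOD if x is EOD or y is EOD else x + y

def div(x, y):
    return EOD if x is EOD or y is EOD else x / y

G = [56, 79, 66, 85, EOD, EOD, EOD]

# S = first G fby S + next G  (running sum; eod is absorbing)
S = [G[0]]
for t in range(1, len(G)):
    S.append(add(S[t - 1], G[t]))

# N = 1 fby N+1 and A = S/N  (running average)
A = [div(S[t], t + 1) for t in range(len(S))]

# last(X) = X asa iseod next X: the value of X just before eod appears
last_A = next(A[t] for t in range(len(A) - 1) if A[t + 1] is EOD)
print(last_A)  # 71.5
```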

There are more interesting operators like *last* that can be defined with *eod* and *iseod*. Also, what about intensions that are finite in space as well as time? I’ll talk about these next time.

Finally, you know how this post has to end.

*eod*


The late Ed Ashcroft and I discovered this possibility when we tried to “add arrays” to Lucid. Initially, we intended Lucid to be a fairly conventional, general purpose language. So we considered various ‘features’ and tried to realize them in terms of expressions and equations.

Static structures like strings and lists were no problem. We ran into trouble with arrays, however. We tried to design an algebra of *finite* multidimensional arrays (along the lines of APL) but the results were complex and messy to reason about.

Finally it dawned on us that we should consider infinite arrays – sort of frozen streams. And that these could be realized by introducing (in the simplest case) a space parameter *s* that works like the time parameter *t*. In other words, Lucid objects would be functions of the two arguments *t* and *s*, not just *t*. These things (we had various names for them) could be thought of as time-varying infinite arrays.

The details were pretty straightforward. We would add space versions of the temporal operators *first*, *next*, *fby* etc. The programmer as before could define variables with arbitrary expressions involving these operators. Let’s call the space operators *initial* (corresponding to *first*), *succ* (successor, like *next*), and *sby* (succeeded by, like *fby*).

In imperative languages arrays are usually created by loops that update components one by one. You can emulate this in Lucid. You need an update operator that takes an array, an index, and a value, and returns an array like the old one except that the value is now stored at the index. However we realized this was kludgey and likely inefficient.

The ‘idiomatic’ way to do it is to define the whole array at once, like we do for streams. Thus

*nums = 1 sby nums+1*

defines the array of counting numbers and

*tri = 1 sby tri + succ nums*

the array of triangular numbers.
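Since *sby* defines each array by a spatial recurrence, any finite prefix of *nums* and *tri* can be computed with an ordinary loop. Here is a Python sketch (the helper `take` is my own invention):

```python
def take(k):
    """First k components of nums = 1 sby nums+1 and tri = 1 sby tri + succ nums."""
    nums, tri = [1], [1]
    for s in range(1, k):
        nums.append(nums[s - 1] + 1)      # nums+1, displaced one place by sby
        tri.append(tri[s - 1] + nums[s])  # (succ nums)[s-1] is simply nums[s]
    return nums, tri

print(take(5))  # ([1, 2, 3, 4, 5], [1, 3, 6, 10, 15])
```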

If we mix spatial and temporal operators, we can define entities that depend on both dimensions – that are time-varying arrays. Thus

*stot(A) = S where S = A sby S + succ A; end*

*P = 1 fby stot(P)*

gives us Pascal’s triangle (tilted to the left)

*1 1 1 1 1 …*

*1 2 3 4 5 …*

*1 3 6 10 15 …*

*1 4 10 20 35 …*

*1 5 15 35 70 …*

with the space dimension increasing to the right and the time dimension increasing towards the bottom.
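Since *stot* is just a running (prefix) sum along the space dimension, a finite corner of this triangle is easy to compute. A Python sketch (the function names are mine):

```python
from itertools import accumulate

def pascal(rows, cols):
    """P = 1 fby stot(P): each time step takes the prefix sum in space."""
    P = [1] * cols               # time 0: the constant array of 1s
    out = [P]
    for _ in range(rows - 1):
        P = list(accumulate(P))  # stot: running sum along the space dimension
        out.append(P)
    return out

for row in pascal(5, 5):
    print(row)
```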

The function *stot* is not defined recursively and we can eliminate it by applying the calling rule, giving

*P = 1 fby (S where S = P sby S + succ P end)*

and we can promote S to a global giving

*P = 1 fby S*

*S = P sby S + succ P*

The first equation implies that S is equal to next P and substituting in the right hand side of the second equation gives

*P = 1 fby S*

*S = P sby next P + succ P*

Now we can eliminate S to get

*P = 1 fby P sby next P + succ P*

So we can use the usual rules to transform two-dimensional programs. Even though there are two dimensions, the programs are still equations.

We can use the space dimension to generate primes without recursion. We define a stream of arrays in which on each time step the next array is the result of purging the current array of all the multiples of its initial element.

*N = 2 sby N+1*

*S = N fby S wherever S mod initial S ne 0*

*P = initial S*

This defines P to be the stream of all primes.
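Demand-driven evaluation would only ever compute a finite part of each array, and we can mimic that with a truncated sieve in Python (a sketch; the bound `width` is my own device):

```python
def primes(k, width=200):
    """P = initial S, where S = N fby S wherever S mod initial S ne 0."""
    S = list(range(2, width))             # N = 2 sby N+1, truncated
    P = []
    for _ in range(k):
        p = S[0]                          # initial S at this time step
        P.append(p)
        S = [n for n in S if n % p != 0]  # purge multiples of initial S
    return P

print(primes(8))  # [2, 3, 5, 7, 11, 13, 17, 19]
```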

Finally here is a program that crudely approximates heat transfer in an infinite metal bar. The bar is initially hot (temperature 100) at the left end (initial point) and 0 elsewhere. Thereafter at every timepoint each spacepoint receives or gives a small percentage of the temperature difference with its neighbour.

*eps = 0.1*

*B0 = 100 sby 0*

*B = B0 fby 100 sby succ B + eps\*(B - succ B) + eps\*(succ succ B - succ B)*

The output shows the bar gradually warming up as the heat travels from left to right.

*100 0.0 0.0 0.0 …*

*100 10.0 0.0 0.0 …*

*100 18.0 1.0 0.0 …*

*100 24.5 2.6 0.1 …*

*…*
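We can check those numbers by simulating the recurrence on a truncated bar in Python (a sketch; `step` and the truncation are mine):

```python
EPS = 0.1

def step(B):
    """One time step of B = B0 fby 100 sby succ B + eps*(B - succ B) + eps*(succ succ B - succ B)."""
    new = [100.0]  # the left end is pinned at 100
    # Component s+1 of the new bar is built from components s, s+1, s+2 of the old.
    for s in range(len(B) - 2):
        new.append(B[s + 1] + EPS * (B[s] - B[s + 1]) + EPS * (B[s + 2] - B[s + 1]))
    return new

bar = [100.0] + [0.0] * 9   # B0 = 100 sby 0, truncated to ten points
for _ in range(3):
    bar = step(bar)         # the truncated bar loses one point per step
print([round(x, 1) for x in bar])  # [100.0, 24.5, 2.6, 0.1, 0.0, 0.0, 0.0]
```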

The eduction process is easily extended to two dimensions. Instead of demands specifying a variable and a time coordinate, they specify a variable and a time coordinate and a space coordinate. These demands give rise to demands for possibly different variables at possibly different time and space coordinates.

There is one complication, however, involving the warehouse (cache). Some variables may be independent of one or more of the coordinates. For example, in the primes program above the variable *N* does not depend on the time coordinate. In principle, if we cache values of *N* tagged with both the time and space coordinates we risk filling the cache with duplicate values of *N* with the same space coordinates but with different time coordinates.

That doesn’t happen with the primes program because it works out that all the demands for *N* will have time coordinate 0, so no duplicates. Many programs are well behaved in this sense. But not all.

For example, in the heat transfer program, demands for *B* at different time points and space points will lead to demands for *eps* at different time and space points. But these demands in different contexts all return 0.1, so if each result is cached separately, we’re wasting time and space.

Avoiding this problem requires what we call *dimensionality analysis*. The dimensionality of a variable 𝒱 is the set of dimensions that it depends on; the least set of dimensions with the property that knowing the coordinates of these dimensions allows you to compute 𝒱.

If there are two dimensions *s* and *t* there are four possible dimensionalities:

*{}, {s}, {t}, {s,t}*

For example, if the dimensionality is *{t}*, we need to know the time coordinate but not the space coordinate.

In practice we can’t always compute the exact dimensionality because it could turn out e.g. that a complex looking expression always has the same value. But we can compute bounds on the dimensionalities and that’s almost always good enough.

I’ll leave dimensionality analysis to another post – it’s very similar to type inference. Applied to the prime program, for example, it finds that the dimensionality of *N* is *{s}*, of *S* is *{s,t}*, and of *P* is *{t}*. Applied to the heat program, it finds that the dimensionality of *B* is *{s,t}*, of *B0* is *{s}*, and of *eps* is *{}*.

This information allows us to cache and fetch the values of these variables with the minimum tags.

Notice that *P* and *N* are both one-dimensional – only one coordinate is required to compute a value. But they don’t have the same dimensionality. The *rank* of a variable is the number of coordinates needed. It is the cardinality of the dimensionality. Knowing the rank is not enough to tell you how to cache and fetch a value.
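As a sketch of how a warehouse might use this information, here is a toy Python cache (the class and its methods are hypothetical, not from any real interpreter) that tags each value with only the coordinates in its variable’s dimensionality:

```python
class Warehouse:
    """Cache that keys each value by only the coordinates its variable depends on."""
    def __init__(self, dims):
        self.dims = dims    # e.g. {"eps": set(), "N": {"s"}, "S": {"s", "t"}}
        self.store = {}

    def key(self, var, s, t):
        d = self.dims[var]
        # Drop any coordinate not in the variable's dimensionality.
        return (var, s if "s" in d else None, t if "t" in d else None)

    def get(self, var, s, t):
        return self.store.get(self.key(var, s, t))

    def put(self, var, s, t, value):
        self.store[self.key(var, s, t)] = value

w = Warehouse({"eps": set(), "N": {"s"}, "S": {"s", "t"}})
w.put("eps", 3, 7, 0.1)
print(w.get("eps", 0, 0))  # 0.1; eps is cached once, whatever the context
```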

In future posts I’ll talk about what happens when there are a few more dimensions, or a lot more dimensions, or dynamic dimensions, or dimensions as parameters of functions.


The logician Willard Quine defined a paradox as an “absurd” statement backed up by an argument.

The famous result of Banach and Tarski definitely counts as a paradox by this definition. They proved that it is possible to take a unit sphere (a ball with radius 1), divide it into five pieces, then by rotations and translations reassemble it into *two* unit spheres.

Huh?

This would seem to be impossible, based on our experience of the physical world. What happened to conservation of volume? The original sphere had volume 4π/3, the five parts should have total volume 4π/3, but the two spheres have total volume 8π/3. Something doesn’t add up.

That’s literally true. Four of the pieces are so bizarre they don’t have a volume (technically, they are nonmeasurable sets). Therefore you can’t add their volumes.

**Axiom of Choice**

I’ve said before that a paradox can often be understood as a proof by contradiction of one of the (often implicit) assumptions. One of the assumptions here is the additivity of volume. But the other is the *Axiom of Choice*.

The Axiom of Choice (AC) seems harmless at first. It says that if you have a collection of nonempty sets, there is a single function (a “choice function”) that assigns to each set an element of that set.

This seems reasonable and in line with our experience. If you have a bunch of bags each with some candies in them, there is certainly no problem collecting one from each bag (a child can do it and will be only too happy to oblige). Even if the candies in each bag are identical.

Trouble happens when the number of candy bags is uncountably infinite. Why should there be a uniform way of making this infinite number of choices?

**Nonmeasurable sets**

This trouble takes many forms. The Banach Tarski paradox is just one. AC also (obviously) implies that there are sets that don’t have a volume (or area, or length).

The supposed existence of nonmeasurable sets seriously complicates analysis. (Analysis is, roughly speaking, generalized calculus.) Analysis textbooks are full of results which state that such-and-such a procedure always generates a measurable set. If students ask to see an example of one of these mysterious objects that don’t have a volume (or area, or length), the instructor is in trouble. AC tells you that such sets exist, but says nothing about any particular one of them. It’s *non constructive*.

In fact it can be shown that almost any set that is in any sense definable (say, by logical formulas) is measurable. For example, all Borel sets are measurable. If authors simply assumed that all sets are measurable, the average text would shrink to a fraction of its size. And they wouldn’t get into trouble – it is not possible, without AC, to prove the existence of a nonmeasurable set.

**Determinacy**

More trouble arises when we deal with infinite games. Finite games of perfect information (no hidden cards) are well understood. If ties are impossible, then one player ‘owns’ the game – has a winning strategy. (A strategy is basically a complete playbook which tells you what to do in each situation.) Zermelo, the Z in ZF, first proved this. This is called determinacy.

When we move to infinite games (in which the players alternate forever) AC causes trouble. As you can guess, AC implies the existence of nondeterminate games, in which every strategy for player I is beaten by some strategy for player II, and vice versa. Strange. Needless to say, I can’t give you a concrete example of a nondeterminate game. Once again, you can prove that almost any particular game that you can specify is determinate.

**Infinite voting systems**

My final example of a counterintuitive consequence of AC is the *ultrafilter theorem*. To avoid nerdy formulas, I’ll describe it in terms of voting.

Let’s say we have a finite group of voters

*P_{1}, P_{2}, P_{3}, …, P_{n}*

and they each vote Aye or Nay on a resolution. When do the Ayes have it? Obviously, when they have a majority (let’s count ties as the Nays having it). No problem.

When there are infinitely many voters, however, it is not so obvious what to do. A vote can be thought of as an infinite sequence of Ayes and Nays, e.g.

*Aye, Nay, Nay, Aye, Nay, Aye, Aye, Nay, …*

What constitutes a “majority” of an infinite set of voters? You could give it to the Ayes if there are infinitely many of them, but it is also possible that at the same time there are infinitely many Nays, in which case the Nays have grounds for complaint.

It’s useful to make a list of the properties such a voting system should have.

- If the vote is unanimous, then the result should be the same, whether Aye or Nay.
- No ties: either the Ayes have it (have a majority), or the Nays do.
- If a vote is held and one person changes their vote, the outcome is unaffected.
- If a vote is held and the Ayes have it, and then any number of voters switch from Nay to Aye, the Ayes still have it.
- The union of two minorities is a minority, and the intersection of two majorities is a majority.

Sounds doable, but how?

We already saw that making all infinite sets majorities won’t work, because their complements may be infinite. In the same way we can’t say minorities are all finite. We can’t choose one or even finitely many people as the deciders, because individual votes don’t count.

Hmmm.

Well, don’t try to solve this because you won’t succeed. It can be shown, again, that there is no concrete (definable) scheme that works. In particular, even if we use Turing machines that can perform an infinite sequence of steps in a finite amount of time (this makes mathematical sense), there is no voting program.

And yet the Axiom of Choice tells us that there is a voting method (not obvious). But don’t ask what it is, it’s a rabbit that AC pulls out of its hat.

**The nature of existence**

What to do about this?

We can retain AC and just live with the absurd Banach-Tarski result, with sets without volume (or area or length), with games that have no winner, and with infinite voting methods no one can exhibit.

But in what sense does, say, a voting method *exist*? AC tells us we are free to imagine that there exists a voting method. Gödel showed that AC is consistent with ZF (assuming, as everyone believes, that ZF is consistent). That means we won’t get into trouble if we use it. But many of its consequences are unsettling.

AC means, for example, that you can say “I know that there is a voting method that works” but not “I know a voting method that works”. Of course this situation happens in real life. But in real life there’s the possibility of resolving the situation. If you know there is a wolf in the woods, you can go into the woods and find it. No use going looking for the voting method because you’ll never find it.

**Other choices**

Can we do without AC? To a point, yes. There are weaker forms that don’t have unsettling consequences. One is Countable Choice (CC), which says that given an (infinite) sequence

*S_{1}, S_{2}, S_{3}, …*

of sets there is a sequence

*x_{1}, x_{2}, x_{3}, …*

with each *x_{i}* an element of *S_{i}*. A slightly stronger form is Dependent Choice (DC), which lets each choice depend on the choices made before it.

CC or DC is enough to do most practical mathematics, including most analysis. However it is not enough for important foundational theorems. For example, DC is not enough to prove the completeness theorem for first order logic, which says that every formula is either provable or has a counterexample. For completeness, you need a voting method.

Another possibility is the Axiom of Determinacy (AD) which says that every game has a winner. It has some nice consequences, for example, it implies that every set of real numbers is measurable.

But it also implies that ZF is consistent. This sounds nice, too, but is actually a disaster. It means that we can’t prove the consistency of AD with ZF (assuming the consistency of ZF). In fact it is not known whether ZF+AD is consistent. Not safe for work!

**AC, I can’t quit you**

What to do? I’m afraid I don’t have the answer. AC causes trouble but it also makes life a lot simpler. For example, it implies that any two orders of infinity are comparable. Without AC, cardinal arithmetic is chaos. Set theorists have tried to come up with a weaker version of the Axiom of Determinacy but so far nothing persuasive has appeared.

In the end, it’s an engineering decision. If we choose AC, we have a well ordered mathematical universe with very nice features but also some bizarre objects with properties that contradict our real life experiences. A kind of Disneyland but with monsters. If we reject AC, we have a chaotic, complex universe in which the normal rules don’t apply. A kind of slum with broken windows, collapsing stairways, and cracked foundations. A “disaster” as Horst Herrlich put it.

And there doesn’t seem to be a middle ground. DC fixes some of the cracks and makes a large part of the slum (e.g. analysis) habitable, but doesn’t make it a theme park.

One possibility is to treat AC as a powerful drug and take it only when necessary. Theorems should come with consumer labels saying what went into them. So if you see a box on the shelf of “Banach and Tarski’s Miracle Duplicator! Feed Multitudes!”, it will say on the back of the box “Contains AC”.

*This statement is false.*

If it’s true then it’s false, but if it’s false then it’s true … nothing works.

In my not-so-humble opinion, most (maybe all) paradoxes are the last step in a proof by contradiction that some unstated assumption is false.

In this case, the assumption is that the above statement is meaningful – is either true or false. The assumption is false, the statement is meaningless. End of paradox.

Of course, there’s more to it than that. Behind the Liar Paradox is a more general, and seemingly sensible assumption, that any statement that is syntactically correct is meaningful. Obviously, not the case. Here’s another example

*I do not believe this statement.*

If I believe it, then I don’t, and if I don’t, then I do.

It’s tempting to believe that self reference is the problem, but there are plenty of self referential sentences that are (or seem … ) meaningful and true; e.g. “I know this sentence is true”.

To get to the bottom of this we need to formalize the paradox. This was first done by the famous logician Alfred Tarski (in 1936). In his formalization, the problem is the phrase “is true”.

More than 80 years later you can explain it without getting too technical. Imagine we have a formal logical language with quantifiers, variables, Boolean connectives, arithmetic operations and (this really helps) strings and string operations. Call this language ℒ. At this stage everything syntactically correct makes sense. For example, we can state that string concatenation is associative, or that multiplication distributes over addition.

Since we have strings, we can talk about expressions and formulas *in the language itself*. We can define a predicate (of strings) that is true iff the string is a syntactically correct formula. We can define an operation “subs” that yields the result of substituting an expression for a variable; more precisely, subs(f,g) is the result of substituting g for every occurrence of x in f. So far, no problem. Can we produce a formula that refers to itself? Not yet.

Gödel numbers? No need. The whole point of Gödel numbering is to show that you don’t need strings, you can represent them as (arbitrarily large) integers. This is important but not particularly interesting. In modern computer science terms, it means *implementing* strings as (arbitrarily long) integers, and nowadays (but not in the 30’s) everyone believes this without seeing the details.

So far so good. One last little step … and we go over the cliff. The last step is to add a predicate T of strings that says its argument is a formula and that this formula is true (with free variables universally quantified); call the extended language ℒ+. T seems harmless enough, but with it we can reproduce the Liar Paradox.

Provided we can make a sentence refer to itself. This, not Gödel numbering, is the tricky part.

Since ℒ+ has strings, subs, and T, we can talk about whether or not a formula is true of itself (as a string). If a formula is *not* true of itself (it ‘rejects’ itself) let’s call it *neurotic*.

To see that we can define neurosis, let’s say that a formula Φ is true of a formula Θ iff Φ is true when all occurrences of the variable x in Φ are replaced by Θ (as a string constant). If we call the result of this substitution Φ[Θ], then to say that Φ is true of Θ is to say that Φ[Θ] is true.

Then let Ψ be the sentence

*¬T(subs(x,x))*

It should be clear that Ψ says that its argument is neurotic. What about Ψ, is it neurotic? Is Ψ[Ψ] true or false?

On the one hand, if it’s false, then by definition of neurosis Ψ is neurotic. But since Ψ tests for neurosis, Ψ[Ψ] should be true. On the other hand, if Ψ[Ψ] is true, then since Ψ tests for neurosis, Ψ is neurotic. But this means by the definition of neurosis, Ψ is not neurotic. No way out. (You may recognize this as a variant of the “barber who shaves all those who don’t shave themselves” paradox.)

Thus Ψ[Ψ] is our liar sentence. I can tell you exactly what it is; it’s

*¬T(subs(“¬T(subs(x,x))”,“¬T(subs(x,x))”))*

and is, by my count, 41 characters long.

We can make the argument clearer (if not as precise) using our functional shorthand. We define Ψ by the rule

*Ψ[Φ] = ¬Φ[Φ]*

Then

*Ψ[Ψ] = ¬Ψ[Ψ]*

Those who are familiar with the λ calculus or combinatory logic will detect the Y combinator behind this argument. The combinator Y is, as a λ-expression,

*λF (λx F(x x))(λx F(x x))*

It’s called a fixed point combinator because YF reduces to F(YF); YF is a fixed point of F. The ISWIM (where-clause) version is much easier to understand:

*G(G) where G(x) = F(x(x))*
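As a concrete illustration, here is the same trick in Python, eta-expanded (the inner `lambda v:` delays `x(x)`) so that a strict language doesn’t loop forever; the factorial is just a stand-in for F:

```python
def Y(F):
    """G(G) where G(x) = F(x(x)), with x(x) delayed behind a lambda."""
    G = lambda x: F(lambda v: x(x)(v))
    return G(G)

# YF is a fixed point of F: here F maps a function to "factorial in terms of it".
fact = Y(lambda f: lambda n: 1 if n == 0 else n * f(n - 1))
print(fact(5))  # 120
```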

Working back from this contradiction, it means we can’t consistently add a truth predicate to our basic language ℒ. That in turn means that we can’t define T in ℒ, otherwise ℒ would be inconsistent. That’s what Tarski meant when he called his result “the undefinability of truth”.

Can we salvage anything from this? Yes, and this is due to Tarski and Saul Kripke.

There is no harm in applying T to formulas that don’t use T; the meaning is obvious. Call the language allowing this ℒ′. Similarly, applying T to ℒ′ formulas is ok; call the language where this too is allowed ℒ″. We can create a sequence ℒ, ℒ′, ℒ″, ℒ‴, … (This is Tarski’s hierarchy.)

We can throw these all together, producing a language ℒ*. But then we can create ℒ*′, ℒ*″, etc. Generalizing, we have a hierarchy indexed by the countable ordinals (don’t ask). Kripke’s proposal was to define a single language with a single truth predicate in which anything goes but in which sentences not caught up in this construction have an intermediate truth value. Thus Ψ[Ψ] would be neither -1 (false) nor +1 (true) but 0, with 0 being its own negation. I’ll let you decide whether this makes Ψ[Ψ] meaningful after all.

Sentences that have a conventional truth value Kripke calls *grounded*; those, like the liar sentence, *ungrounded*. You can think of the ungrounded sentences as those in which evaluation fails to terminate. Notice that “this statement is true” is ungrounded. (Kripke found a way around this but I won’t go into the details.)

Finally, infinitesimal logic can shed some light on groundedness. If we redefine T(f) to be the truth value of f *times 𝛆*, and evaluate over the infinitesimal truth domain

-1, -𝛆, -𝛆^{2}, -𝛆^{3}, … 0 … 𝛆^{3}, 𝛆^{2}, 𝛆, 1

then we get a more nuanced result. The power of the infinitesimal tells us roughly how many layers of truth predicate we have to go through to decide between true and false.
