The logician Willard Quine defined a paradox as an “absurd” statement backed up by an argument.

The famous result of Banach and Tarski definitely counts as a paradox by this definition. They proved that it is possible to take a unit sphere (a ball with radius 1), divide it into five pieces, then by rotations and translations reassemble it into *two* unit spheres.

Huh?

This would seem to be impossible, based on our experience of the physical world. What happened to conservation of volume? The original sphere had volume 4π/3, the five parts should have total volume 4π/3, but the two spheres have total volume 8π/3. Something doesn’t add up.

That’s literally true. Four of the pieces are so bizarre they don’t have a volume (technically, they are nonmeasurable sets). Therefore you can’t add their volumes.

**Axiom of Choice**

I’ve said before that a paradox can often be understood as a proof by contradiction of one of the (often implicit) assumptions. One of the assumptions here is the additivity of volume. But the other is the *Axiom of Choice*.

The Axiom of Choice (AC) seems harmless at first. It says that if you have a collection of nonempty sets, there is a single function (a “choice function”) that assigns to each set an element of that set.

This seems reasonable and in line with our experience. If you have a bunch of bags each with some candies in them, there is certainly no problem collecting one from each bag (a child can do it and will be only too happy to oblige). Even if the candies in each bag are identical.

Trouble happens when the number of candy bags is uncountably infinite. Why should there be a uniform way of making this infinite number of choices?

**Nonmeasurable sets**

This trouble takes many forms. The Banach Tarski paradox is just one. AC also (obviously) implies that there are sets that don’t have a volume (or area, or length).

The supposed existence of nonmeasurable sets seriously complicates analysis. (Analysis is, roughly speaking, generalized calculus.) Analysis textbooks are full of results which state that such-and-such a procedure always generates a measurable set. If students ask to see an example of one of these mysterious objects that don’t have a volume (or area, or length), the instructor is in trouble. AC tells you that such sets exist, but says nothing about any particular one of them. It’s *non constructive*.

In fact it can be shown that almost any set that is in any sense definable (say, by logical formulas) is measurable. For example, all Borel sets are measurable. If authors simply assumed that all sets are measurable, the average text would shrink to a fraction of its size. And they wouldn’t get into trouble – it is not possible, without AC, to prove the existence of a nonmeasurable set.

**Determinacy**

More trouble arises when we deal with infinite games. Finite games of perfect information (no hidden cards) are well understood. If ties are impossible, then one player ‘owns’ the game – has a winning strategy. (A strategy is basically a complete playbook which tells you what to do in each situation.) Zermelo, the Z in ZF, first proved this. This is called determinacy.

When we move to infinite games (in which the players alternate forever) AC causes trouble. As you can guess, AC implies the existence of nondeterminate games, in which every strategy for player I is beaten by some strategy for player II, and vice versa. Strange. Needless to say, I can’t give you a concrete example of a nondeterminate game. Once again, you can prove that almost any particular game that you can specify is determinate.

**Infinite voting systems**

My final example of a counterintuitive consequence of AC is the *ultrafilter theorem*. To avoid nerdy formulas, I’ll describe it in terms of voting.

Let’s say we have a finite group of voters

*P₁, P₂, P₃, …, Pₙ*

and they each vote Aye or Nay on a resolution. When do the Ayes have it? Obviously, when they have a majority (let’s count ties as the Nays having it). No problem.

When there are infinitely many voters, however, it is not so obvious what to do. A vote can be thought of as an infinite sequence of Ayes and Nays, e.g.

*Aye, Nay, Nay, Aye, Nay, Aye, Aye, Nay, …*

What constitutes a “majority” of an infinite set of voters? You could give it to the Ayes if there are infinitely many of them, but it is also possible that at the same time there are infinitely many Nays, in which case the Nays have grounds for complaint.

It’s useful to make a list of the properties such a voting system should have.

- If the vote is unanimous, then the result should be the same, whether Aye or Nay
- No ties: either the Ayes have it (have a majority), or the Nays do
- If a vote is held and one person changes their vote, the outcome is unaffected.
- If a vote is held and the Ayes have it, and then any number of voters switch from Nay to Aye, the Ayes still have it
- the union of two minorities is a minority, and the intersection of two majorities is a majority

Sounds doable, but how?

We already saw that making all infinite sets majorities won’t work, because their complements may be infinite. In the same way we can’t say minorities are all finite. We can’t choose one or even finitely many people as the deciders, because individual votes don’t count.
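The finite case can be checked by brute force. The sketch below is my own illustration, not part of any standard library: the names `VOTERS`, `is_voting_system` and `dictator` are made up, and it assumes just three voters. It tests every possible family of "majorities" against the properties in the list, deliberately leaving out the one about a single changed vote.

```python
from itertools import combinations

VOTERS = frozenset(range(3))  # three hypothetical voters: 0, 1, 2

def subsets(s):
    """All subsets of s, as frozensets."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

ALL = subsets(VOTERS)  # the 8 possible sets of Aye voters

def is_voting_system(majorities):
    """Check unanimity, no ties, monotonicity and closure under intersection."""
    # Unanimous Aye wins, unanimous Nay loses.
    if VOTERS not in majorities or frozenset() in majorities:
        return False
    # No ties: exactly one of a set and its complement is a majority.
    for a in ALL:
        if (a in majorities) == ((VOTERS - a) in majorities):
            return False
    # Voters switching from Nay to Aye never hurts the Ayes.
    for a in majorities:
        for b in ALL:
            if a <= b and b not in majorities:
                return False
    # The intersection of two majorities is a majority.
    for a in majorities:
        for b in majorities:
            if (a & b) not in majorities:
                return False
    return True

def systems():
    """Every family of subsets of VOTERS that passes the checks above."""
    found = []
    for mask in range(1 << len(ALL)):
        family = frozenset(ALL[i] for i in range(len(ALL)) if mask >> i & 1)
        if is_voting_system(family):
            found.append(family)
    return found

def dictator(family):
    """Return the voter whose vote alone decides the outcome, if there is one."""
    for v in VOTERS:
        if family == frozenset(a for a in ALL if v in a):
            return v
    return None

for family in systems():
    print("decided entirely by voter", dictator(family))
```

With three voters, every family that survives is a dictatorship, which is exactly what the "one changed vote doesn’t matter" property forbids. So a finite electorate can never satisfy the whole list.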

Hmmm.

Well, don’t try to solve this because you won’t succeed. It can be shown, again, that there is no concrete (definable) scheme that works. In particular, even if we use Turing machines that can perform an infinite sequence of steps in a finite amount of time (this makes mathematical sense), there is no voting program.

And yet the Axiom of Choice tells us that there is a voting method (not obvious). But don’t ask what it is, it’s a rabbit that AC pulls out of its hat.

**The nature of existence**

What to do about this?

We can retain AC and just live with the absurd Banach-Tarski result, with sets without volume (or area or length), with games that have no winner, and infinite voting.

But in what sense does, say, there *exist* a voting method? AC tells us we are free to imagine that there exists a voting method. Gödel showed that AC is consistent with ZF (assuming, as everyone believes, that ZF is consistent). That means we won’t get into trouble if we use it. But many of its consequences are unsettling.

AC means, for example, that you can say “I know that there is a voting method that works” but not “I know a voting method that works”. Of course this situation happens in real life. But in real life there’s the possibility of resolving the situation. If you know there is a wolf in the woods, you can go into the woods and find it. No use going looking for the voting method because you’ll never find it.

**Other choices**

Can we do without AC? To a point, yes. There are weaker forms that don’t have unsettling consequences. One is Countable Choice (CC), that says that given an (infinite) sequence

*S₁, S₂, S₃, …*

of nonempty sets there is a sequence

*x₁, x₂, x₃, …*

with each *xᵢ* an element of *Sᵢ*.

CC, or the slightly stronger axiom of Dependent Choice (DC), is enough to do most practical mathematics, including most analysis. However it is not enough for important foundational theorems. For example, DC is not enough to prove the completeness theorem for first order logic. (Which says that every formula is either provable or has a counterexample.) For completeness, you need a voting method.

Another possibility is the Axiom of Determinacy (AD) which says that every game has a winner. It has some nice consequences, for example, it implies that every set of real numbers is measurable.

But it also implies that ZF is consistent. This sounds nice, too, but is actually a disaster. It means that we can’t prove the consistency of AD with ZF (assuming the consistency of ZF). In fact it is not known whether ZF+AD is consistent. Not safe for work!

**AC, I can’t quit you**

What to do? I’m afraid I don’t have the answer. AC causes trouble but it also makes life a lot simpler. For example, it implies that any two orders of infinity are comparable. Without AC, cardinal arithmetic is chaos. Set theorists have tried to come up with a weaker version of the Axiom of Determinacy but so far nothing persuasive has appeared.

In the end, it’s an engineering decision. If we choose AC, we have a well ordered mathematical universe with very nice features but also some bizarre objects with properties that contradict our real life experiences. A kind of Disneyland but with monsters. If we reject AC, we have a chaotic, complex universe in which the normal rules don’t apply. A kind of slum with broken windows, collapsing stairways, and cracked foundations. A “disaster” as Horst Herrlich put it.

And there doesn’t seem to be a middle ground. DC fixes some of the cracks and makes a large part of the slum (e.g. analysis) habitable, but doesn’t make it a theme park.

One possibility is to treat AC as a powerful drug and take it only when necessary. Theorems should come with consumer labels saying what went into them. So if you see a box on the shelf of “Banach and Tarski’s Miracle Duplicator! Feed Multitudes!”, it will say on the back of the box “Contains AC”.


*This statement is false.*

If it’s true then it’s false, but if it’s false then it’s true … nothing works.

In my not-so-humble opinion, most (maybe all) paradoxes are the last step in a proof by contradiction that some unstated assumption is false.

In this case, the assumption is that the above statement is meaningful – is either true or false. The assumption is false, the statement is meaningless. End of paradox.

Of course, there’s more to it than that. Behind the Liar Paradox is a more general, and seemingly sensible assumption, that any statement that is syntactically correct is meaningful. Obviously, not the case. Here’s another example:

*I do not believe this statement.*

If I believe it, then I don’t, and if I don’t, then I do.

It’s tempting to believe that self reference is the problem, but there are plenty of self referential sentences that are (or seem … ) meaningful and true; e.g. “I know this sentence is true”.

To get to the bottom of this we need to formalize the paradox. This was first done by the famous logician Alfred Tarski (in 1936). In his formalization, the problem is the phrase “is true”.

More than 80 years later you can explain it without getting too technical. Imagine we have a formal logical language with quantifiers, variables, Boolean connectives, arithmetic operations and (this really helps) strings and string operations. Call this language ℒ. At this stage everything syntactically correct makes sense. For example, we can state that string concatenation is associative, or that multiplication distributes over addition.

Since we have strings, we can talk about expressions and formulas *in the language itself.* We can define a predicate (of strings) that is true iff the string is a syntactically correct formula. We can define an operation “subs” that yields the result of substituting an expression for a variable; more precisely, subs(f,g) is the result of substituting g for every occurrence of x in f. So far, no problem. Can we produce a formula that refers to itself? Not yet.

Gödel numbers? No need. The whole point of Gödel numbering is to show that you don’t need strings, you can represent them as (arbitrarily large) integers. This is important but not particularly interesting. In modern computer science terms, it means *implementing* strings as (arbitrarily long) integers, and nowadays (but not in the 30’s) everyone believes this without seeing the details.

So far so good. One last little step … and we go over the cliff. The last step is to add a predicate T of strings that says its argument is a formula and that this formula is true (with free variables universally quantified). Call the extended language ℒ+. T seems harmless enough, but with it we can reproduce the Liar Paradox.

Provided we can make a sentence refer to itself. This, not Gödel numbering, is the tricky part.

Since ℒ+ has strings, subs, and T, we can talk about whether or not a formula is true of itself (as a string). If a formula is *not* true of itself (it ‘rejects’ itself) let’s call it *neurotic*.

To see that we can define neurosis, let’s say that a formula Φ is true of a formula Θ iff Φ is true when all occurrences of the variable x in Φ are replaced by Θ (as a string constant). If we call the result of this substitution Φ[Θ], then to say that Φ is true of Θ is to say that Φ[Θ] is true.

Then let Ψ be the formula

*¬T(subs(x,x))*

It should be clear that Ψ says that its argument is neurotic. What about Ψ, is it neurotic? Is Ψ[Ψ] true or false?

On the one hand, if it’s false, then Ψ is not true of itself, so by the definition of neurosis Ψ is neurotic. But since Ψ tests for neurosis, Ψ[Ψ] should be true. On the other hand, if Ψ[Ψ] is true, then since Ψ tests for neurosis, Ψ is neurotic. But Ψ[Ψ] being true means Ψ is true of itself, so by the definition of neurosis Ψ is not neurotic. No way out. (You may recognize this as a variant of the “barber who shaves all those who don’t shave themselves” paradox.)

Thus Ψ[Ψ] is our liar sentence. I can tell you exactly what it is; it’s

*¬T(subs(“¬T(subs(x,x))”,“¬T(subs(x,x))”))*

and is, by my count, 41 characters long.
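If you’d rather not take my count on trust, the construction is easy to replay with ordinary strings. This is only a sketch: `quote` and this particular `subs` are my own stand-ins for the operations of the formal language, and I assume a string constant is written between curly quotes.

```python
def quote(s):
    """Write the string s as a string constant of the language."""
    return "“" + s + "”"

def subs(f, g):
    """Substitute g, as a string constant, for every occurrence of x in f."""
    return f.replace("x", quote(g))

psi = "¬T(subs(x,x))"   # the formula Ψ, with free variable x
liar = subs(psi, psi)   # Ψ[Ψ]

print(liar)
print(len(liar))

# The term inside T(...) is subs applied to two copies of the constant “Ψ”.
# Evaluating that term means computing subs(psi, psi), which is the liar
# sentence again: the sentence talks about itself.
inner = liar[3:-1]
print(inner == "subs(" + quote(psi) + "," + quote(psi) + ")")
```

Note that Python’s `replace` makes a single pass and does not rescan the text it has just inserted, which is what substitution for a variable requires here.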

We can make the argument clearer (if not as precise) using our functional shorthand. We define Ψ by the rule

*Ψ[Φ] = ¬Φ[Φ]*

Then

*Ψ[Ψ] = ¬Ψ[Ψ]*

Those who are familiar with the λ calculus or combinatory logic will detect the Y combinator behind this argument. The combinator Y is, as a λ-expression,

*λF (λx F(x x))(λx F(x x))*

It’s called a fixed point combinator because YF reduces to F(YF); YF is a fixed point of F. The ISWIM (where -clause) version is much easier to understand:

*G(G) where G(x) = F(x(x))*
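The where-clause version translates almost word for word into Python. One caveat: Python evaluates arguments eagerly, so a literal transcription of Y would loop forever. The sketch below therefore uses the eta-expanded variant (often called Z), which delays the self-application until it is actually needed. The name `fix` and the factorial example are mine.

```python
def fix(F):
    """Fixed point combinator: G(G) where G(x) = F(x(x)), with x(x) delayed."""
    G = lambda x: F(lambda *args: x(x)(*args))
    return G(G)

# A factorial defined without ever referring to itself by name:
# the argument `self` is supplied by fix.
fact = fix(lambda self: lambda n: 1 if n == 0 else n * self(n - 1))

print(fact(5))
```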

Working back from this contradiction, it means we can’t consistently add a truth predicate to our basic language ℒ. That in turn means that we can’t define T in ℒ, otherwise ℒ would be inconsistent. That’s what Tarski meant when he called his result “the undefinability of truth”.

Can we salvage anything from this? Yes, and this is due to Tarski and Saul Kripke.

There is no harm in applying T to formulas that don’t use T, the meaning is obvious. Call the language allowing this ℒ′. Similarly, applying T to ℒ′ formulas is ok, call the language where this is allowed as well ℒ″. We can create a sequence ℒ, ℒ′, ℒ″, ℒ‴, … (This is Tarski’s hierarchy.)

We can throw these all together producing a language ℒ*. But then we can create ℒ*′, ℒ*″ etc. Generalizing this we have a hierarchy indexed by the countable ordinals (don’t ask). Kripke’s proposal was to define a single language with a single truth predicate in which anything goes but in which sentences not caught up in this construction have an intermediate truth value. Thus Ψ[Ψ] would be neither -1 (false) nor +1 (true) but 0, with 0 being its own negation. I’ll let you decide whether this makes Ψ[Ψ] meaningful after all.

Sentences that have a conventional truth value Kripke calls *grounded*; those, like the liar sentence, *ungrounded*. You can think of the ungrounded sentences as those in which evaluation fails to terminate. Notice that “this statement is true” is ungrounded. (Kripke found a way around this but I won’t go into the details.)
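For a finite collection of sentences the grounded/ungrounded split can be computed directly. What follows is a toy sketch under my own assumptions, not Kripke’s actual construction: sentences are nested tuples, the names (`snow`, `liar`, `teller`, …) are invented, and truth values are -1, 0, +1 with negation as sign change. Everything starts at 0 and is re-evaluated until nothing changes.

```python
def evaluate(expr, val):
    """Evaluate a sentence given the current truth values of named sentences."""
    kind = expr[0]
    if kind == "const":          # a plain fact with a fixed value
        return expr[1]
    if kind == "not":            # negation flips the sign; 0 stays 0
        return -evaluate(expr[1], val)
    if kind == "and":            # conjunction is the minimum
        return min(evaluate(expr[1], val), evaluate(expr[2], val))
    if kind == "T":              # T applied to a named sentence
        return val[expr[1]]
    raise ValueError("unknown expression: " + repr(expr))

def least_fixed_point(sentences):
    """Start with every sentence at 0 and iterate until the values settle."""
    val = {name: 0 for name in sentences}
    while True:
        new = {name: evaluate(e, val) for name, e in sentences.items()}
        if new == val:
            return val
        val = new

sentences = {
    "snow":   ("const", 1),             # an ordinary true sentence
    "a":      ("T", "snow"),            # "snow is true"
    "b":      ("not", ("T", "a")),      # "a is not true"
    "liar":   ("not", ("T", "liar")),   # "this statement is false"
    "teller": ("T", "teller"),          # "this statement is true"
}

print(least_fixed_point(sentences))
```

The sentences `a` and `b` pick up conventional values after a couple of rounds. The liar and the truth-teller never move off 0, so in this sketch they are the ungrounded ones.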

Finally, infinitesimal logic can shed some light on groundedness. If we redefine T(f) to be the truth value of f *times 𝛆*, and evaluate over the infinitesimal truth domain

-1, -𝛆, -𝛆², -𝛆³, … 0 … 𝛆³, 𝛆², 𝛆, 1

then we get a more nuanced result. The power of the infinitesimal tells us roughly how many layers of truth predicate we have to go through to decide between true and false.


Basically I said that Gödel’s results proved that no fixed set of facts and rules can on their own form the basis of mathematical knowledge. I said that hard-earned experience is indispensable. That mathematics is ultimately an experimental science. (This is not the usual take on Gödel’s work.)

But grammar? For natural languages, it’s the same story. Forget about semantics (meaning). Just the syntax of a natural language like English is infinitely rich and can’t be described by any manageable set of facts and rules. The same goes (sorry) for Go-the-game-not-the-programming-language. To master them you need judgement and experience.

Does this mean facts and rules are not as important as we might think? Actually no, they’re indispensable. In fact they are a vital part of what makes us human!

Formal grammars were invented by Chomsky and, independently, the Algol committee, to specify Algol and describe natural languages. They worked spectacularly well for Algol and got off to a good start for natural languages.

For example, one rule that covers a lot of sentences in English (and other languages) is

*<sentence> ::= <noun phrase> <verb phrase> <noun phrase>*

But already you have trouble because in many languages the verb phrase has to agree with the noun phrase in terms of number. So you need two rules

*<sentence> ::= <singular noun phrase> <singular verb phrase> <noun phrase>*

*<sentence> ::= <plural noun phrase> <plural verb phrase> <noun phrase>*

In Russian (in the past tense) the verb phrase has to agree with the noun phrase in terms of gender (there are three in Russian). Six rules.
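The way the rule count multiplies is easy to mechanize. Here is a small sketch that expands the sentence rule once for each combination of agreement features. The function name `agreement_rules` and the feature tables are mine, chosen only to reproduce the counts above.

```python
from itertools import product

def agreement_rules(features):
    """Expand the sentence rule once per combination of agreement features."""
    rules = []
    for combo in product(*features.values()):
        tag = " ".join(combo)
        rules.append(
            f"<sentence> ::= <{tag} noun phrase> <{tag} verb phrase> <noun phrase>"
        )
    return rules

english = agreement_rules({"number": ["singular", "plural"]})
russian_past = agreement_rules({
    "number": ["singular", "plural"],
    "gender": ["masculine", "feminine", "neuter"],
})

for rule in english:
    print(rule)
print(len(russian_past), "rules once gender is added")
```

Every new agreement feature multiplies the number of rules, and this is before we have said anything about what a noun phrase is.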

Let’s stick to English and concentrate on noun phrases. One big thing to deal with is the definite article “the”. Native speakers don’t think about it, but there are rules for “the”. For example, it does not precede personal names, like “John” or “Alison”. Or names of organizations, like “IBM”. Oh wait, what about “The Government” and “the BBC”? Hold on, you don’t say “the NBC” … ???

I have no idea what the rules are. I’ve had many students whose native language (Chinese, Farsi, Korean, … ) has no definite article. I often have to correct their usage and it seems they are always coming up with new ways to get it wrong.

So there seems to be a kind of incompleteness phenomenon here. No matter how many facts and rules you discover, there’s always a sentence that is idiomatic but not covered by these facts and rules.

This is what torpedoed early AI efforts in natural language processing. It was based on grammar and logic and failed because you never had enough facts and rules.

The first efforts at playing games like Chess or Go were also based on facts and logic. The main rule is that the value of a position for one player is the negative of the value of the least favourable (for the other player) position arrived at in one move (whew!).

That rule, and a whole bunch of facts about who wins a terminal position, in principle is enough. But not in practice.
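For a game small enough, the rule really is all you need. The sketch below applies it to a toy game of my own choosing, not Chess or Go: players alternately take 1, 2 or 3 stones from a pile, and whoever cannot move loses. The value is +1 if the player to move wins and -1 otherwise.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def value(stones):
    """Value of the position for the player about to move."""
    if stones == 0:
        return -1  # terminal fact: no stones left, the player to move has lost
    # The main rule: the negative of the least favourable (for the opponent)
    # position reachable in one move.
    return -min(value(stones - take) for take in (1, 2, 3) if take <= stones)

print([value(n) for n in range(9)])
```

The positions with a multiple of four stones come out as losses for the player to move. For Chess or Go the same recursion is correct in principle, but the tree is far too large to search.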

So instead you need heuristic rules to evaluate positions. (In Chess, having passed pawns, controlling the centre, material superiority etc etc). IBM managed to make this work for Chess but for Go it was hopeless. Too many possible moves, too much context to take into account.

And yet there is AlphaGo, which has beaten the world champion. How does it work?

I don’t know. It uses neural networks to process hundreds of thousands of professional games and millions of games it plays with itself. The only facts and rules that humans give it are (as I understand it) the rules of the game. Maybe not even that – the explanations of AlphaGo are vague, probably because of commercial secrecy.

However, I think I can explain the success of AlphaGo (and, recently, Google translate) by an appeal to human psychology. Specifically, to the notions of *conscious* and *unconscious*.

It’s generally agreed that the brain works in both conscious and unconscious modes. Most of the processing is in the unconscious mode and we are (needless to say) unaware of it. How does the unconscious work? Not clear, though it may involve thrashing out contradictory tendencies.

The unconscious communicates with us through feelings, intuitions, hunches, judgement, perception, aesthetics, reflexes …

Anyone who has taken Go seriously will be amazed at how experts talk about the game. They use concepts like *strength*, *thickness*, good and bad *shape*, even (I’m not making this up) *taste*. Teachers encourage their students to play quickly, relying on instincts (reflexes). Learning Go is not so much about memorizing facts and rules as training your unconscious. Maybe AlphaGo works the same way, by simulating an unconscious and training it.

What then is left for the conscious? Guess what – facts and rules.

I’m convinced that the conscious, rational part of the mind works in a machine-like fashion, using and manipulating facts and rules, devising and following step-by-step protocols (algorithms). This is either an important insight or a banal observation and I’m not sure which.

I’m not saying that people are machines. We rely on our unconscious, which apparently does not work sequentially. The conscious and unconscious work together and make a great team. Only if you consciously ignore your feelings do you become a soulless robot (though there’s a lot of that about).

For example, in mathematics we first discover facts and rules by insight based on experience. Once we’ve found some we have confidence in, we then consciously apply them, draw consequences through step-by-step reasoning and leap way ahead of what we could discover by experience alone.

It’s this teamwork that gives us such an advantage over animals, who act almost completely unconsciously. It’s what makes us human. It gives us the freedom to choose between doing what we feel like doing – or, if it’s not the same thing, doing what is best. It gives us free will.


This result, known as Gödel’s Theorem, has a lot of formal and informal consequences. It means there is no computer program that can infallibly decide whether or not a statement about arithmetic is true or false. It means we will never know everything about arithmetic, though we may know more and more as time goes on. It means, however, that this knowledge will not come about purely as a result of manipulating formal facts and rules. We will have to rely on other sources, including experiment.

Even more interesting is the fact that this situation – the limits of facts and rules – reappears in other domains, including games, natural language, and even psychology.

Experiments? What can mathematicians learn from experiments? Experiments aren’t useless, they can, for example, lead to conjectures. But unless these conjectures are proved, how can they contribute to mathematical knowledge?

It all depends on what you mean by experiment. Almost all conventional mathematics can be done in an axiomatic system called Zermelo-Fraenkel set theory (ZF), sometimes with the Axiom of Choice (ZFC). (If you want details consult Wikipedia.) It’s obviously crucial that the facts and rules of ZFC be consistent (non contradictory). Otherwise every statement (and its opposite) can be formally derived.

Yet Gödel’s results imply that the consistency of ZFC cannot be proven in ZFC; in other words, the consistency of ZFC is not a theorem of conventional mathematics. Nevertheless we believe it, because we use ZFC. People win prizes and piles of money for proving things in ZFC. If ZFC were inconsistent, this would be money for old rope. So it’s safe to say that mathematicians strongly believe that ZFC is consistent.

But believing is not knowing, you say. The consistency of ZFC is not real mathematical knowledge. But what about all the results proved from the facts and rules of ZFC? They’re all tainted because they all assume consistency. So, strictly speaking, we cannot say we “know” that the four colour theorem is true even though there is now a proof. Strictly speaking, we only believe it.

It’s tempting to take refuge in simpler systems, like Peano Arithmetic (PA). PA consists of a handful of simple rules, basically the inductive definitions of the arithmetic operations. To these we add the principle of mathematical induction: to prove P(n) for all n, prove that P(0) is true, and prove that P(n+1) is true assuming P(n) is true.

All sensible and reliable. But how do we *know* that mathematical induction works?

In short, we know that induction is valid because (1) it makes complete sense, and (2) it has never let us down. In other words it *feels* right and in our (extensive) experience works *in practice*.

This last paragraph is not formal mathematics. We are invoking judgement and experience.

Nevertheless I would argue that we have the right to say we “know” (not just believe) that induction is valid. Because we believe with at least the same degree of certainty that the law of gravity is valid. And for the same reasons – judgement and experience. The law makes sense and has never let us down. We *know* it.

For that matter, how do we know that x+y=y+x is valid or even that 23+14=37 is true? By insight and experience. (These are not obvious to kids learning arithmetic.)
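The "experience" part can even be automated. Here is a sketch of the inductive definitions of addition and multiplication in the style of PA, with Python’s `+ 1` standing in for the successor operation. The function names are mine. The loop at the end is an experiment, not a proof: it only checks commutativity for the small numbers it actually tries.

```python
def add(m, n):
    """m + 0 = m;  m + (n+1) = (m + n) + 1."""
    return m if n == 0 else add(m, n - 1) + 1

def mul(m, n):
    """m * 0 = 0;  m * (n+1) = (m * n) + m."""
    return 0 if n == 0 else add(mul(m, n - 1), m)

print(add(23, 14))
print(mul(6, 7))

# An experiment: does x + y = y + x hold for every pair we try?
commutes = all(add(x, y) == add(y, x) for x in range(30) for y in range(30))
print(commutes)
```

Nine hundred successful trials make us more confident, but to cover all numbers at once we still have to reach for induction.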

In other words, mathematics, like physics, has an empirical element. ZFC has been around for about a century and we can consider the experience of using it as a big experiment. The hypothesis is that ZFC is consistent and in a century of intensive use no contradiction has shown up. Hypothesis confirmed!

There are other formal systems of set theory that are strong enough to do mathematics. One is Gödel-Bernays (GB) which has the advantage that there are only finitely many axioms (ZF has axiom schemas). But GB is equiconsistent with ZF (and ZFC): if one is consistent they all are. So we can use it with confidence.

There is another system, namely Quine’s New Foundations (NF). It’s not known to be equiconsistent with ZFC so the results of the century-long experiment don’t necessarily apply. NF makes sense and hasn’t produced a contradiction, but we don’t have nearly the same experience with NF as we have with ZFC. This means we can’t have nearly the same confidence in results obtained using NF that we do in results obtained using ZFC.

OK, but what does all this have to do with games, natural language, and psychology? Well, this is where it gets interesting … but look at that word count! This post is already too long.

I promise to take this up in the next post, which will be real soon. But think about facts and rules vs feelings and experience and you can probably figure it out for yourself.


The MHC has a big brother, the Hybrid Predicate Calculus (HPC), which (apparently) has the power of full predicate logic. But at a certain point, it gets weird!

The basic idea is simple enough, you expand MHC by allowing property constants to have extra arguments (still on the left). For example, to say that Socrates (s) likes Plato (p) you write

spL

Notice the verb comes last – the HPC is an SOV language, like Japanese. (This means the typical simple sentence has a subject, an object, and a verb, in that order.)

As we saw, the MHC has expressions that correspond to natural language quantifier phrases, such as “All Greeks”. The HPC has them too, and they can be used like nouns. Thus

[G]pL

says that all Greeks like Plato.

The HPC allows partial application – a relation constant (like L) does not have to have a full set of arguments. Thus L on its own denotes the liking relation, pL denotes the property of liking Plato, and thus spL can be understood as saying that Socrates has this property. In other words, that Socrates likes Plato.

Since pL is a property, we can form the quantifier phrase [pL], which clearly means “everyone who likes Plato”. Thus

[pL]aL

says that everyone who likes Plato likes Aristotle.

In this way we can nest brackets and say things that in conventional logic require nested quantifiers. 〈A〉L is the property of liking some Athenian. [〈A〉L] therefore means “everyone who likes some Athenian” and

[〈A〉L]sL

says that anyone who likes some Athenian likes Socrates.

The students and I managed to say some pretty complex things without bound variables. One of my bonus questions was

Every student registered in at least one course taught by professor Egdaw is registered in every course Pat is registered in.

We should use R as the registered relation, T as the teaches relation, e as professor Egdaw, and p as Pat.

However, we immediately run into a problem: how to express the property of being a course taught by professor Egdaw. eT doesn’t work, it’s the property of teaching professor Egdaw. What we need is the taught-by relation, the converse of T. There is no way of doing this with what we have. But there’s an easy fix: add the converse operator. We denote it by the (suggestive) tilde symbol “~”. In general, ~K is the relation K with the first two arguments swapped, and in particular ~T is the taught-by relation (“~” often translates the passive voice).

We can now proceed in stages. e~T is the property of being taught by professor Egdaw and 〈e~T〉 is “some course taught by professor Egdaw”. 〈e~T〉R is the property of being registered in some course taught by (the brilliant) Egdaw, and [〈e~T〉R] is “every student registered in a (at least one) course taught by professor Egdaw”.

Now for the second main quantifier phrase. p~R are the courses Pat is registered in, and [p~R] is “every course Pat is registered in”. We simply put the two phrases one after another followed by R and get

[〈e~T〉R][p~R]R

I used to think this was hard but actually it’s pretty straight forward. It’s interesting to compare it with the first order logic formalization

∀s(∃c (R(s,c) ∧ T(e,c)) → ∀c (R(p,c) → R(s,c)))
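One way to gain confidence that the two formalizations say the same thing is to evaluate both on small random models. The sketch below is my own: relations are sets of pairs, `registered` holds (student, course) pairs, `teaches` holds (teacher, course) pairs, and individuals are just numbers. The HPC reading is built in the same stages as above, on the assumption that each bracketed phrase ranges over the set of individuals with that property.

```python
import random

def hpc_reading(D, registered, teaches, e, p):
    """Evaluate [〈e~T〉R][p~R]R stage by stage."""
    taught_by_e = {c for c in D if (e, c) in teaches}            # e~T
    in_some = {s for s in D
               if any((s, c) in registered for c in taught_by_e)}  # 〈e~T〉R
    pats_courses = {c for c in D if (p, c) in registered}        # p~R
    return all((s, c) in registered for s in in_some for c in pats_courses)

def fol_reading(D, registered, teaches, e, p):
    """Evaluate the first order formula with explicit quantifiers."""
    return all(
        (not any((s, c) in registered and (e, c) in teaches for c in D))
        or all((p, c) not in registered or (s, c) in registered for c in D)
        for s in D
    )

def random_model(rng, size=4):
    """A random model: a small domain plus two random binary relations."""
    D = range(size)
    pairs = [(x, y) for x in D for y in D]
    registered = {pr for pr in pairs if rng.random() < 0.4}
    teaches = {pr for pr in pairs if rng.random() < 0.4}
    return D, registered, teaches

rng = random.Random(0)
models = [random_model(rng) for _ in range(500)]
agree = all(
    hpc_reading(D, reg, tch, 0, 1) == fol_reading(D, reg, tch, 0, 1)
    for D, reg, tch in models
)
print(agree)
```

Agreement on 500 models is evidence, not a proof, but it is reassuring.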

Of course, it was not a good sign that we had to introduce a new feature (the converse operator). Will other examples require other features? How far will this go?

Just a bit further. We run into another problem if we try to say “Plato likes all Athenians who like themselves”. How do we express the property of liking oneself?

Again, impossible with what we’ve got. We have to introduce a sort of self operator /. /K is like K except the first argument to K is duplicated. /L is the property of liking yourself and the expression we’re looking for is

p[A∧/L]L

We need one more operator that has the effect of ignoring an argument. We use “*” and *K is like K except *K ignores its first argument, its second argument is the first given to K, its third argument is the second given to K, and so on. We’d need it to express e.g. the relation of Spartans liking Athenians. The expression *S∧A∧L denotes the relation that says that its second argument, who is Spartan, likes its first argument, who is Athenian.

Perhaps these equivalences will help

cba~K ↔ cabK

cba/K ↔ cbaaK

cba*K ↔ cbK

You can think of these three operators as replacements for things you can do with arbitrary use of variables. First, variables can be out of order, hence ~; variables can be duplicated, hence /; and variables can be omitted, hence *.

The only problem is that these three operators provide only simple cases of swapping, duplication, and omission. What if you need, say, to swap the third and second arguments, duplicate the third or omit the fourth? Don’t you need whole families of operators?

Not quite; instead, we add a meta operator that generates these families. In general, if ⊗ is an operator then ⊗’ is the operator that works with the indexes of the arguments shifted by one. Thus ~’ swaps the third and second arguments, /’ duplicates the second argument, and *’ omits the second argument and shifts the higher ones. This gives the equivalences

dcba~’K ↔ dbcaK

dcba/’K ↔ dcbbaK

dcba*’K ↔ dcaK
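These equivalences can be checked mechanically if we model a relation as a function of its arguments. The sketch below is my own encoding: the first argument is the one written nearest to K, and `conv`, `dup`, `drop` and `shift` are made-up names for ~, /, * and the meta operator ’. The test relation `K` simply records what it was given, and `written` prints the arguments back in HPC order.

```python
def conv(K):
    """~K: K with its first two arguments swapped."""
    return lambda a, b, *rest: K(b, a, *rest)

def dup(K):
    """/K: K with its first argument duplicated."""
    return lambda a, *rest: K(a, a, *rest)

def drop(K):
    """*K: ignores its first argument and passes the others on to K."""
    return lambda a, *rest: K(*rest)

def shift(op):
    """The meta operator: apply op one argument position further along."""
    return lambda K: (lambda a, *rest: op(lambda *args: K(a, *args))(*rest))

def K(*args):
    """A test relation that just records the arguments it receives."""
    return args

def written(args):
    """Render an argument tuple the way HPC writes it: nearest argument last."""
    return "".join(reversed(args)) + "K"

print(written(conv(K)("a", "b", "c")))              # cba~K
print(written(dup(K)("a", "b", "c")))               # cba/K
print(written(drop(K)("a", "b", "c")))              # cba*K
print(written(shift(conv)(K)("a", "b", "c", "d")))  # dcba~'K
print(written(shift(dup)(K)("a", "b", "c", "d")))   # dcba/'K
print(written(shift(drop)(K)("a", "b", "c", "d")))  # dcba*'K
```

Since `shift` takes an operator and returns an operator, it can be applied to its own result, which corresponds to iterating the prime.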

The meta-operator ‘ can be iterated (e.g. /”’) and can also be applied to any quantifier phrase.

I’m pretty sure that this is enough, that anything that can be said in first order logic can be said in HPC. The only problem is, sometimes the result looks like a dog’s breakfast. The experience has been that assertions that can be expressed simply in natural language can be expressed simply in HPC, but that more technical statements (like the axioms of set theory) are often incomprehensible.

There’s more to the story. For example, HPC can be easily understood in terms of operations on relational tables. But I’ll leave that to a future post.

]]>

I use the blackboard. I hate powerpoint, as do many students. For one thing it’s a lot of preparation work. Also it’s too easy to present way too much information. Click, click, click, each slide crammed with information. The blackboard slows you down to just the right pace.

Part of what got me thinking about video was a mainly good experience with still photography. I would put an effort into writing clearly and laying the blackboard panes out neatly, then I would take pictures. I would post them on line and thus the students had lecture notes.

What should have tipped me off is my discovery that even taking still photos of blackboards is not straightforward. The naive approach is to stand right in front of the board, point the camera at it, and press the shutter.

Two things can go wrong, depending on what happens next. If the flash doesn’t go off, chances are there won’t be enough light on the board. Your camera (which of course you’ve set on automatic) uses a long exposure and the image is blurred.

On the other hand, if the flash goes off, there’s a big bright spot in the center of the photo where the flash reflects off the board. Nothing in the middle of the board is readable and photoshop won’t fix it.

Experienced photographers know how to do it properly. One solution is to set up a remote flash that illuminates the board off centre. However I didn’t have one and didn’t want the hassle of setting it up for every class.

The other solution is to use a tripod (and no flash) but again I didn’t want the hassle of hauling equipment to class and setting it up.

Finally I came up with a third solution: take the picture (with flash) from just on the side, and a few steps back. In the resulting photo there is no glare spot, though the board is distorted. Fix the distortion with photoshop skew, and you’re in business. A bit of extra work, but worth it. I used this for several classes over many terms.

**On to video**

Encouraged by my still photography experience, I decided to move on to video. All I needed was a camera and (unavoidably) a tripod. I already had a tripod and borrowed a video camera from a friend (who used it to record things like birthdays).

I brought them in to class, set them up, aimed the camera and turned it on. I began lecturing as usual … couldn’t wait to see the result.

Which was awful. The image was low definition (probably 480p) and was useless because you couldn’t read the blackboard.

OK, I need a camera that can take high definition movies. I settled on a Canon digital SLR – a Rebel T1i (the series is now up to the T6i). I bring it to class, set it up, aim it, turn it on, lecture, then view the final result.

Which is OK till about half way through the lecture, when the video stops in mid sentence. Huh? I didn’t turn it off!

Out of desperation I consulted the manual and soon found the problem: it can only take 30 mins of video at a time, something to do with buffers filling up (it doesn’t matter how much the card will hold). After 30 minutes it stops recording. And the worst part is, it does so silently, without the slightest beep. There is no remedy and all the comparable cameras work the same way.

Also, the results weren’t that great. I had to place the camera far enough back that both blackboard panels appear. That meant they were both small and hard to read and most of the image was wasted.

So it was just not practical to simply set the camera up and let it run for an hour.

**Plan B**

The backup plan that emerged was to move the camera up to one of the blackboard panels (usually the left one) and have it fill the viewfinder. Also, I got a remote control for the camera so I could turn it off and on without leaving the blackboard. The idea was to record not the whole lecture, just highlights a few minutes long each.

This worked pretty well. No danger of the camera switching off, and the writing on the board was clearly readable.

I didn’t want to go back and forth pointing the camera at each of the two panels in turn. No need – the classrooms I was in had sliding panels. So I’d work on one panel and film, then turn the camera off, slide the right panel to the left, then start lecturing and filming again.

The trouble began after the filming. I had to stitch together these short filmlets. Not hard – though I had to learn iMovie to do it.

The next step is to actually watch the resulting movie, and if you follow in my footsteps you’re in for a shock.

The first shock is seeing yourself as you really are. You may look fatter than you imagine, or older than you imagine. Maybe you notice a sort of smug smirk you didn’t know you had, or some annoying mannerism (none of this applies to me, of course). Unfortunately iMovie can’t help here – it’s not photoshop. In time you’ll get used to you (or not; many people don’t even want to start filming for fear of what they’ll see).

**Yawn**

The second shock is realizing just how boring your lecturing can be. For example, perhaps you go way too slow, or give too much detail. But you can fix this.

One seemingly unavoidable source of boredom is writing on the board. I said earlier that the blackboard slows you down and this is a good thing. Not always.

Suppose I’m doing a logic course and want to deal with the resolution method. I announce that we’re going to look at the “Resolution Method” (so far so good). The next thing I do is turn to the blackboard and start writing RESOLUTION METHOD on the board. Except it seems in the movie to take about a week.

R – E – S (clack clack clack) O – L – U (clack clack clack) …. H – O – D

(“clack” is the sound of chalk hitting the board).

And all the time what are you looking at? My butt.

Here, iMovie can help. You cut out most of the clack-clack and crossfade the ends together. Then it’s magic. I announce Resolution Method to the class, turn to the blackboard, raise my chalk, clack … whoosh … RESOLUTION METHOD appears instantaneously, and I turn back to the audience.

In the end the videos began to look quite slick – you can see one of the more popular ones here.

**Epilogue**

So where are all these slick videos? I left some on youtube; you can find them by searching for “billwadge”. But there were never that many.

The reason is that once the fun wore off, I realized that it was all a lot of work. Hauling the equipment to class, setting it up, sliding the panels, turning the camera on and off, editing in iMovie – worse than preparing powerpoint.

My conclusion is that it’s not practical without help. Ideally, you need a camera operator if not two cameras and two operators. Plus someone to do good video editing. Plus maybe some good lighting.

Anything less is too much work for the poor lecturer and not helpful enough to be worth the effort.

[FADE TO BLACK]

]]>

I like true/false exam questions and through my career have thought up hundreds of them. Every now and then, for comic relief and to inflate the grades, I include some that are ridiculously easy. However, I’ve never found one that is so ridiculous that everyone gets it right. I always had a few takers. Here are some of my favourites.

This one is straight out of a Dilbert Cartoon (which I didn’t show them)

A database manager would be a fool to ignore the Boolean anti-binary least square approach.

It’s the details that make this one believable – to a significant percentage of students.

In 1983 six AI researchers at MIT were injured in a large combinatorial explosion.

They all know that AI has made great progress. Was solving tictactoe one of the first breakthroughs?

Tictactoe was completely solved in the 50’s

But some obstacles remain

The “Eight Queens” puzzle is a famous open problem of AI.

They hear about women pioneers of computer science, but prejudices remain.

The first programmer is generally considered to be Adam Lovelace, Babbage’s collaborator and son of Lord Byron.

Everyone loves a rags to riches story.

The inventor of the C language called it that because that was the grade he got in a class project (which was a first draft of the C specification).

Everyone has heard about cyberspace.

Cyberspace is the area south of San Francisco where many high-tech firms are located.

Email is really fast (in fact the message would travel a few feet). But “modern” and “fibre” add to credibility.

On modern fibre networks an email message can travel halfway around the world in only a few nanoseconds.

Why not? We’re talking about the “latest” chips.

The latest INTEL chips have quantum processors.

Don’t underestimate old technology.

COBOL stands for “Common OBject Oriented Language”.

A winning strategy may be a winning strategy, but the opponent is using “advanced” techniques.

A player using a winning strategy in a chess-like game may lose to an opponent who uses advanced machine-learning techniques.

They know “Colossus” has something to do with Turing

Alan Turing was so smart his colleagues called him the “Colossus”.

I almost believe this.

The tiny grooves on microchips that hold the connecting wires are called “Silicon Valleys”

They would be fools.

No commercial computer manufacturer would base their software on a forty year old operating system.

Talk about modesty.

AI proponents are known for being very cautious in their optimism.

Maybe this isn’t so ridiculously easy. You need a few steps of logic to realize this means any file can be reduced without loss to under one megabyte. But there’s that word “modern”.

Using modern lossless compression techniques, any file greater than one megabyte can be compressed by at least 10%.

I wince every time I read this.

AI suffered a serious setback in the early 70’s when a number of researchers had their grants cut off.

An enigmatic colossus.

Alan Turing was so eccentric his colleagues called him the “Enigma”.

My all-time favorite.

Al Gore contributed so much to the growth of the internet that computer scientists named the concept of algorithm after him.

]]>

Can this be fixed? Yes, but you have to be careful.

Two higher order systems appeared a while ago – HiLog and λ-Prolog. Both are useful but they have the same flaw. They are *intensional*. They have (say) predicate variables, but these variables range not over predicates in the mathematical sense, namely arbitrary sets of ground terms. Instead they range over *names* of predicates, names that appear in the program (so that higher order clauses are context sensitive). In logic terminology, they are not *extensional*.

Intensionality / context sensitivity can cause a lot of problems. It interferes with modularity. For example, if you have two sets of clauses that logically appear to do the same thing, it is not necessarily safe to swap them in the context of a program. And adding clauses to a program might break it, because it might change the scope of predicate variables.

One symptom of these problems is that the languages don’t have a minimum model semantics. So what? So: it means clauses don’t have a logical reading. You can’t understand them as logic, and as I once raised eyebrows at a conference for saying,

Logic Programming – Logic = Programming

Fortunately, a while back I discovered a subset of extensional higher order Horn logic that works as logic programming and has a minimum model semantics. The key idea is to restrict what can appear on the left hand side.

To see the problem with unrestricted higher order extensional Horn logic, consider the following clauses

p(a).

q(a).

r(p).

q(b) :- r(q).

What is the result of the query *:- q(b).*?

At first sight, *p* and *q* are both true of *a* alone. They both denote the set {a}. They have the same *extension*, and should be interchangeable. Since the system is extensional, and *r* is true of *p*, *r* must be true of *q*. But then *q(b)* succeeds, so *q* denotes {a,b}; *p* and *q* are no longer extensionally equivalent, and there is no reason for *r(q)* to succeed. Thus *q(b)* fails, but then *p* and *q* are equivalent again …

There are two solutions to this paradox. One is to make *r(p)* succeed, and *r(q)* fail, even though *p* and *q* have the same extension. This is what the intensional systems do.

The alternative, which I presented way back in 1991, is to forbid rules like *r(p)* which single out particular predicates for special treatment. It is too much to ask an implementation to identify other predicates as having the same extension as the one in the spotlight.

So I disallowed rules in which predicate constants (and other non ground expressions) appear in the head. I also disallowed rules in which a higher order variable is repeated in the head, because that implies an (uncomputable) equality test. (Here “higher” means >0).

The result is that the higher order variables are basically formal parameters, and the clause takes the form of a definition.

These definitions can be powerful. For example, suppose you want to check whether a list is in numerical order. In first order Prolog, it’s easy:

numordered([]).

numordered([X]).

numordered([X,Y|T]) :- X<Y, numordered([Y|T]).

Fine, except suppose you have some lists of strings and you want to check if they are alphabetically ordered. You have to write code for another predicate *alfordered*. Another three lines, identical except that *X<Y* is replaced by *alf(X,Y)*. Then you have lists of lists that should be ordered (by subset) as if they represented sets. Another predicate, *setordered*. More cut and paste.

By now the functional programmers are laughing. They write one set of axioms, for a function with a binary relation argument.

Well, we can do the same in Definitional Higher order Prolog (DHP):

ordered([],R).

ordered([X],R).

ordered([X,Y|T],R) :- R(X,Y), ordered([Y|T],R).

Then we use ordered with appropriate arguments: *ordered([5,6,…],<)*, *ordered([‘dick’,’tom’,…],alf)* or *ordered([[5,3],[3,4,5],…],sub)*.
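For comparison, here is roughly what the functional programmers would write, sketched in Python rather than DHP. The relation is passed as an ordinary two-argument function. The helper `subset` is my own stand-in for *sub*, and Python's built-in string comparison stands in for *alf*.

```python
import operator

def ordered(xs, rel):
    """True if rel(x, y) holds for every pair of adjacent elements of xs."""
    return all(rel(x, y) for x, y in zip(xs, xs[1:]))

def subset(xs, ys):
    """Compare two lists as if they represented sets."""
    return set(xs) <= set(ys)

print(ordered([5, 6, 9], operator.lt))          # numerical order
print(ordered(["dick", "tom"], operator.lt))    # alphabetical order
print(ordered([[5, 3], [3, 4, 5]], subset))     # order by subset
```

Note that this version can only check a given list against a given relation; both arguments have fixed roles as inputs.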

Another example is the join operation (on two binary predicates, yielding a third, their join).

join(P,Q)(X,Y) :- P(X,Z),Q(Z,Y).
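To see what the join computes, here is a small Python sketch that assumes each binary predicate is given as a finite set of pairs (DHP predicates need not be finite, so this is only an approximation). The `parent` facts are hypothetical.

```python
def join(P, Q):
    """All pairs (x, y) such that (x, z) is in P and (z, y) is in Q for some z."""
    return {(x, y) for (x, z1) in P for (z2, y) in Q if z1 == z2}

# Hypothetical facts: parent(ann, bob), parent(bob, cid), parent(bob, dee).
parent = {("ann", "bob"), ("bob", "cid"), ("bob", "dee")}

# Joining parent with itself gives the grandparent relation.
grandparent = join(parent, parent)
print(sorted(grandparent))
```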

DHP has been implemented (with some syntactic variations) as part of the system HOPES developed by Angelos Charalambidis in his PhD dissertation at the University of Athens, Greece. Currently Angelos and his group at the Demokritos Institute (again, Athens) are working on adding negation.

And now it’s time to have a laugh at the functional programmers’ expense. In their grim regimented world, a function can be used in only one way. In logic programming no parameters have fixed roles. For example, we can define a relation *r* on *a*, *b*, *c* and *d*:

r(a,b). r(a,c). r(b,d). r(c,d).

and then ask the query

ordered(L,r)

and get all the lists that are in order according to *r*. Starting with the empty list and proceeding through singletons we eventually get [a,b,d] and [a,c,d].
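The effect of running the query "backwards" can be imitated in Python with a generator that enumerates lists by increasing length. This is a sketch, not how a logic programming interpreter works: it assumes the relation is a finite set of pairs and that list elements are drawn from an explicitly given `universe`, whereas the logic program would leave the element of a singleton list unbound.

```python
def ordered_lists(rel, universe):
    """Yield every list over universe whose adjacent elements are related by rel,
    shortest lists first. Terminates only if rel has no cycles."""
    level = [[]]  # all ordered lists of the current length
    while level:
        yield from level
        # Extend each list by one element that is related to its last element.
        level = [xs + [y] for xs in level for y in universe
                 if not xs or (xs[-1], y) in rel]

r = {("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")}

for xs in ordered_lists(r, ["a", "b", "c", "d"]):
    print(xs)
```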

Even more impressive, we can query *ordered([a,b,c,d],R)* and get the equivalent of

{(a,b),(b,c),(c,d)}

What does this mean? It’s not the only value for *R* that does the job; you can add any tuples you want to this set and still get an order that satisfies the query. This is true even if the tuples contain atoms other than *a*, *b*, *c*, and *d*. So there are in fact infinitely many solutions.

The reason the interpreter displays the one given above is because it is *minimal*. No subset of it works. In general, there may be more than one minimal solution, so they are presented sequentially, but in an unpredictable order.
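For this particular query the minimal solution is easy to compute directly, and a few lines of Python can confirm that it is minimal. The sketch assumes relations are finite sets of pairs; `minimal_relation` is a name invented here, and the extra pair `("z", "q")` is arbitrary.

```python
def ordered(xs, rel):
    """True if every pair of adjacent elements of xs is in rel."""
    return all((x, y) in rel for x, y in zip(xs, xs[1:]))

def minimal_relation(xs):
    """The smallest relation under which xs is ordered: its adjacent pairs."""
    return set(zip(xs, xs[1:]))

xs = ["a", "b", "c", "d"]
R = minimal_relation(xs)
print(sorted(R))

# Removing any pair breaks the ordering, so no proper subset works.
print(all(not ordered(xs, R - {pair}) for pair in R))

# Adding unrelated pairs still satisfies the query.
print(ordered(xs, R | {("z", "q")}))
```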

The intensional systems can’t do this because they don’t consider *R* as ranging over all predicates, just over the ones named in the system. So if you haven’t already defined a predicate that does the job, you’re out of luck.

(Incidentally my original proposal didn’t allow this. I thought it was complex and unusual. Boy was I wrong. My successors corrected my mistake.)

As a slightly more interesting example, suppose we have a collection of facts about some musicians, say *singer(pam)* or *drummer(rajiv)*. We want to put together a band, with the constraint that a band must have a singer, a keyboardist, and either a bass or a drummer. Being a band is a second order predicate, a predicate of predicates. We can axiomatize it with

band(B) :- B(X), singer(X), B(Y), keyboard(Y), B(Z), rhythm(Z).

rhythm(Z) :- bass(Z).

rhythm(Z) :- drum(Z).

Then if we present the query *band(B)* the implementation will start producing lists of bands. And not many, because the bands presented will be *minimal*.

This feature really comes into its own when combined with negation. Instead of a band, think of a development team. We could have all sorts of complex criteria, say that Keisha and Andre can’t both be on the team, that we need either a Javascript or PHP expert but not both, that they have a language in common, and so on. The interpreter will produce a list of minimal teams.
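To get a feel for what "minimal" means here, the following Python sketch finds the minimal bands by brute force. It is not how HOPES works, and all the musician facts are hypothetical. A band is modelled as a set of names, and a band is minimal if no proper subset of it is also a band.

```python
from itertools import product

# Hypothetical facts about musicians.
singer = {"pam", "lee"}
keyboard = {"lee", "ana"}
bass = {"sam"}
drum = {"rajiv"}
rhythm = bass | drum  # rhythm(Z) holds if bass(Z) or drum(Z)

def is_band(B):
    """B contains a singer, a keyboardist and a rhythm player."""
    return bool(B & singer) and bool(B & keyboard) and bool(B & rhythm)

def minimal_bands():
    # Every minimal band has the form {x, y, z} with x a singer, y a keyboardist
    # and z a rhythm player (x and y may be the same person).
    candidates = {frozenset(t) for t in product(singer, keyboard, rhythm)}
    return {B for B in candidates if not any(C < B for C in candidates)}

for band in sorted(minimal_bands(), key=sorted):
    print(sorted(band))
```

With these facts, lee can both sing and play keyboards, so the two-person bands crowd out every larger band that contains them.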

(HOPES is available on github. Ironically, the implementation is written in Haskell.)

]]>

Only in engineering can it have a positive connotation – for example, a redundant duplicate backup system can be a good idea for safety and reliability. It’s repetition but it’s not useless or wasted.

In fact redundancy is widely used in practice and duplication/backup is only the simplest form. I discovered this trying to make sense of a multi-topic introductory computer science course.

A while back my UVic colleague Mary Sanseverino and I were both teaching this course and we were looking for unifying themes. Some were obvious – levels of abstraction, modularity, iteration vs recursion. One surprising theme that popped up was redundancy.

For example, redundancy played a big role in the design and operation of ENIAC, the first modern computer. The ENIAC had 18,000 vacuum tubes and the conventional wisdom was that a device this large would fail too regularly to be useful. The design of the circuits was redundant and the tubes were operated (in terms of voltage etc) well below their official ratings. Testing and preventive maintenance also reduced failures. So did keeping the machine running continuously – most tube failures occurred during power up/down.

The most striking example was the power supply. The mains power drove an electric motor that powered a generator! (This smoothed out fluctuations).

What notion of redundancy is this? A general one, that Mary S. and I came up with. Namely

devoting more than the bare minimum of resources to achieve a better result

For example, ENIAC worked connected directly to the mains but was more reliable with the redundant motor/generator pair.

We didn’t have to look much further to find other examples of this kind of redundancy. In the digital logic gates they had AND, OR and NOT, even though either of the first two can be computed using the remaining two. The instruction set was similarly redundant, with for example an add operation, a subtract operation and a negation operation.
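The redundancy among the gates follows from De Morgan's laws, and a short Python check over all inputs makes the point. The function names are mine and model the gates on Python booleans.

```python
from itertools import product

def NOT(a):
    return not a

def AND(a, b):
    return a and b

def OR(a, b):
    return a or b

def or_from_and_not(a, b):
    """OR built from AND and NOT: a OR b = NOT(NOT a AND NOT b)."""
    return NOT(AND(NOT(a), NOT(b)))

def and_from_or_not(a, b):
    """AND built from OR and NOT: a AND b = NOT(NOT a OR NOT b)."""
    return NOT(OR(NOT(a), NOT(b)))

for a, b in product([False, True], repeat=2):
    print(a, b, or_from_and_not(a, b) == OR(a, b), and_from_or_not(a, b) == AND(a, b))
```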

Successors of ENIAC copied its redundancy and introduced even more with assembly language. A second programming language (after machine language) is already redundant, since machine language is in principle enough. Symbolic names are redundant, as are symbolic addresses. So is the need to declare all symbolic addresses used.

Then came the high level languages – many of them, a clear case of redundancy. High level languages themselves are highly redundant, the general rule being the higher the level, the more redundant.

For example, they typically have **for**, **while** and even **until** constructs even though **while** is enough. Variables and their types must be declared. The same goes for the number and type of arguments of procedures/methods. Constructs are closed by keywords (such as **endif** or **endcase**) even though in principle a generic **end** would be enough. Typically every **case** construct must have a **default** branch even if the other cases cover all the situations which will arise.

One especially clear example is the **assert** statement of, say, Python. An **assert** statement checks that something that should be true, is, and therefore normally contributes nothing. In general many forms of redundancy involve making sure or at least checking that something that shouldn’t happen doesn’t happen.
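Here is a small illustration of redundant checking with **assert**. The function `isqrt_floor` is an invented example. The second assertion restates what the loop already guarantees, so it does no work when the code is correct and only earns its keep if someone later breaks the loop.

```python
def isqrt_floor(n):
    """Largest integer r with r * r <= n, found by simple counting."""
    assert n >= 0, "n must be non-negative"
    r = 0
    while (r + 1) * (r + 1) <= n:
        r += 1
    # Redundant: this is exactly the condition under which the loop exits.
    assert r * r <= n < (r + 1) * (r + 1)
    return r

print(isqrt_floor(10))
print(isqrt_floor(16))
```

Python itself treats these checks as optional: running with the `-O` flag strips **assert** statements out.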

The original ENIAC was a general purpose programmable computer even though it was designed for computing ballistic tables, for which a simpler design would have sufficed. The extra power/generality was redundant. (Ironically, they eventually added redundant special-purpose stored programs for ballistics).

In what sense does this redundant generality give a “better result”? In the sense that the device (or whatever) can be used for purposes not anticipated. (Alan Kay once said that this property is a hallmark of good design.) Personal computers continued this tradition – they were general purpose even though they were designed for playing games and storing recipes. The redundancy really paid off when the web was invented. Nobody anticipated the web but everything was in place to implement it.

It should be obvious that both software and hardware (I haven’t even mentioned caching) embody multiple layers of redundancy. Our systems would be useless without them.

What about real life? Can we use redundancy in real life? We can and do (think: copilots) but not as much as we could, because redundancy requires extra resources. As a general rule, in today’s society, resources are chronically scarce.

For example, I’ve often thought that courses would be better if given by a pair of instructors. I don’t mean “team teaching” where two instructors take turns giving lectures. I mean two instructors in the class at the same time.

A dumb way of using the second instructor is to have him/her sitting in the corner waiting to step in if the first falls ill. We can do much better than that!

The second instructor could operate the powerpoint, wipe the blackboard, or (more challenging) circulate in the classroom answering student questions. The two could hold dialogues, question and answer sessions, even argue. Each could watch and correct mistakes made by the other and intervene if they think the class is not following. In dealing with important points the second instructor could give a second, different explanation (redundant, of course).

Naturally the two would frequently swap roles (not strictly necessary and therefore also redundant).

A dream, of course, because colleges are (as usual) forced to teach using the bare minimum of resources.

I may be a dreamer, but I’m not the only one. As Voltaire famously said

Le superflu, chose très nécessaire

Here it is in English

The superfluous, a very necessary thing

though of course this is redundant.

]]>

“So”, I am often asked, “what is this all about?”

Short answer: Wadge reducibility is a simple way to compare the complexity of two properties (sets) of real numbers. If A and B are two such sets, A≤_{W}B (or simply A≤B) iff there is a continuous total function f such that for all reals a

a is a member of A iff f(a) is a member of B

In other words, f allows you to reduce the membership question for A to the membership question for B. Since continuous functions are considered to be especially simple, we conclude that A’s membership ‘problem’ is at most as difficult as B’s.

A more concise formalization is as follows: A≤B iff A = f^{-1}(B) for some continuous f.

A small technical point: by “reals” we actually mean infinite sequences of natural numbers – the *Baire space*. The Baire space is homeomorphic to (topologically the same as) the irrationals with the induced topology. It has some technical properties that make it easier to work with.

It is easy to describe the Baire topology informally. Two sequences are close if they have a common initial sequence; the longer this sequence, the closer they are. A set is open iff whenever a sequence is a member, so are all the sequences that are close enough to it. A function is continuous iff f(x) and f(y) are close whenever x and y are close enough. Another way of putting this is that a finite initial segment of f(x) is determined by a finite initial segment of x.

For example, let A be the set of all sequences in which 0 occurs at least once, and B the set of all sequences in which 0 occurs infinitely often. Then it can be shown (not obvious from the definition given above) that A is reducible to B but B is not reducible to A.

The definition is simple enough but you need one tool to make it usable: infinite games.

Suppose there are two sets A and B and two individuals, Believer and Doubter. Suppose that Believer thinks that A≤B and wants to prove it to Doubter, who thinks they are not reducible. Believer has to come up with a continuous function that performs the reduction. It follows easily from the description of continuity that we can think of f as a black box that transforms initial segments of a into initial segments of b. This black box is therefore a continuously operating machine that incrementally produces b0, b1, b2, … one by one from a0, a1, a2, … one by one (not necessarily at the same rate). In data flow terminology, it’s a *filter*.

Doubter will try and refute the filter by choosing a0, a1, a2, … so that a∈A⇔b∈B fails, and Believer hopes that his black box will produce a b that makes it succeed. To get the game, we let Believer replace the black box and play the b’s directly in response to the a’s (Believer is allowed to pass, but not indefinitely). Then (maybe not obvious) A≤B iff Believer has a winning strategy for the game. (This is the *Wadge game*; since it depends on A and B, it is written G_{W}(A,B), or simply G(A,B).)

For example, for the particular A and B described earlier, Believer’s strategy in G(A,B) is simply to copy Doubter’s moves until (if ever) Doubter plays a 0. At that point Believer too plays a 0, and continues playing 0’s for the rest of the game.
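Believer's strategy is a filter in the data flow sense, so it can be written as a Python generator. This is a sketch: the real game is infinite, and the code can only be run on finite prefixes of Doubter's play. The name `believer` is mine.

```python
def believer(doubter_moves):
    """Believer's strategy in G(A, B): copy Doubter's moves until Doubter
    plays a 0, then play 0 from that point on."""
    seen_zero = False
    for a in doubter_moves:
        if a == 0:
            seen_zero = True
        yield 0 if seen_zero else a

# Doubter plays a 0: Believer's sequence ends up with 0 repeated forever.
print(list(believer([3, 1, 0, 2, 5])))

# Doubter never plays 0: Believer never plays 0 either.
print(list(believer([3, 1, 4])))
```

Each output depends only on the moves seen so far, which is the continuity requirement in disguise.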

Believer doesn’t have a winning strategy for G(B,A) but it can be hard to prove *non*existence. We don’t have to – to prove there does *not* exist a winning strategy for Believer, we simply come up with a winning strategy for Doubter. It’s not hard to find one: in G(B,A), Doubter plays 0’s until (if ever) Believer plays a 0, at which point Doubter stops playing 0’s and plays 1’s instead.
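Doubter's strategy can be tried out against sample Believer strategies in a finite simulation. This is only a sketch under my own conventions: a strategy for Believer is a function of both histories that returns a move, or `None` for a pass, and the two sample strategies are invented. A finite run cannot decide an infinite game, but it shows the pattern.

```python
def doubter_move(believer_moves):
    """Doubter's strategy in G(B, A): play 0 until Believer has played a 0,
    then play 1 from that point on."""
    return 1 if 0 in believer_moves else 0

def play(believer_strategy, rounds):
    """Alternate moves for a fixed number of rounds; Doubter moves first."""
    a, b = [], []
    for _ in range(rounds):
        a.append(doubter_move(b))
        move = believer_strategy(a, b)
        if move is not None:  # None stands for a pass
            b.append(move)
    return a, b

def never_zero(a, b):
    """Sample Believer who never plays a 0."""
    return 1

def zero_on_third(a, b):
    """Sample Believer who starts playing 0 from the third round."""
    return 0 if len(a) >= 3 else 1

print(play(never_zero, 6))     # a is all 0's, b contains no 0
print(play(zero_on_third, 6))  # b contains a 0, a has only finitely many 0's
```

In the first run Doubter's sequence is heading into B while Believer's stays out of A; in the second Believer's sequence is in A while Doubter's has left B. Either way the equivalence fails.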

In general, it might seem like, whatever A and B are, either Believer or Doubter has a winning strategy (since there are no ties). If this is the case, the game is said to be *determined*. Unfortunately, using the axiom of choice you can cook up games that are not determined. This raises complex foundational issues.

Fortunately, if the sets involved are not too complex, for example if they are Borel sets, then G(A,B) is determined. And the determinacy of G(A,B) has a simple but important consequence.

We’ve seen that a winning strategy for Believer in G(A,B) proves that A≤B. What about a winning strategy for Doubter? Some elementary calculations show that a winning strategy for Doubter can be converted to a winning strategy for Believer in the dual game G(B,-A). This in turn proves B≤-A.

In other words, if G(A,B) is determined, then A≤B or B≤-A. (This is *Wadge’s lemma*.) Thus if we restrict ourselves to Borel sets A and B, so that G(A,B) is determined, ≤ is *almost* a linear order.

Pretty simple stuff! Of course there’s a lot more to tell, but not in this post. I’ll leave you with a challenge. Let A be the set of all sequences in which some number occurs infinitely often, and B the set of all sequences in which all but a finite number of numbers eventually appear. How does ≤ relate them?

]]>