A while back Weichang Du and I designed a spreadsheet based on intensional logic, the logic of values that vary over a coordinate space.
Spreadsheets are a natural fit for ‘intensifying’ because a sheet is already a two-dimensional intension, varying over the horizontal (A, B, C, …) and vertical (1, 2, 3, …) dimensions. But we can do better than just redo Excel with intensional semantics. Intensionality opens up some interesting possibilities; like user-defined operators, time varying sheets, or nested sheets.
This post is based on the IEEE Software article Du and I wrote, A Three Dimensional Spreadsheet, but with a different notation (to simplify the presentation).
The Fibonacci Numbers
Let’s start with a simple example – the Fibonacci numbers. Suppose we want to compute them and display them in row 1. In Excel it’s easy peasy, we put 0 in cell A1 and 1 in cell B1; then A1+B1 in cell C1, B1+C1 in cell D1, C1+D1 in cell E1, etc. The formulas in the cells are
A B C D E F ----------------------------------- 0 | 1 | A1+B1 | B1+C1 | C1+D1 | ...
and the values displayed are
A B C D E F ----------------------------------- 0 | 1 | 2 | 3 | 5 | ...
Notice that the formulas in the cells are all different; although if you know Excel, you know that they are all ‘copies’ of the formula in C2. Not literally copies, but functional copies. The value in cell E1 bears the same relationship to the two values on the left as C1 does to the two values on its left.
Confused? Never mind, it all works out more simply with the intensional spreadsheet.
The basic idea is that we define a sheet (which needs a name, say F) by putting expressions in cells. The difference is that they are in the language of two dimensional intensions. Let’s call the horizontal dimension h and the vertical dimension v. We need space versions of the traditional Lucid temporal operators first, next, and fby. (Plus prev).
Let’s call next.h right and prev.h left (for obvious reasons). Then we put the following formulas in the cells A1, B1, C1, …
A B C D E ------------------------------------------------------------------------------ 0 | 1 | left F + left left F | left F + left left F | left F + left left F | ...
The first thing to notice is that apart from the first two, the formulas are exactly the same. Copying in the intensional spreadsheet is just that.
Also evaluation is just that. To get (for example) the numerical value of the cell D1, we evaluate the formula stored in D1. This evaluation will entail evaluating the formulas in C1 and eventually in A1 and B1, the base cases of the recursion. Note that this is eduction (more below).
[It’s been pointed out that you can get something close to this with RC notation. R[-2]C is the cell two columns to the left on the same row. This notation is not as general; for example we can write left(F+left F).]
It should be clear that most spread sheets are easily formulated in the new notation. The only question is how to do what are known as ‘absolute references’, that are not altered by traditional copying. One approach is to use expressions like A2 as absolute coordinates; so that e.g. F.A2 is the contents of exactly the A2 cell of F (in the above, 1) and it gets copied as exactly F.A2.
As a special case, we have operators side and top that take us to the side and top of the sheet, respectively. So side F is F.A and top F is F.0.
The fact that all but the first two formulas in the Fibonacci sheet are the same suggests some shortcuts. The user interface could allow us to select a set of cells and put a single formula into each of them.
Another idea is to allow us to give a default formula for an entire row (or column), with the understanding that putting formulas in individual cells overrides the default.
As another example we could define a sheet P by putting 1 in every cell in row 1, 1 in every cell in column A, and the default formula
left P + up P
in all the other cells of P. The result is to make every column a running total of the column to the left. The result is Pascal’s triangle (tilted to the left)
--------------------- | 1 | 1 | 1 | 1 | 1 ... --------------------- | 1 | 2 | 3 | 4 | 5 --------------------- | 1 | 3 | 6 |10 |15 --------------------- | 1 | 4 |10 |20 |35 --------------------- | 1 | 5 |15 |35 |70 ---------------------
Our spreadsheet design allows for definitions that apply globally. For example, if we add the global definition
F2 = left F + left left F
then we can put F2 in cells C1, D1, E1 etc. We can have global user defined functions like
left2(X) = left left X
and define F2 as
left F + left2(F)
Functions can of course be defined recursively; for example
leftsum(X,i) = if i eq 0 then 0 else left X + leftsum(left X,i-1) fi
defines e.g. leftsum(F,10) to be the sum of the 10 values to the left. If we want to sum all the values to the left we can add the global definition
leftsumall(X) = leftsum(X,index.h)
where index.d is the coordinate in the d dimension.
Since we already have dimensions h and v it’s no technical problem to allow the traditional Lucid time dimension t. The result is a three-dimensional spreadsheet which could be understood as a time varying two dimensional spreadsheet. For example, a traditional spreadsheet might display sales (S) of car models by salesman. With the time dimension we can do this for every month, starting with (say) January as month 0.
Then we can have extra cells (along the side and the bottom) which display totals for model and salesman (for the current month). But with the time dimension we can also display percentage change over the last month. For example of D10 displays Bob’s total month sales, we can put the formula
100*(S – prev S)/(prev S)
in D11 which will now display the percentage change in Bob’s sales.
Another example of a time varying spreadsheet is that for the relaxation method. It is used to give approximate solutions to the Poisson equation ∇2 P = 0. It is an iterative process where each (interior) cell value is repeatedly replaced by the average of the values above and below and on either side; for example, the value of F8 is replace by the average of the values of F7, F9, E8 and G8.
We can program this by placing the formula
prev (left P + right P + up P + down P)/4
in all the interior cells.
The intensional approach allows another generalization: nested spreadsheets. By this we mean that any cell of the main sheet can (but not necessarily does) contain a whole sub-spreadsheet. And that any cell of the sub-spreadsheet might contain a sub-sub-spreadsheet, and so on.
Indexing makes this simple: any cell of the sub-spreadsheet is indexed by two letter-number combinations. For example, if F8 has a sub-sheet, its cells have coordinates like F8.C10 or F8.H7. In the same way a cell in a sub-sub-sheet might have an index F8.H7.K21.
Nested sheets would allow a very natural way of representing hierarchical data. The top row of the main sheet could hold sales per year and month (two dimensions) and each cell could hold data of sales per month/salesperson (two more dimensions).
We would need extra intensional operators which take us down one level or across to neighbouring sheets. We haven’t worked out the details.
This all sounds extremely ambitious and even impractical; and would be if we tried to implement it with conventional techniques. For example, if the user can open sub-sheets on the fly, how do we know how many cells to allocate?
It is all made possible by a simple technique for evaluation of intensional systems, namely demand driven (lazy) evaluation; also called eduction.
Briefly, it proceeds as follows: to determine that value that should go in a cell c, with coordinates θ, we first determine the defining formula φθ that goes in c. That’s not necessarily straight forward since we may have to use θ take defaults into account. Once we have determined φθ, we can cache it labelled by θ.
The next step is to evaluate φθ at θ, giving value vθ. This may require us to evaluate other φ’s at other θ’s, giving other v’s. When we’re done, we can cache vθ labelled by θ.
We repeat the procedure for all the cells we’re interested in. When we’re done, we’ve produced an intension Δ that has at each coordinate θ value vθ. Then Δ has the property that at each coordinate θ the value of the formula at φθ given the intension Δ is the value stored in the cell at coordinate θ. This is the denotational semantics of the spreadsheet.
With a conventional spreadsheet a formula is evaluated as soon as it is entered. We can do the same, but there is another option: to wait till all the formulas are entered (including by defaults) then evaluate. The demand driven algorithm determines the order of evaluation. Not that it matters – no matter what order you ask for values to be computed, you’ll end up with the same results (that’s one thing the semantics tells us).
Of course if an entry or default is changed then all the current values are potentially incorrect. The easiest way to deal with this is to recompute them all. However it’s possible to do dependency analysis and maybe reduce the scope of the required recalculation. In so doing we must not forget the caches, as some of their entries will also be incorrect.
You’ve probably noticed that the explanations are a bit vague. This is because the intensional spreadsheet is vaporware – it is not currently implemented (though Du produced a proof-of-concept implementation of an early version).
The main reason for the vapor status is that implementing a usable spreadsheet is an immense task. For a start it means implementing a programming language, the language of formulas. Plus a graphical user environment. And then add charting, formatting and the like. For a spreadsheet to be successful, it has to compete with Excel, which is huge.
The open office experience shows that it is possible, but only as the result of hundreds of contributors in an open source software project.
We would be happy for any of you to take up these ideas any way you want. The intensional spreadsheet is in many ways simpler than a conventional one, and at the same time has more power and potential. Let’s hope it becomes more than just promising vapor.