A while back John Plaice and I invented an external module system for C. It worked pretty well for us but never caught on. Maybe it will be of some use to some of you.
Sloth was part of an insanely ambitious (for 2 or 3 people) project called the popshop (I mentioned it in my software success post).
It all started harmlessly enough – why don’t I write a Lucid interpreter?
One counter-argument: I already had one, as I mentioned in previous posts. It was mainly the work of A. Faustini, my first student and postdoc. It ran fine, had all the features used in the 1985 Lucid book, as well as excellent error handling and an escape to UNIX. What more could you want?
But I had bigger ambitions. After the Lucid book appeared, it dawned on me that the same principles could be applied to similar projects, for example, a three dimensional spreadsheet.
To the future and beyond
Faustini’s interpreter would not be a help. It was (wisely, as it turned out) a single-purpose program, and unwieldy to boot. Faustini had practiced what I call programming for today – producing something that is immediately useful even if not relevant to tomorrow’s projects.
I vowed I wouldn’t make the same – in my opinion – mistake. I would program for the future and produce general purpose reusable code that could be used for future intensional programming projects; including some that I hadn’t even thought of yet.
I had no idea how hard that would be. I ended up having to implement whole tools like Sloth that required as much work and ingenuity as any of the future projects I had in mind. I slid down the slippery slope of overengineering.
But I just couldn’t bring myself to write for the present, and produce a single purpose successor to the Faustini interpreter. Not a good career move as it turned out.
Modules up the yinyang
I started planning the mother of all projects and it immediately became clear that software could be divided up into a number of independent components.
One unifying theme is that all the future ‘deliverables’ (like the spreadsheet) would be based, like plucid (the language Faustini implemented), on the data types and syntax of POP-2. (This is where the name “popshop” came from.)
POP-2 is long dead but incorporated some clever ideas. It had numbers, strings, words (atoms) and nested lists, basically the same as LISP. But it used user-friendly infix syntax, as did pLucid and (I planned) the other deliverables.
So obviously we’d need a component (let’s call them modules) that implements the POP-2 data types. And a separate component for inputting and outputting these data types. And modules for lexical and syntactical analysis of POP-2 infix expressions.
The plan was to compile expressions into an intermediate language that runs on an abstract machine. So we need modules for the compiler and the abstract machine interpreter.
Then modules for eduction (demand-driven evaluation), for the warehouse (cache), for the warehouse storage management, for … and on and on.
Sloth to the rescue
At a certain point we realized we’d be dealing with dozens of modules. That may not sound like much but it is when there are only three of you (me, John Plaice, and student Martin Davis). Each one would have to be separately compilable, with a .o file to be kept up to date. The makefile(s) alone would be huge.
So we decided to automate the process and Sloth was the result. It eliminated makefiles, allowed separate compilation, and kept .o files up to date. It generated and executed the complex C compiler commands. It worked very well for us.
The basic idea is to have a module be a directory (with a .m extension) and inside have the various files (e.g. procedure definitions or startup code) that define the module. Sloth would use these files to generate its own files and store them in the .m directory. In particular it would produce and maintain an up-to-date .o file.
Then to ‘compile’ a module there was the ‘link module’ command lkm. If foo was a module then lkm foo would update foo’s .o file and those of all the modules foo depended on directly or indirectly, launch a big compile, and produce an executable named simply foo.
The user supplied files
Normally the biggest and most important file is the proc.i file. It contains procedure definitions and other declarations (e.g. type declarations) not needed outside the module.
Modules also have a var.i file of variable declarations needed outside the module itself, by other modules. Similarly, there is a define.i file for macros, typedefs and other declarations needed externally.
Every module has a file body.i of initialization code to be executed once, at powerup.
Most modules require other, in a sense more primitive, modules to be available. Every module therefore has a file import which lists names of those modules (without the .m extension). A module cannot import itself directly or indirectly.
Users don’t normally enter the .m directory and manipulate these files directly. Instead they use simple commands: for example, vm d foo displays the define.i file in directory foo.m and mm p foo invokes the editor to modify the proc.i file in foo.m.
The lkm command
Sloth provides a single command for configuring an application and producing an executable. This is the lkm (link module) command. Invoking lkm foo launches elaborate behind-the-scenes computations and creates a number of files inside foo.m.
However this is transparent to the users. After a pause, lkm terminates and leaves an executable file foo in the directory in which it was invoked.
If out of curiosity you would like to know how Sloth works, I refer you to the paper linked at the beginning. In the meantime I’ll present a brief if incomplete description.
The first step is to produce a list of all the modules imported directly or indirectly by foo; all the modules required to produce the executable foo. In other words this is the transitive closure of the import relation rooted at foo. This list is stored in the Sloth-generated uselist file.
The order in which modules appear in the uselist is crucial. Briefly, it is most primitive first. More precisely, if module X imports module Y directly, then Y must appear (somewhere) before X. Sloth produces the uselist by performing a topological sort on the import relation. Sloth computes the uselist of every module foo imports directly or indirectly, not just the uselist of foo itself.
The next step is to arrange that the .o file of each module on foo’s uselist is present and up to date (I’ll skip some of the details). Once up-to-date .o files are available, Sloth launches a compile that will produce the executable foo. It creates a short .c file whose procedure main consists of calls to the powerup routines (the body.i files) of foo’s uselist, most primitive first. The last to be called is the body.i of foo itself.
Note that this does not necessarily involve a large number of compiles. It could be that most of the modules on the uselist already have up to date .o files because they have been on the uselist of another module bar for which lkm bar has already been invoked. With Sloth applications share compiled code as well as source.
Early on we encountered situations where it seemed we had no choice but to allow modules to import each other – an unacceptable arrangement. It seemed unavoidable that module A call procedures in module B, and that module B call procedures in module A.
Consider a module sto that allocates storage from a free list. When the free list is exhausted, it does a mark-and-sweep garbage collection – the usual arrangement.
If a module sta needs dynamic storage it obviously needs to import sto and call a procedure in sto that returns a new cell. But on the other hand, when the free list is exhausted it needs to call a procedure in sta that marks all the storage that sta isn’t ready to relinquish. A circularity.
The solution is to make the callbacks to sta anonymous, so that sto does not have to import sta. We provide sto with what we call a jobjar, a set of function pointers, plus a procedure addjob(p) which adds function pointer p to the job jar. In the powerup code of sta we call addjob(p) with p a pointer to a function that marks the storage sta wants to keep.
Then when the freelist runs out sto first marks everything it knows about then runs through the job jar calling all the procedures in it. This marks all the storage required by all the modules importing sto (assuming they all called addjob on initialization as required).
Similar situations arose several times. For example, suppose module mch implements the abstract machine. The machine works by associating procedures with POP-2 words, in a table pairing words with function pointers. If a module, say mat, implements matrix commands, then mat imports mch. But the machine has to know about the matrix commands and their names.
The solution is to provide mch with a procedure newcommand(w,p) that takes a word w and a function pointer p and adds w to the command table associated with p. The powerup code in mat adds all the mat commands to the instruction table in mch.
Sloth was originally developed to configure Pascal (!) code until we realized that there was no way to implement extensible modules, since Pascal has no equivalent of function pointers.
There is a version of Sloth, called Lemur, that handles versioning but I’ll describe it later.
Sloth is available if any of you out in blogland think it could be useful. It’s not currently used or maintained but we have the source code in standard C (and no, we weren’t dumb enough to structure it as Sloth modules). Sloth modules are not as convenient as, say, Python modules. Python modules have most of the same features (e.g. modules imported in correct order) without the inconvenience of a third-party application. It wouldn’t make sense to develop Sloth-for-Python.
On the other hand, believe it or not, C is still very widely used, and if you’re using it, you’ll want to modularize. Perhaps Sloth can help you make decades old technology a few decades more up to date.