I was asked what I'm doing with Janus.
It's the intermediate code generated by my Algol 68 compiler. There's a SNOBOL program that doesn't quite translate it into 360 assembler. (I say "doesn't quite" because I'm sure there are still bugs to shake out. The compiler ran only part of the Algol 68 test suite correctly, and I would be very surprised if none of the deficiencies were in the Janus translator.)
So I have two approaches:
-- translate Janus to machine code (probably with a new translator), or
-- bypass Janus and translate the Algol 68 parse tree to machine code (what I was doing originally, before compilation limits in the compiler I was using to compile Algol 68 H forced me to break things up and use an intermediate code).
In combination with that choice, there's the choice of a back-end code generator. There are a few now that I could use; I'm thinking of LLVM and C--, but there are probably more. That gives me another choice: use one or the other, or neither and, say, generate assembler myself.
The questions are: which is likely to be more work, and which is likely to generate better code.
Since Janus can be compiled line by line with very little context (after all, the STAGE2 macro processor can do it), it is feasible to compile straight to assembler, the usual process for compilers on Unix. But debugging assembler is *hard*. (Thought: it might not be so bad now that the world has usable debuggers.) LLVM provides a fair amount of optional syntactic and semantic checking of its intermediate code, which I suspect will catch most of the idiot-level bugs (and most bugs are at that level).
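To make the line-by-line point concrete, here is a minimal C sketch of a STAGE2-style expander: look up each line's opcode in a table and substitute its operands into an assembler template. The opcodes (`LOAD`, `STORE`, `ADD`, `MOVE`, `JUMP`), the `OPCODE op1,op2` line format, the 360-style templates and the `expand_line` function are all hypothetical; real Janus instructions and their expansions are not reproduced here.

```c
#include <string.h>

/* Hypothetical opcodes and 360-style templates, for illustration only;
   the real Janus instruction set is not reproduced here. */
struct template_entry {
    const char *opcode;
    const char *pattern;    /* "$1" and "$2" stand for the operands */
};

static const struct template_entry templates[] = {
    { "LOAD",  "L     R1,$1" },
    { "STORE", "ST    R1,$1" },
    { "ADD",   "A     R1,$1" },
    { "MOVE",  "MVC   $1,$2" },
    { "JUMP",  "B     $1" },
};

/* Expand one line of the (hypothetical) form "OPCODE op1,op2" into out.
   No state is carried between calls: each line is translated on its own.
   Returns 0 on success, -1 for an unknown opcode or if out is too small. */
static int expand_line(const char *line, char *out, size_t outsz)
{
    char opcode[16];
    const char *ops[2] = { "", "" };
    size_t oplen[2] = { 0, 0 };

    size_t n = strcspn(line, " ");
    if (n == 0 || n >= sizeof opcode)
        return -1;
    memcpy(opcode, line, n);
    opcode[n] = '\0';

    /* Skip blanks, then split the operand field at the first comma. */
    const char *rest = line + n;
    while (*rest == ' ')
        rest++;
    ops[0] = rest;
    oplen[0] = strcspn(rest, ",");
    if (rest[oplen[0]] == ',') {
        ops[1] = rest + oplen[0] + 1;
        oplen[1] = strlen(ops[1]);
    }

    for (size_t i = 0; i < sizeof templates / sizeof templates[0]; i++) {
        if (strcmp(templates[i].opcode, opcode) != 0)
            continue;
        size_t k = 0;
        for (const char *p = templates[i].pattern; *p != '\0'; p++) {
            const char *src = p;   /* default: copy one template char */
            size_t len = 1;
            if (p[0] == '$' && (p[1] == '1' || p[1] == '2')) {
                int idx = p[1] - '1';
                src = ops[idx];
                len = oplen[idx];
                p++;               /* consume the digit as well */
            }
            if (k + len >= outsz)
                return -1;
            memcpy(out + k, src, len);
            k += len;
        }
        out[k] = '\0';
        return 0;
    }
    return -1;
}
```

For example, `expand_line("MOVE A,B", buf, sizeof buf)` leaves `MVC   A,B` in `buf`. Anything that needs context beyond the current line is out of reach of this scheme, which is where a parse tree comes in.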
I could parse Janus and do pattern matching on the parse tree to recognize common constructs and generate code accordingly. I could do that parsing and matching easily enough in C, or in Scheme.
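A minimal sketch of the pattern-matching alternative, again in C: try patterns on a parse-tree node from most specific to least specific, and emit code for the first one that fits. The node layout, the tags `ADD`/`VAR`/`CONST`, the `match_add` function and the `INC`/`ADDI`/`ADD` mnemonics are hypothetical placeholders, not Janus's actual structure or any real instruction set.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical parse-tree node: a tag naming the construct plus either
   leaf text or two children. Tags and mnemonics are illustrative. */
struct node {
    const char *tag;            /* "ADD", "CONST", "VAR", ... */
    const char *text;           /* leaf text: variable name or literal */
    const struct node *left, *right;
};

static int is(const struct node *n, const char *tag)
{
    return n != NULL && strcmp(n->tag, tag) == 0;
}

/* Match an ADD node against patterns, most specific first.
   Returns 0 and writes one instruction into out, or -1 if no pattern fits. */
static int match_add(const struct node *n, char *out, size_t outsz)
{
    int w;
    if (!is(n, "ADD") || !is(n->left, "VAR"))
        return -1;
    if (is(n->right, "CONST") && strcmp(n->right->text, "1") == 0)
        w = snprintf(out, outsz, "INC %s", n->left->text);
    else if (is(n->right, "CONST"))
        w = snprintf(out, outsz, "ADDI %s,%s", n->left->text, n->right->text);
    else if (is(n->right, "VAR"))
        w = snprintf(out, outsz, "ADD %s,%s", n->left->text, n->right->text);
    else
        return -1;
    /* Treat truncation as failure rather than emitting a partial line. */
    return (w < 0 || (size_t)w >= outsz) ? -1 : 0;
}
```

Given `ADD(VAR X, CONST 1)` it produces `INC X`; given `ADD(VAR X, CONST 5)`, `ADDI X,5`. Ordering the tests from specific to general is what lets the special case win.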
I perverted one thing in Janus: the nesting of procedures. Janus doesn't nest procedures, but the code I generate does. The reason is that when I reach a procedure body in the normal tree-walk of code generation, the compiler's internal data structures are just right for processing the nested procedure. Moving the generated code out of the enclosing procedure would have been awkward, given the poor facilities available at the time for managing large text buffers. 400K memories were rare, and hard to get even where they existed; putting the buffers in temporary files to be reprocessed later would have been awkward too, given OS/360's style of file access. But this is no problem now. Gigabyte memories are becoming the norm. What a difference a few decades have made! So it's perfectly feasible to implement text buffers in C and let Algol W generate code into them. I've already got most of the API for that, except that it currently writes everything to a file instead. If I can manage to do something with the new varying-length strings Glyn has put into his Algol W compiler, it'll become even cleaner (the existing way I handle strings destined for the object code is to enclose the actual text in an extra inner layer of quotes, to get around Algol W's fixed-length-string restrictions). And if I use Janus as the intermediate code, even if I choose to translate it into LLVM, I won't have to interface Algol W with C++, which might or might not go smoothly.
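A minimal C sketch of such an in-memory text buffer, assuming one buffer per procedure being generated: when a nested procedure is finished, its text is moved to the end of the output buffer, so inner procedures come out flattened ahead of their enclosing ones. The `textbuf` type and the `tb_*` functions are hypothetical; they are not the existing file-writing API, and how Algol W would call them is left open.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable text buffer standing in for a file. */
struct textbuf {
    char *data;   /* NUL-terminated once anything has been appended */
    size_t len;   /* bytes of text, not counting the NUL */
    size_t cap;   /* bytes allocated */
};

static void tb_init(struct textbuf *tb)
{
    tb->data = NULL;
    tb->len = 0;
    tb->cap = 0;
}

/* Append n bytes, doubling the allocation as needed.
   Returns 0 on success, -1 if memory runs out. */
static int tb_append(struct textbuf *tb, const char *s, size_t n)
{
    if (tb->len + n + 1 > tb->cap) {
        size_t cap = tb->cap ? tb->cap : 256;
        while (tb->len + n + 1 > cap)
            cap *= 2;
        char *p = realloc(tb->data, cap);   /* realloc(NULL, ...) == malloc */
        if (p == NULL)
            return -1;
        tb->data = p;
        tb->cap = cap;
    }
    memcpy(tb->data + tb->len, s, n);
    tb->len += n;
    tb->data[tb->len] = '\0';
    return 0;
}

static int tb_puts(struct textbuf *tb, const char *s)
{
    return tb_append(tb, s, strlen(s));
}

static void tb_free(struct textbuf *tb)
{
    free(tb->data);
    tb_init(tb);
}

/* Move a finished procedure's text to the end of dst and release src.
   Emitting each procedure as it finishes flattens the nesting. */
static int tb_move(struct textbuf *dst, struct textbuf *src)
{
    if (src->len > 0 && tb_append(dst, src->data, src->len) != 0)
        return -1;              /* leave src intact so nothing is lost */
    tb_free(src);
    return 0;
}
```

For an outer procedure containing an inner one, generating the inner body into its own buffer and calling `tb_move(&out, &inner)` when it ends, then `tb_move(&out, &outer)` at the end of the outer body, yields the inner procedure's text first and the outer's after it, with no nesting left in `out`.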
Instead of assembler, I might be able to use a kind of low-level code generator I threw together a few years ago, which generates code directly into memory for immediate execution. Leave that for later, if ever: gdb doesn't understand this kind of code.
Does LLVM even have the kind of data structures Pascal and Janus use, with variants and such? If it does, are the ways of initializing them well-defined? Or is it just a matter of hoping future changes won't break whatever happens to have been implemented?
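As far as I know, LLVM IR has no union type; front ends such as Clang lower a C union to a struct type large enough and suitably aligned for its biggest member, and reach the other members through casts. So a Pascal or Janus variant record would presumably have to be laid out by the front end as an explicit tag plus raw storage. The C sketch that follows shows that shape at the source level, with C99 designated initializers selecting which variant a static object starts in. The `shape` record and its names are hypothetical, and the sketch says nothing about how LLVM itself treats such constants.

```c
#include <stdint.h>

/* Hypothetical Pascal variant record:
     type shapekind = (circle, rect);
          shape = record
            case kind: shapekind of
              circle: (radius: integer);
              rect:   (width, height: integer)
          end;
   mapped to an explicit tag plus a C union. */
enum shape_kind { SHAPE_CIRCLE, SHAPE_RECT };

struct shape {
    enum shape_kind kind;       /* the tag field */
    union {
        struct { int32_t radius; } circle;
        struct { int32_t width, height; } rect;
    } u;
};

/* Designated initializers name the active variant explicitly, so static
   initialization does not depend on which union member comes first. */
static const struct shape unit_square = {
    .kind = SHAPE_RECT,
    .u.rect = { .width = 1, .height = 1 },
};

/* Read the active variant; the tag says which union member is valid. */
static int32_t shape_width(const struct shape *s)
{
    switch (s->kind) {
    case SHAPE_CIRCLE:
        return 2 * s->u.circle.radius;
    case SHAPE_RECT:
        return s->u.rect.width;
    }
    return -1;  /* unknown tag */
}
```

Without designated initializers, C initializes a union through its first member only, which is one concrete form of the "well-defined initialization" worry.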