From: Kaz Wesley
Date: Sat, 17 Mar 2018 00:50:24 +0000 (-0700)
Subject: MIM bytecode design document
X-Git-Url: https://jxself.org/git/?p=muddle.git;a=commitdiff_plain;h=86b8bbc88ebc3f63ca7a4adf4e26c9120a78d15d;hp=da54afbd677beaaf96bb019858502d77b182ddd4

MIM bytecode design document

Signed-off-by: Kaz Wesley
---
diff --git a/doc/mim.md b/doc/mim.md
new file mode 100644
index 0000000..b2af24d
--- /dev/null
+++ b/doc/mim.md
@@ -0,0 +1,246 @@

# Component Relationships

## MIM

Language with Muddle's control flow, object model, and syntax, but much more static semantics, straightforwardly compilable to MIM ops. Restrictions compared to Muddle:

- no EVAL, BIND, or QUOTEd args
- local variables must be resolvable lexically (explicit LVAL operations available for creating SPECIAL variables)
- the first position of a form is bound to a function at compile time
- FUNCTIONs can't be rebound from the perspective of MIM code (MUM can override anything for other MUM code, but the implementation of builtins written in MIM won't use the new versions)

Most of these restrictions would have been in place in Muddle 105's MIMOC--at the time it was common practice for dynamic languages to have different semantics when compiled (full Muddle semantics are impossible to properly AOT compile; see *The Theory of Fexprs is Trivial*, M. Wand, '98).

## MIMC

Library defining MIM as a collection of SUBRs that construct bytecode when invoked, so that a program written in MIM can be compiled by running it, and arbitrary code can be used in a macro-like manner. E.g. after importing MIMC, this call will push a single MIM operation into the current code buffer, looking up the buffer and the local slots in the current dynamic scope:

(This is actually a great example of the usefulness of dynamic scoping in enabling DSLs when context is important.) More complex MIM builtins (e.g. FCN, MAPF) will emit multiple opcodes, insert branches, build MSUBR headers, etc.
Generally, libraries written in MIM (incl. most of the stdlib) will be compiled AOT to reduce load time. We'll probably want to compile per-file and link separately (faster turnaround time when working on large libraries; enforces modularity).

## MUMC

MUMC can be a method JIT implemented as an extension of MIMC. This eliminates the work of managing the correspondence between MUM's environment and control flow and MIM's. This mainly requires implementing some helper functions for complex MUM builtins, and supporting alternate semantics for some operations (e.g. when compiling in MUM mode, locals are SPECIAL by default [just like changing Muddle 105's SPECIAL-MODE flag], and so new scopes will emit the LBIND-managing ops).

As mentioned above, full Muddle can't be AOT-compiled; but it's straightforward to method-JIT it to MIM *speculatively*, with the source still available as a fallback if "dynamic behavior" is observed. This in itself doesn't have any benefit over only source-interpreting with an efficient format for structure constants (packed into a vector in depth-first order). What it gets us is the power to perform some important optimizations relatively easily:

- inline short MIM builtins (many operations are simpler to perform directly than to APPLY)
- inline FEXPRs with constant folding and DCE

We can eliminate the guards on the hot path and pay almost no runtime cost to keep the dynamic paths available (that is, when the user isn't actually redefining core builtins, UNASSIGNing the locals of their ancestors' scopes, and modifying the code of their callsites), if we set hooks for GVAL modifications to functions that are called by functions that have a JIT resident.

Also, once we have a basic working system and want to make it decently fast, for such a dynamic language the only way to go is with a tracing JIT for the hot loops--but just tracing an interpreter accomplishes nothing (see *Tracing the Meta-Level*, C.
Bolz, '09). With the MUM method-jitted to MIM ops, a simple MIM bytecode JIT will also be a tracing JIT for MUM.

Example of what compiling some MUM-calls to MIM ops might look like (and where guards would be needed), along with structure constants (for fallback to EVAL) for the call ` >>`:

    consts:
      call = [A B1 FORM:1 B2 C FORM:1 FORM:2 FORM:3] ;structured objects stored in depth-first vector
    entry:
      ; [GVAL modification hook (A): enable bypass to slowpath-a]
      ;
      ; [GVAL modification hook (B2): rewrite B1's return to slowpath-b2]

    b2-done:

    a-done:

    slowpath-a:

    slowpath-b2:

# The Bytecode

## Format

The MIM design is a 3-operand register machine. This is an excellent instruction format for enabling the implementation of an efficient interpreter, though it has a higher minimum complexity than other common layouts.

## Encoding

The docs we have are mum about instruction encoding, but given this set of opcodes we're going to want something like this:

    bit31                       bit0
    v                              v
    .-------------------------------.
    |   D   |   C   |   B   |  op   |
    '-------------------------------'

    bit31                       bit0
    v                              v
    .-------------------------------.
    |       CD      |   B   |  op   |
    '-------------------------------'

32-bit fixed-width operations with 1-byte fields are easy to decode. Storing the op at the numerically low end makes it easiest to read first. Operand B can be decoded with the opcode before dispatching, which enables simplifications like a fallthrough implementation for ops that have one variant that takes an immediate in the B position and another variant (with a different opcode) that takes a local.

Operand types can be:

- immediate Byte (incl. signed offset for most jump operations)
- immediate Word (offset for the "long jumps"--JUMP/EMPTY?/FALSE?)
- index into the current function's Constants table
- index into the current function's Locals table
- index into the Locals table containing an LVAR to be dereferenced

Generally each opcode has a fixed interpretation of each operand; any necessary variants are encoded as separate opcodes rather than branching on flag bits. The only common non-guard branches are the opcode dispatch (occasionally prefixed-opcode dispatch), and dispatch for polymorphic/mistyped ops. Some ops use base/length pairs to blit ranges of locals/constants.

Some ~1/3 of the operations in the MIM docs are both rare (unlikely to occur in hot loops) and short (don't need all 3 operand bytes), so we can encode them all under 1 prefix opcode and sub-dispatch on operand B, freeing a lot of opcode space for more critical instructions.

## Type System

The type system is mostly the same between the level of MIM ops and MUM, with some caveats. MIM mostly looks at PRIMTYPEs and doesn't care about the high bits, while MIM-level PRIMTYPEs have a couple of bits that are masked out from MUM's view, to support the implementation of weak pointers (STACK objects) and large allocations (see below).

STRING will need to have a different PRIMTYPE from BYTES, because it would be unsafe to allow CHTYPE-ing arbitrary BYTES to utf-8 STRINGs (and it would expose the implementation detail of our string encoding; eventually we might want to switch to Python 3's uniform-1/2/4 format, trading some type system complexity for more speed).

To be compatible with MUM's polymorphic builtins, operations like  are polymorphic in bytecode. Operations that need to be fast (like the primary looping operator, used for MAPF) also have type-specialized versions for the major types.
At opcode level, typing is weak: when a typed opcode is applied to the wrong kind of object, it redispatches to the polymorphic implementation; an ERROR only occurs if no version of the operation is applicable to the operand. This allows the implementation to support inline caching: when a polymorphic loop operator is invoked, after dispatching on the collection's type it overwrites the instruction at the instruction pointer with the type-specialized version of its opcode. Thus only the first iteration of any loop requires polymorphic dispatch.

## Object Format

We can fit them in 64 bits!

A typical object layout, common to UBLOCK/BYTES/STRING:

    bit63
    v
    .-------------------------------.
    |            end-ptr            |
    +-------------------------------+
    |     length    |  Typ  | Prim  |
    '-------------------------------'
                                    ^
                                 bit0

The address is 32-bit. Length is 16-bit, with an interpretation that scales by object size. REST and EMPTY? don't need to tell the 3 object types apart (less dispatching), but NTH needs to know the item length. A "normal" allocation can be used for lengths up to 2**16 objects.

Large object layout for UBLOCK/BYTES/STRING:

    .-------------------------------.
    |            cur-pos            |
    +-------------------------------+
    |   end-align   |  Typ  | Prim  |
    '-------------------------------'

Cur-pos is a pointer, potentially offset from a static large-allocations base and scaled by object size. End-align indicates the number of trailing zero bits in the allocation's past-the-end address.

This large object model comes out of two observations:

- Language semantics require that typical operations on collections be polymorphic anyway, so there's little cost to potentially dispatching to one more type of alternate format (especially since we'll never have to dispatch in loops).
- For objects larger than 64KB we probably want to mmap the allocation outside the normal heap in any case, so at that point we can find a highly-aligned location and use the alignment to only need to store the log2 of the length.

## Additional/Modified Opcodes

- 
  - This seems essential; not sure why there's nothing like it in the docs.
- LENL: added a max-value
  - LENGTH can be implemented in terms of LENGTH? with an optional max, but not the other way around
- 
  - Seems necessary.
- 
  - Single-op implementation of idiomatic loops (MAPF/MAPR). Has the semantics of REST-NTH_1-EMPTY?-JUMP, but with much less work.
- implementing RETURN:
  - CALLs (and RESUMEs) are decoded twice: the 1st time does a call (ignoring the parameter specifying the result destination); the 2nd reads only the result byte (of the instruction that is *before* the return address, i.e. the last instruction executed before the call) so it knows what local to set with its result. (Result parameters are always in the D position, so RETURN doesn't need to know *what* it's returning to.)

# GC interface

The bulk of the GC (marker/sweeper, compactor) can be written in MIM, and run in an internal PROCESS, though some fundamental ops need to be compiled for a particular type of GC implementation (because hooks for allocation and read/write barriers need to be as fast as possible).
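The double-decode of CALL described under "Additional/Modified Opcodes" can be sanity-checked with a toy interpreter. The opcode numbers, 8-slot locals vector, and dict-of-functions layout below are invented for illustration, not taken from the MIM docs; the point is only that RETURN recovers its destination by re-reading the D byte of the instruction before the return address:

```python
# Toy sketch of the CALL/RETURN result-destination scheme (all encodings
# invented). RETURN re-reads the D byte of the instruction *before* the
# return address to learn which caller local receives the result.

OP_CALL, OP_RETURN, OP_CONST = 1, 2, 3

def d_field(word):
    return (word >> 24) & 0xFF

def run(funcs, entry):
    locals_, stack, code, pc = [0] * 8, [], funcs[entry], 0
    while True:
        word = code[pc]; op = word & 0xFF; pc += 1
        b = (word >> 8) & 0xFF; d = d_field(word)
        if op == OP_CONST:                # locals[D] := immediate B
            locals_[d] = b
        elif op == OP_CALL:               # call funcs[B]; D is ignored now
            stack.append((code, pc, locals_))
            code, pc, locals_ = funcs[b], 0, [0] * 8
        elif op == OP_RETURN:             # result is in locals[B]
            result = locals_[b]
            if not stack:
                return result
            code, pc, locals_ = stack.pop()
            # 2nd decode of the CALL: only its D byte matters here
            locals_[d_field(code[pc - 1])] = result

callee = [(5 << 24) | (41 << 8) | OP_CONST,  # locals[5] := 41
          (5 << 8) | OP_RETURN]              # return locals[5]
caller = [(7 << 24) | (1 << 8) | OP_CALL,    # locals[7] := call funcs[1]
          (7 << 8) | OP_RETURN]              # return locals[7]
print(run({0: caller, 1: callee}, 0))        # prints 41
```

Note that the RETURN instruction itself never names a destination; only the CALL's D byte does, which is why result parameters must consistently sit in the D position.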