# Component Relationships

## MIM Language

A language with Muddle's control flow, object model, and syntax, but much more static semantics, straightforwardly compilable to MIM ops. Restrictions compared to Muddle:

- no EVAL, BIND, or QUOTEd args
- local variables must be resolvable lexically (explicit LVAL operations are available for creating SPECIAL variables)
- the first position of a form is bound to a function at compile time
- FUNCTIONs can't be rebound from the perspective of MIM code (MUM can override anything for other MUM code, but the implementation of builtins written in MIM won't use the new versions)

Most of these restrictions would have been in place in Muddle 105's MIMOC--at the time it was common practice for dynamic languages to have different semantics when compiled (full Muddle semantics are impossible to properly AOT compile; see *The Theory of Fexprs is Trivial*, M. Wand, '98).

## MIMC

A library defining MIM as a collection of SUBRs that construct bytecode when invoked, so that a program written in MIM can be compiled by running it, and arbitrary code can be used in a macro-like manner. E.g. after importing MIMC, this call will push a single MIM operation into the current code buffer, looking up the buffer and the local slots in the current dynamic scope:

(This is actually a great example of the usefulness of dynamic scoping in enabling DSLs when context is important.)

More complex MIM builtins (e.g. FCN, MAPF) will emit multiple opcodes, insert branches, build MSUBR headers, etc.

Generally, libraries written in MIM (incl. most of the stdlib) will be compiled AOT to reduce load time. We'll probably want to compile per-file and link separately (faster turnaround time when working on large libraries; enforces modularity).

## MUMC

MUMC can be a method JIT implemented as an extension of MIMC. This eliminates the work of managing the correspondence between MUM's environment and control flow and MIM's.
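Both MIMC and MUMC rest on the same mechanism: running code that emits bytecode into a dynamically-scoped buffer. A rough Python sketch of that idea, using `contextvars` as a stand-in for dynamic scope; all names here are hypothetical, not MIMC's actual API:

```python
import contextvars

# Dynamic-scope stand-ins for the current code buffer and local-slot table.
CODE_BUFFER = contextvars.ContextVar("code_buffer")
LOCAL_SLOTS = contextvars.ContextVar("local_slots")

def local(name):
    """Resolve a local name to its slot index at 'compile' time."""
    return LOCAL_SLOTS.get()[name]

def ADD(dest, a, b):
    # Each builtin pushes one op into whatever buffer is dynamically current.
    CODE_BUFFER.get().append(("ADD", local(dest), local(a), local(b)))

def compile_fcn(body, local_names):
    buf = []
    CODE_BUFFER.set(buf)
    LOCAL_SLOTS.set({n: i for i, n in enumerate(local_names)})
    body()  # running the program *is* compiling it
    return buf

ops = compile_fcn(lambda: ADD("x", "y", "z"), ["x", "y", "z"])
# ops is now [("ADD", 0, 1, 2)]
```

Because ordinary code can call these emitters, anything computable at compile time can be interleaved with emission, which is what gives the macro-like behavior described above.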
This mainly requires implementing some helper functions for complex MUM builtins, and supporting alternate semantics for some operations (e.g. when compiling in MUM mode locals are SPECIAL by default [just like changing Muddle 105's SPECIAL-MODE flag], and so new scopes will emit the LBIND-managing ops).

As mentioned above, full Muddle can't be AOT-compiled; but it's straightforward to method-JIT it to MIM *speculatively*, when the source is still available as a fallback if "dynamic behavior" is observed. This in itself doesn't have any benefit over plain source-interpreting with an efficient format for structure constants (packed into a vector in depth-first order). What it gets us is the power to perform some important optimizations relatively easily:

- inline short MIM builtins (many operations are simpler to perform directly than to APPLY)
- inline FEXPRs with constant folding and DCE

We can eliminate the guards on the hot path and pay almost no runtime cost to keep the dynamic paths available (that is, when the user isn't actually redefining core builtins, UNASSIGNing the locals of their ancestors' scopes, and modifying the code of their callsites), if we set hooks for GVAL modifications to functions that are called by functions that have JIT-compiled code resident.

Also, once we have a basic working system and want to make it decently fast, for such a dynamic language the only way to go is with a tracing JIT for the hot loops--but just tracing an interpreter accomplishes nothing (see *Tracing the Meta-Level*, C. Bolz, '09). With the MUM method-jitted to MIM ops, a simple MIM bytecode JIT will also be a tracing JIT for MUM.
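The GVAL-modification hooks amount to a dependency registry: each resident JIT records which globals it compiled against, and a SETG on one of those globals throws the compiled code away. A minimal Python sketch of that bookkeeping (all names illustrative, not the real implementation):

```python
# global name -> set of JITted functions that baked in its binding
dependents = {}
# function name -> compiled code, or None meaning "fall back to source/EVAL"
jit_code = {}
gvals = {}

def install_jit(fn_name, code, inlined_globals):
    """Register compiled code along with the globals it depends on."""
    jit_code[fn_name] = code
    for g in inlined_globals:
        dependents.setdefault(g, set()).add(fn_name)

def setg(name, value):
    """SETG with an invalidation hook: callers lose their fast path."""
    gvals[name] = value
    for caller in dependents.pop(name, ()):
        jit_code[caller] = None  # next call re-enters the slow path

install_jit("MAIN", "<machine code>", inlined_globals=["ADD1"])
setg("ADD1", "new definition")
# jit_code["MAIN"] is now None: MAIN falls back to its source.
```

The point of the hook is that the common case (nobody redefines anything) pays only for the registry insert at compile time, not for guards in the hot loop.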
Example of what compiling some MUM calls to MIM ops might look like (and where guards would be needed), along with structure constants (for fallback to EVAL) for the call ` >>`:

```
consts:
    call = [A B1 FORM:1 B2 C FORM:1 FORM:2 FORM:3] ;structured objects stored in depth-first vector
entry:
    ; [GVAL modification hook (A): enable bypass to slowpath-a]
    ;
    ; [GVAL modification hook (B2): rewrite B1's return to slowpath-b2]
b2-done:
a-done:

slowpath-a:
slowpath-b2:
```

# The Bytecode

## Format

The MIM design is a 3-operand register machine. This is an excellent instruction format for enabling the implementation of an efficient interpreter, though it has a higher minimum complexity than other common layouts.

## Encoding

The docs we have are mum about instruction encoding, but given this set of opcodes we're going to want something like this:

```
bit31                        bit0
  v                            v
  .-------------------------------.
  |  D   |  C   |  B   |    op    |
  '-------------------------------'

bit31                        bit0
  v                            v
  .-------------------------------.
  |     CD      |  B   |    op    |
  '-------------------------------'
```

32-bit fixed-width operations with 1-byte fields are easy to decode. Storing the op at the numerically low end makes it easiest to read first. Operand B can be decoded with the opcode before dispatching, which enables simplifications like a fallthrough implementation for ops that have one variant that takes an immediate in the B position and another variant (with a different opcode) that takes a local.

The address modes are:

- B: immediate Byte (incl. signed offset for most jump operations)
- H: immediate Halfword (16-bit operand, incl. long jumps JUMP/EMP?/FALSE?)
- W: IP-relative Word (32-bit constant inlined in code)
- L: index into the current function's Locals table
- LL: index into the Locals table, result indirected through a locative

The RVECTOR is placed at local 0, so that static values can be accessed via NTHR and locals with static names via NTHRL.
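The two layouts above can be sketched directly as shifts and masks. A minimal Python sketch of the proposed encoding; the field widths are the ones given above, everything else (function names, sample opcode numbers) is illustrative:

```python
# 32-bit fixed-width instruction: op in the numerically low byte,
# then B, then either C and D bytes or a 16-bit CD halfword.
def encode(op, b, c=0, d=0):
    """Standard form: op | B | C | D."""
    return op | (b << 8) | (c << 16) | (d << 24)

def encode_h(op, b, cd):
    """H-mode variant: op | B | CD, e.g. for long jump offsets."""
    return op | (b << 8) | (cd << 16)

def decode(word):
    op = word & 0xFF
    b = (word >> 8) & 0xFF   # available alongside op, before dispatch
    c = (word >> 16) & 0xFF
    d = (word >> 24) & 0xFF
    return op, b, c, d

assert decode(encode(0x12, 3, 4, 5)) == (0x12, 3, 4, 5)
assert encode_h(0x30, 1, 0xBEEF) == 0xBEEF0130
```

Since B sits in the same place in every layout, an interpreter can read `op` and `b` with two byte loads (or one 16-bit load) before knowing which variant it has, which is what enables the fallthrough trick described above.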
An opcode interprets each of its arguments according to a fixed address mode; any necessary variants are encoded as separate opcodes (distinguished in assembly by the suffixes listed above) rather than branching on flag bits. The only common non-guard branches are the opcode dispatch (occasionally prefixed-opcode dispatch) and the dispatch for polymorphic/mistyped ops. Some ops use base/length pairs to blit ranges of locals/constants.

Some ~1/3 of the operations in the MIM docs are both rare (unlikely to occur in hot loops) and short (they don't need all 3 operand bytes), so we can encode them all under 1 prefix opcode and sub-dispatch on operand B, freeing a lot of opcode space for more critical instructions.

## Type System

The type system is mostly the same between the level of MIM ops and MUM, with some caveats. MIM mostly looks at PRIMTYPEs and doesn't care about the high bits, while MIM-level PRIMTYPEs have a couple of bits that are masked out from MUM's view, to support the implementation of weak pointers (STACK objects) and large allocations (see below).

STRING will need to have a different PRIMTYPE from BYTES, because it would be unsafe to allow CHTYPE-ing arbitrary BYTES to utf-8 STRINGs (and it would expose the implementation detail of our string encoding; eventually we might want to switch to Python 3's uniform-1/2/4 format, trading some type system complexity for more speed).

To be compatible with MUM's polymorphic builtins, operations like are polymorphic in bytecode. Operations that need to be fast (like the primary looping operator, used for MAPF) also have type-specialized versions for the major types. At the opcode level, typing is weak: when a typed opcode is applied to the wrong kind of object, it redispatches to the polymorphic implementation; an ERROR only occurs if no version of the operation is applicable to the operand.
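A minimal Python sketch of this weak-typing redispatch: a type-specialized op checks its operand and, on a mismatch, falls through to the polymorphic implementation, which signals an ERROR only when no variant applies. The names (`nth_list`, `nth_poly`) are illustrative, not actual MIM opcodes:

```python
def nth_list(obj, i):
    """Type-specialized NTH for lists."""
    if not isinstance(obj, list):
        return nth_poly(obj, i)  # mistyped: redispatch, don't ERROR
    return obj[i]

def nth_poly(obj, i):
    """Polymorphic NTH: last resort before a real ERROR."""
    if isinstance(obj, (list, str, bytes)):
        return obj[i]
    raise TypeError("ERROR: NTH not applicable to %r" % (obj,))

assert nth_list([10, 20, 30], 1) == 20
assert nth_list("abc", 1) == "b"  # wrong specialization still succeeds
```

The specialized path costs one cheap type check in the common case, and the redispatch keeps mistyped uses semantically identical to the polymorphic opcode.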
This redispatch scheme also allows the implementation to support inline caching: when a polymorphic loop operator is invoked, after dispatching on the collection's type it overwrites its own opcode in the instruction stream with the type-specialized version. Thus only the first iteration of any loop requires polymorphic dispatch.

## Object Format

We can fit objects in 64 bits! A typical object layout, common to UBLOCK/BYTES/STRING:

```
bit63
  v
  .-------------------------------.
  |            cur-ptr            |
  +-------------------------------+
  |    length     |  Typ  | Prim  |
  '-------------------------------'
                                  ^
                                bit0
```

The address is 32-bit. Length is 16-bit, with an interpretation that scales by object size. REST and EMPTY? don't need to tell the 3 object types apart (less dispatching), but NTH needs to know the item length. A "normal" allocation can be used for lengths up to 2**16 objects.

Large object layout for UBLOCK/BYTES/STRING:

```
.-------------------------------.
|            cur-pos            |
+-------------------------------+
|   end-align   |  Typ  | Prim  |
'-------------------------------'
```

Cur-pos is a pointer, potentially offset from a static large-allocations base and scaled by object size. End-align indicates the number of trailing zero bits in the allocation's past-the-end address.

This large object model comes out of two observations:

- Language semantics require that typical operations on collections be polymorphic anyway, so there's little cost to potentially dispatching to one more alternate format (especially since we'll never have to dispatch in loops).
- For objects larger than 64KB we probably want to mmap the allocation outside the normal heap in any case, so at that point we can find a highly-aligned location and use the alignment to only need to store the log2 of the length.

## Additional/Modified Opcodes

-
  - This seems essential; not sure why there's nothing like it in the docs.
- LENL: added a max-value
  - LENGTH can be implemented in terms of LENGTH? with an optional max, but not the other way around
-
  - Seems necessary.
-
  - Single-op implementation of idiomatic loops (MAPF/MAPR). Has the semantics of REST-NTH_1-EMPTY?-JUMP, but with much less work.
- implementing RETURN:
  - CALLs (and RESUMEs) are decoded twice: the 1st time does the call (ignoring the parameter specifying the result destination); the 2nd reads only the result byte of the instruction that is *before* the return address (i.e. the last one executed before the call), so RETURN knows what local to set with its result. (Result parameters are always in the D position, so RETURN doesn't need to know *what* it's returning to.)

# GC interface

The bulk of the GC (marker/sweeper, compactor) can be written in MIM, and run in an internal PROCESS, though some fundamental ops need to be compiled for a particular type of GC implementation (because hooks for allocation and read/write barriers need to be as fast as possible).

# Loading MIM Libraries

Usually a whole library (like the stdlib) is statically linked into one file, but per-file dynamic linking is also possible. Run some code from the file, typically a standard loader in the header--the loading process is up to the MIM code.

The standard loader needs to, at minimum:

- look up some ATOMs from const STRINGs
- SETG those ATOMs (to LAZY.UBLOCK stub objects or upfront-loaded RVECTORs)

On an MSUBR's first call (or earlier), it's necessary to:

- create its RVECTOR, including its CODE object

The file will look like:

```
[ module loader CODE ]                 (not needed after load)
[ STRINGs for ATOMs defined on load ]  (not needed after load)
[ RVECTOR loaders... ]                 (const, each needed once)
[ MSUBR CODE... ]                      (rw, persistent)
```

--if we use the CODE in-place (as mmap'd), the GC can ignore it completely.

RVECTOR loaders:

- all at once?
  - much more efficient if we're going to be running them all anyway
  - section becomes droppable
  - simpler
- lazily?
  - work scales with functions called, not functions available
  - have to mmap in low-space to make the CODE directly accessible
- it's up to the loader; the initial impl will be upfront

MSUBR calls in an MSUBR can be:

- unlinked: ATOM in RVECTOR, NTHRGL to look up the current value each call
- linked: MSUBR in RVECTOR, set up by the loader

# Licensing

You can redistribute and/or modify this file under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This file is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License along with this file. If not, see <https://www.gnu.org/licenses/>.