doc/DESIGN.md

   1 # Object format and target platform
   2
   3 The semantics of the language are closely tied to the layout of its
   4 objects; it is critical for efficient interpretation that the new
   5 layout be similar to the original design.
   6
   7 Key features of the original format:
   8 * `WORD` size: 36-bit
   9 * object size: 2 words
  10 * `WORD`:pointer size ratio: 2:1
  11
  12 ## `WORD` size
  13
  14 Shrinking `WORD`s to 32 bits would pose more backward compatibility
  15 hazards than extending them to 64 bits. `WORD`, and derived types like
  16 `FIX`, must support 64-bit operations.
  17
  18 ## Pointer size
  19
  20 On modern platforms, we have two available pointer sizes: 32-bit or
  21 64-bit. With 64-bit pointers all objects would be 4-words and
  22 256-bits, even though most of that would be dead space for most
  23 `TYPE`s. The Muddle object layout is only reasonable with 32-bit
  24 pointers.
  25
  26 ## 64-bit `WORD`s, 32-bit pointers
  27
  28 So in order to maintain compatibility with objects designed for a 2:1
  29 pointer size ratio, our target platform is constrained to look like
  30 the x32 ABI: an ILP32 environment with access to instructions that
  31 operate on 64-bit words.
  32
  33 It is not necessary to compile for the actual x32 ABI, as support for
  34 x32 executables is not widespread. The implementation can be compiled
  35 for x86-64, but internally ensure that its memory for storing objects
  36 is mapped in the low 32-bits of its address space and then cast 64-bit
  37 pointers to 32-bit values for storage in objects.
  38
  39 ## Implications for target platforms:
  40
  41 ### x86-64 (primary target)
  42
  43 Muddle processes will be restricted to 4GB of address space per
  44 process. That "should be enough for anybody," right?
  45
  46 Nonstandard pointers add a wrinkle to FFI, but only a superficial one:
  47 Muddle objects need to be GC-managed, so they can't be externally
  48 allocated anyway; and FFI-pointers belong in `WORD`s, not object
  49 pointers.
  50
  51 If using an off-the-shelf GC like BDW, it does complicate things: the
  52 library would need modification to recognize the packed pointers.
  53
  54 ### x86-32 (possible port)
  55
  56 It wouldn't be hard to port the interpreter to 32-bit systems, but if
  57 `FIX`es are allowed to use 32-bit arithmetic that could break old code
  58 that assumes they are at least 36 bits, and any new code that assumed
  59 they were 64 bits. In the future, we should consider switching to
  60 explicitly-sized `FIX32`/`FIX64` or the like; for now let's just make
  61 `FIX` 64-bit.
  62
  63 # Interpretation
  64
  65 The semantics documented in M-PL effectively mandate that the
  66 interpreter act like an AST-interpreter:
  67 * any `ATOM` could be rebound at any time to something that takes its
  68   arguments as a `CALL` and edits, as syntax, the code of its caller
  69   -- so a non-AST interpreter would still have to keep the AST
  70   available, and would have to detect whether a `CALL` modified its
  71   callee (e.g. by setting a `WRITE` monitor for all AST values)
  72 * debugging mechanisms like `FRAME` allow direct inspection and even
  73   modification of the call stack, which requires non-internal
  74   subroutine calls to use the MCALL calling convention -- so most of
  75   the optimization a JIT or bytecode interpreter could perform is
  76   rendered moot
  77
  78 The central implementation decision is how to overcome or avoid the
  79 difficulties inherent in implementing an interpreter with more
  80 advanced control flow than its host language. In this case:
  81 coroutines, re-entrant stack-allocated continuations, and non-local
  82 return are not straightforward in C-like languages. Options include:
  83 * stackless interpreter
  84 * assembly-language interpreter that can follow the program's complex
  85   control flow
  86 * implement just the compiler and use a metacircular evaluator to get
  87   an interpreter
  88
  89 The stackless approach goes well with an AST interpreter, and from the
  90 wording in M-PL's Appendix I think it was the original
  91 implementation's approach (although it's also possible they matched
  92 the program's control flow, since that wouldn't have been hard for a
  93 program written in PDP assembly).
  94
  95 ## Stackless interpreter
  96
  97 Implement the interpreter core in plain C. Decouple the interpreter's
  98 control flow from the interpreted program:
  99 * Allocate the CONTROL STACKs on the heap. The interpreter core
 100   subroutines use slots in their `FRAME` for local variables that may
 101   need to be persisted across a `RESUME`.
 102 * Track program execution explicitly as a state machine.
 103
 104 Advantages:
 105 * portability and readability of C
 106
 107 Disadvantages:
 108 * using state machines in place of direct control flow and eliminating
 109   local variables makes for tricky C (cf. Boost's ASIO)