WAFER Architecture Reference (updated 2026-04-13)
===================================================

1. COMPILATION PIPELINE
-----------------------

                 Forth Source
                      |
                      v
Outer Interpreter (outer.rs)
+--------------------------------------------+
| Tokenizer: whitespace-delimited words      |
| For each token:                            |
|   1. Dictionary lookup (find)              |
|   2. If found + interpret mode: EXECUTE    |
|   3. If found + compile mode:              |
|      - Immediate? Execute now              |
|      - Normal? Append Call(WordId) to IR   |
|   4. Not found: try parse as number        |
|      - Interpret: push to data stack       |
|      - Compile: append PushI32(n) to IR    |
|   5. Neither: error "unknown word"         |
+--------------------------------------------+
                      |  On `;` (end of colon definition):
                      v
Optimizer (optimizer.rs)
+--------------------------------------------+
| Phase 1: Simplify                          |
|   Peephole -> Constant Fold ->             |
|   Strength Reduce -> Peephole              |
| Phase 2: Inline then re-simplify           |
|   Inline(max=8) -> Peephole ->             |
|   Constant Fold -> Strength Reduce ->      |
|   Peephole                                 |
| Phase 3: Eliminate dead code               |
|   DCE -> Peephole                          |
| Phase 4: Tail calls (must be last)         |
|   Tail Call Detect                         |
+--------------------------------------------+
                      |
                      v
Codegen (codegen.rs)
+--------------------------------------------+
| IR -> WASM bytecode via wasm-encoder       |
| Each word = one WASM module with:          |
|   Imports: emit, memory, dsp, rsp, fsp,    |
|            table                           |
|   Types: void () -> (), i32 (i32) -> ()    |
|   One defined function (the word body)     |
|   DSP cached in local 0, writeback before  |
|   calls, reload after calls                |
|   Scratch locals start at index 1          |
+--------------------------------------------+
                      |
                      v
Runtime trait (runtime.rs)
+--------------------------------------------+
| ForthVM<R: Runtime>, generic over backend  |
| Runtime provides:                          |
|   - Memory r/w (mem_read_i32, etc.)        |
|   - Globals (get/set_dsp, rsp, fsp)        |
|   - Table (ensure_table_size)              |
|   - instantiate_and_install(wasm_bytes)    |
|   - call_func(fn_index)                    |
|   - register_host_func(fn_index, HostFn)   |
|                                            |
| HostAccess trait: memory/global ops for    |
|   host function callbacks                  |
| HostFn = Box<dyn Fn(&mut dyn HostAccess)>  |
+--------------------------------------------+
         |                         |
         v                         v
NativeRuntime             WebRuntime
(runtime_native.rs)       (crates/web/src/runtime_web.rs)
+------------------+      +------------------+
| wasmtime Engine  |      | js_sys::WebAsm   |
| Store, Memory    |      | Memory, Table    |
| Table, Globals   |      | Global objects   |
| Func closures    |      | JS Closures      |
+------------------+      +------------------+
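
Illustration (not part of the codebase): the per-token dispatch of the
outer interpreter can be modelled in a few lines of Python. This is a
sketch under stated assumptions, not the code in outer.rs: the function
name process_tokens, the dictionary shape (name -> {id, immediate, run}),
and decimal-only number parsing are all invented for the example.

```python
def process_tokens(source, dictionary, compiling=False):
    """Model steps 1-5 of the token loop; returns (data_stack, ir)."""
    stack, ir = [], []
    for token in source.split():                  # whitespace-delimited tokens
        word = dictionary.get(token.upper())      # 1. dictionary lookup
        if word is not None:
            if not compiling or word["immediate"]:
                word["run"](stack)                # 2. interpret, or immediate word
            else:
                ir.append(("Call", word["id"]))   # 3. compile a call
            continue
        try:
            n = int(token, 10)                    # 4. try to parse as a number
        except ValueError:
            # 5. neither a word nor a number
            raise LookupError(f"unknown word: {token}") from None
        if compiling:
            ir.append(("PushI32", n))
        else:
            stack.append(n)
    return stack, ir


# Hypothetical one-word dictionary for the demo.
DEMO_DICT = {"DUP": {"id": 3, "immediate": False,
                     "run": lambda s: s.append(s[-1])}}
```

In interpret mode "5 dup" leaves two values on the stack and emits no IR;
in compile mode the same text only appends PushI32 and Call ops.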


2. MEMORY LAYOUT (Linear Memory)
--------------------------------

Address   Region              Size     Notes
--------  ------------------  -------  -------------------------
0x0000    System Variables    64 B     STATE, BASE, >IN, HERE,
                                       LATEST, SOURCE-ID, #TIB,
                                       HLD, LEAVE-FLAG
0x0040    Input Buffer        1024 B   Source parsing
0x0440    PAD                 256 B    Scratch area
0x0540    Pictured Output     128 B    <# ... #> (grows down)
0x05C0    WORD Buffer         64 B     Transient counted string
0x0600    Data Stack          4096 B   1024 cells, grows DOWN
0x1600    (Data Stack Top)             DSP starts here
0x1600    Return Stack        4096 B   Grows DOWN
0x2600    Float Stack         2048 B   256 doubles, grows DOWN
0x2E00    Dictionary          grows UP Linked list of word entries

Total initial memory: 16 pages = 1 MiB (max 256 pages = 16 MiB)
Cell size: 4 bytes (i32)
Float size: 8 bytes (f64)
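
Illustration (not part of the codebase): the arithmetic behind the table
can be checked with a short Python sketch. It assumes the regions up to
the data stack are packed back to back in the listed order; the names
REGIONS and layout are made up for the example. The 64 KiB page size is
the standard WebAssembly page size.

```python
WASM_PAGE = 64 * 1024  # bytes per WebAssembly page

# (region, size in bytes), in address order, up to the data stack.
REGIONS = [
    ("System Variables", 64),
    ("Input Buffer", 1024),
    ("PAD", 256),
    ("Pictured Output", 128),
    ("WORD Buffer", 64),
    ("Data Stack", 4096),
]


def layout(regions, base=0):
    """Return {name: (start, end)} for regions packed back to back."""
    table, addr = {}, base
    for name, size in regions:
        table[name] = (addr, addr + size)
        addr += size
    return table


for name, (start, end) in layout(REGIONS).items():
    print(f"0x{start:04X}-0x{end:04X}  {name}")
```

The data stack region comes out as 0x0600-0x1600, so a DSP that starts
at the top and grows down begins at 0x1600. The same arithmetic gives
4096 / 4 = 1024 cells and 2048 / 8 = 256 doubles.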


3. SYSTEM VARIABLES (offsets from 0x0000)
-----------------------------------------

Offset  Name        Purpose
------  ----------  -----------------------------------
0       STATE       0=interpreting, -1=compiling
4       BASE        Number base (default 10)
8       >IN         Parse offset into input buffer
12      HERE        Next free dictionary address
16      LATEST      Most recent dictionary entry addr
20      SOURCE-ID   0=user input, -1=string
24      #TIB        Length of current input
28      HLD         Pictured numeric output pointer
32      LEAVE-FLAG  Nonzero when LEAVE called in loop
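
Illustration (not part of the codebase): each variable is one 4-byte cell
at a fixed offset, so reading and writing them is plain little-endian i32
access (WebAssembly linear memory is little-endian). The helper names
read_var and write_var are hypothetical, and a bytearray stands in for
linear memory.

```python
import struct

# Offsets copied from the table above.
SYSVARS = {
    "STATE": 0, "BASE": 4, ">IN": 8, "HERE": 12, "LATEST": 16,
    "SOURCE-ID": 20, "#TIB": 24, "HLD": 28, "LEAVE-FLAG": 32,
}


def read_var(memory, name):
    """Read a system variable as a signed 32-bit cell."""
    return struct.unpack_from("<i", memory, SYSVARS[name])[0]


def write_var(memory, name, value):
    """Write a system variable as a signed 32-bit cell."""
    struct.pack_into("<i", memory, SYSVARS[name], value)


memory = bytearray(64)           # the 64-byte system variable area
write_var(memory, "BASE", 10)    # default number base
write_var(memory, "STATE", -1)   # -1 = compiling
```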


4. DICTIONARY ENTRY FORMAT
--------------------------

+--------+-------+----------+---------+-----------+
| Link   | Flags | Name     | Padding | Code      |
| 4 bytes| 1 byte| N bytes  | 0-3 B   | 4 bytes   |
+--------+-------+----------+---------+-----------+
^                                     ^
entry_addr                            code field (fn table index)

Flags byte:
  Bit 7 (0x80):    IMMEDIATE
  Bit 6 (0x40):    HIDDEN (during compilation)
  Bits 0-4 (0x1F): name length (max 31)

Link points to previous entry (0 = end of list).
Name stored uppercase, padded to 4-byte alignment.
Code field: index into WASM function table.
Parameter field (if any) follows immediately after code field.
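
Illustration (not part of the codebase): a Python sketch that packs and
unpacks one entry in the format above. Assumptions made for the example:
fields are little-endian, entry_addr is itself 4-byte aligned, and the
padding is whatever brings the code field to the next 4-byte boundary.
The function names encode_entry and decode_entry are hypothetical.

```python
import struct

IMMEDIATE = 0x80   # flags bit 7
HIDDEN = 0x40      # flags bit 6
LEN_MASK = 0x1F    # flags bits 0-4


def encode_entry(link, name, code, immediate=False):
    """Build Link | Flags | Name | Padding | Code as bytes."""
    raw = name.upper().encode("ascii")            # names are stored uppercase
    if not 1 <= len(raw) <= LEN_MASK:
        raise ValueError("name length must be 1..31")
    flags = len(raw) | (IMMEDIATE if immediate else 0)
    header = struct.pack("<IB", link, flags) + raw
    header += b"\x00" * (-len(header) % 4)        # 0-3 padding bytes
    return header + struct.pack("<I", code)


def decode_entry(memory, entry_addr=0):
    """Read one entry back; entry_addr is assumed to be 4-byte aligned."""
    link, flags = struct.unpack_from("<IB", memory, entry_addr)
    length = flags & LEN_MASK
    name_start = entry_addr + 5                   # after link + flags
    name = bytes(memory[name_start:name_start + length]).decode("ascii")
    code_addr = (name_start + length + 3) & ~3    # skip the padding
    (code,) = struct.unpack_from("<I", memory, code_addr)
    return {"link": link, "name": name, "code": code,
            "immediate": bool(flags & IMMEDIATE),
            "hidden": bool(flags & HIDDEN)}
```

A 3-letter name needs no padding (4 + 1 + 3 = 8 bytes before the code
field); a 6-letter name needs one padding byte (4 + 1 + 6 = 11, padded
to 12).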


5. THREE TYPES OF WORDS
-----------------------

a) IR Primitives (compiled to WASM)
   register_primitive("DUP", false, vec![IrOp::Dup])
   - Body stored as Vec<IrOp>
   - Optimized, then compiled to WASM module
   - Inlineable by optimizer
   - FAST: no function call overhead when inlined

b) Host Functions (HostFn closures)
   register_host_primitive(".", false, func)
   - HostFn = Box<dyn Fn(&mut dyn HostAccess) -> Result<()>>
   - Access memory/globals via HostAccess trait (runtime-agnostic)
   - NOT inlineable
   - Used for: I/O, dictionary manipulation, complex logic
   - Same closure works on NativeRuntime and WebRuntime

c) Forth-defined words
   : SQUARE DUP * ;
   - Compiled by outer interpreter
   - Goes through full optimize -> codegen pipeline
   - Stored in ir_bodies for future inlining


6. WASM MODULE STRUCTURE (per word)
-----------------------------------

Imports (6), provided by the Runtime impl:
  0. emit    (func: i32 -> void)   Character output callback
  1. memory  (memory: 16 pages)    Shared linear memory
  2. dsp     (global: mut i32)     Data stack pointer
  3. rsp     (global: mut i32)     Return stack pointer
  4. fsp     (global: mut i32)     Float stack pointer
  5. table   (table: funcref)      Shared function table

Types (2):
  0. void: () -> ()
  1. i32:  (i32) -> ()

Functions (1):
  The compiled word body

Element section:
  table[base_fn_index] = function 1

Runtime::instantiate_and_install(wasm_bytes, fn_index):
  - NativeRuntime: Module::new + Instance::new with 6 wasmtime imports
  - WebRuntime: WebAssembly.instantiate with JS import objects


7. OPTIMIZATION PASSES (detail)
-------------------------------

PEEPHOLE (runs 5x across full pipeline):
  PushI32(n), Drop      -> (removed)     Unused literal
  Dup, Drop             -> (removed)     Redundant copy
  Swap, Swap            -> (removed)     Self-inverse
  Swap, Drop            -> Nip           Combine
  PushI32(0), Add       -> (removed)     Identity
  PushI32(0), Or        -> (removed)     Identity
  PushI32(-1), And      -> (removed)     Identity
  PushI32(1), Mul       -> (removed)     Identity
  Over, Over            -> TwoDup        Combine
  Drop, Drop            -> TwoDrop       Combine
  (+ float variants: PushF64/FDrop, FDup/FDrop, FSwap/FSwap, FNegate/FNegate)
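
Illustration (not part of the codebase): the integer peephole rules can
be modelled as a pair-rewriting pass over a flat op list. The tuple
encoding of ops, the function names, and the choice to re-examine the
output after every rewrite (so cancellations can cascade) are assumptions
of this sketch, not a description of optimizer.rs. Nested ops such as If
are out of scope here.

```python
# Pairs that cancel out entirely.
REMOVED_PAIRS = {
    (("Dup",), ("Drop",)),
    (("Swap",), ("Swap",)),
    (("PushI32", 0), ("Add",)),
    (("PushI32", 0), ("Or",)),
    (("PushI32", -1), ("And",)),
    (("PushI32", 1), ("Mul",)),
}
# Pairs that collapse into a single op.
COMBINED_PAIRS = {
    (("Swap",), ("Drop",)): ("Nip",),
    (("Over",), ("Over",)): ("TwoDup",),
    (("Drop",), ("Drop",)): ("TwoDrop",),
}


def rewrite_pair(a, b):
    """Return the replacement list for the pair (a, b), or None."""
    if a[0] == "PushI32" and b == ("Drop",):
        return []                                 # unused literal
    if (a, b) in REMOVED_PAIRS:
        return []
    if (a, b) in COMBINED_PAIRS:
        return [COMBINED_PAIRS[(a, b)]]
    return None


def peephole(ops):
    """One left-to-right pass; the output list doubles as a work stack."""
    out = []
    for op in ops:
        out.append(op)
        while len(out) >= 2:
            replacement = rewrite_pair(out[-2], out[-1])
            if replacement is None:
                break
            out[-2:] = replacement
    return out
```

For example, PushI32(7), Drop, Swap, Drop reduces to a single Nip: the
literal and its Drop cancel, then Swap, Drop combine.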

CONSTANT FOLD:
  Binary: PushI32(a), PushI32(b), <op> -> PushI32(result)
    Supports: Add, Sub, Mul, And, Or, Xor, Lshift, Rshift, ArithRshift,
              Eq, NotEq, Lt, Gt, LtUnsigned
  Unary:  PushI32(n), <op> -> PushI32(result)
    Supports: Negate, Abs, Invert, ZeroEq, ZeroLt
  Float binary: PushF64(a), PushF64(b), <op> -> PushF64(result)
  Float unary:  PushF64(n), <op> -> PushF64(result)
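
Illustration (not part of the codebase): a Python sketch of integer
constant folding for a subset of the listed ops. It assumes 32-bit
wrapping arithmetic and the usual Forth flag convention (true = -1,
false = 0); the tuple encoding of ops and the names used are invented
for the example, and shifts, unsigned compares and floats are left out.

```python
def wrap_i32(n):
    """Wrap a Python int to a signed 32-bit value."""
    n &= 0xFFFFFFFF
    return n - 0x100000000 if n & 0x80000000 else n


def flag(condition):
    """Forth-style flag: -1 for true, 0 for false."""
    return -1 if condition else 0


BINARY = {
    "Add": lambda a, b: a + b,
    "Sub": lambda a, b: a - b,
    "Mul": lambda a, b: a * b,
    "And": lambda a, b: a & b,
    "Or": lambda a, b: a | b,
    "Xor": lambda a, b: a ^ b,
    "Eq": lambda a, b: flag(a == b),
    "Lt": lambda a, b: flag(a < b),
}
UNARY = {
    "Negate": lambda n: -n,
    "Invert": lambda n: ~n,
    "ZeroEq": lambda n: flag(n == 0),
    "ZeroLt": lambda n: flag(n < 0),
}


def constant_fold(ops):
    """Fold ops whose operands are all PushI32 literals."""
    out = []
    for op in ops:
        name = op[0]
        if (name in BINARY and len(out) >= 2
                and out[-1][0] == "PushI32" and out[-2][0] == "PushI32"):
            b = out.pop()[1]
            a = out.pop()[1]
            out.append(("PushI32", wrap_i32(BINARY[name](a, b))))
        elif name in UNARY and out and out[-1][0] == "PushI32":
            n = out.pop()[1]
            out.append(("PushI32", wrap_i32(UNARY[name](n))))
        else:
            out.append(op)
    return out
```

Because folded results are pushed back as literals, chains such as
2 3 + 4 * fold all the way down to a single PushI32(20).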

STRENGTH REDUCE:
  PushI32(2^n), Mul -> PushI32(n), Lshift
  PushI32(0), Eq    -> ZeroEq
  PushI32(0), Lt    -> ZeroLt
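
Illustration (not part of the codebase): the three strength-reduction
rules as a Python sketch over a flat op list. The tuple encoding and the
function name are assumptions. This sketch skips a multiplier of 1
(2^0), since the peephole identity rule already removes PushI32(1), Mul;
whether the real pass does the same is not stated here.

```python
def strength_reduce(ops):
    """Replace expensive op pairs with cheaper equivalents."""
    out = []
    for op in ops:
        prev = out[-1] if out else None
        if prev is not None and prev[0] == "PushI32":
            k = prev[1]
            # Multiply by a power of two becomes a left shift.
            if op == ("Mul",) and k > 1 and (k & (k - 1)) == 0:
                out[-1] = ("PushI32", k.bit_length() - 1)
                out.append(("Lshift",))
                continue
            # Comparison against literal zero becomes a unary test.
            if k == 0 and op == ("Eq",):
                out[-1] = ("ZeroEq",)
                continue
            if k == 0 and op == ("Lt",):
                out[-1] = ("ZeroLt",)
                continue
        out.append(op)
    return out
```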

DCE:
  PushI32(nonzero), If{then,else} -> then_body only
  PushI32(0), If{then,else}       -> else_body only
  Everything after Exit           -> removed
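
Illustration (not part of the codebase): a Python sketch of the three DCE
rules. An If op is modelled as ("If", then_body, else_body) with both
bodies as lists; that encoding, the function name, and the decision to
also clean up inside branch bodies are assumptions of the sketch.

```python
def dce(ops):
    """Drop branches with a constant condition and code after Exit."""
    out = []
    for op in ops:
        if op[0] == "If":
            then_body, else_body = dce(op[1]), dce(op[2])
            if out and out[-1][0] == "PushI32":
                cond = out.pop()[1]               # constant condition
                out.extend(then_body if cond != 0 else else_body)
            else:
                out.append(("If", then_body, else_body))
        else:
            out.append(op)
        if out and out[-1] == ("Exit",):
            break                                 # the rest is unreachable
    return out
```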

INLINE (max_size=8, single pass):
  Call(id) -> inline body if:
    - Body length <= 8 ops
    - No self-recursion
    - No Exit (would return from caller)
    - No ForthLocalGet/Set (would collide with caller's locals)
  TailCall -> Call when inlined (no longer tail position)
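
Illustration (not part of the codebase): the inlining conditions as a
Python sketch. ir_bodies is modelled as a dict from word id to op list;
the tuple encoding, the spelling ForthLocalSet, and the restriction to
top-level ops (no descent into nested bodies) are simplifications made
for the example.

```python
MAX_INLINE = 8


def can_inline(word_id, body):
    """Apply the four conditions listed above."""
    if len(body) > MAX_INLINE:
        return False
    for op in body:
        if op[0] in ("Exit", "ForthLocalGet", "ForthLocalSet"):
            return False
        if op[0] in ("Call", "TailCall") and op[1] == word_id:
            return False                          # self-recursion
    return True


def inline(ops, ir_bodies):
    """Single pass: inlined bodies are not themselves re-inlined."""
    out = []
    for op in ops:
        body = ir_bodies.get(op[1]) if op[0] == "Call" else None
        if body is not None and can_inline(op[1], body):
            # A TailCall inside the body is no longer in tail position.
            out.extend(("Call", o[1]) if o[0] == "TailCall" else o
                       for o in body)
        else:
            out.append(op)
    return out
```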

TAIL CALL (last pass):
  Last Call(id) -> TailCall(id) if:
    - Return stack balanced (equal ToR and FromR)
  Recurses into If branches for conditional tail calls
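
Illustration (not part of the codebase): tail call detection as a Python
sketch. The tuple encoding, the ("If", then_body, else_body) shape, and
the function names are assumptions. As a conservative simplification the
balance check here counts ToR and FromR over the whole body, nested
branches included; the real pass may scope the check differently.

```python
def count_ops(ops, name):
    """Count occurrences of an op, descending into If branches."""
    total = 0
    for op in ops:
        if op[0] == name:
            total += 1
        elif op[0] == "If":
            total += count_ops(op[1], name) + count_ops(op[2], name)
    return total


def mark_tail(ops):
    """Rewrite the op in tail position, recursing into a trailing If."""
    if not ops:
        return []
    *head, last = ops
    if last[0] == "Call":
        last = ("TailCall", last[1])
    elif last[0] == "If":
        last = ("If", mark_tail(last[1]), mark_tail(last[2]))
    return head + [last]


def tail_call_detect(ops):
    """Only rewrite when ToR and FromR counts match."""
    if count_ops(ops, "ToR") != count_ops(ops, "FromR"):
        return list(ops)
    return mark_tail(ops)
```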


8. CONSOLIDATION
----------------

The CONSOLIDATE word recompiles all JIT-compiled words into a
single WASM module:
  - All call_indirect -> direct call (for words in module)
  - External calls (host functions) remain call_indirect
  - Maximum performance for final program

Two-part implementation:
  codegen::compile_consolidated_module() - builds multi-function module
  outer::ForthVM::consolidate()          - orchestrates collection + table update


9. EXPORT PIPELINE (wafer build)
--------------------------------

1. Evaluate source file with recording_toplevel=true
2. Collect all IR words + top-level IR
3. Determine entry: --entry flag > MAIN word > top-level execution
4. Build consolidated module with data section (memory snapshot)
5. Embed metadata in "wafer" custom section (JSON)
6. Optional: --js generates JS loader + HTML page
7. Optional: --native AOT-compiles and appends to wafer binary
   Format:  [wafer binary][precompiled WASM][metadata][trailer]
   Trailer: payload_len(8) + metadata_len(8) + "WAFEREXE"(8)
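
Illustration (not part of the codebase): one way to split a --native
executable back into its parts by reading the 24-byte trailer from the
end of the file. Assumptions made for the example: both lengths are
little-endian unsigned 64-bit integers, payload_len is the size of the
precompiled WASM, and the metadata sits directly before the trailer. The
function name split_native_binary is hypothetical.

```python
import struct

MAGIC = b"WAFEREXE"
TRAILER = struct.Struct("<QQ8s")   # payload_len, metadata_len, magic


def split_native_binary(blob):
    """Return (runtime, payload, metadata) slices of an appended binary."""
    if len(blob) < TRAILER.size:
        raise ValueError("too short to contain a trailer")
    meta_end = len(blob) - TRAILER.size
    payload_len, metadata_len, magic = TRAILER.unpack_from(blob, meta_end)
    if magic != MAGIC:
        raise ValueError("no WAFEREXE trailer found")
    meta_start = meta_end - metadata_len
    payload_start = meta_start - payload_len
    if payload_start < 0:
        raise ValueError("trailer lengths exceed file size")
    return (blob[:payload_start],
            blob[payload_start:meta_start],
            blob[meta_start:meta_end])


# Tiny synthetic file for the demo; the contents are placeholders.
runtime, payload, metadata = b"RUNTIME", b"AOT!", b'{"entry":"MAIN"}'
demo = (runtime + payload + metadata
        + TRAILER.pack(len(payload), len(metadata), MAGIC))
```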


10. CRATE STRUCTURE
-------------------

crates/
  core/   wafer-core: compiler, optimizer, codegen, dictionary, Runtime trait
          Feature flags: default=["native"], "native" enables wasmtime
          Without features: pure Rust (dictionary, IR, optimizer, codegen, outer)
  cli/    wafer: CLI REPL (rustyline), wafer build/run commands
  web/    wafer-web: browser REPL (wasm-bindgen + WebRuntime + HTML/CSS/JS)

Key web files:
  crates/web/src/lib.rs           WaferRepl wasm-bindgen entry point
  crates/web/src/runtime_web.rs   WebRuntime: js_sys WebAssembly API
  crates/web/www/app.js           Frontend JS (terminal emulation)
  crates/web/www/index.html       HTML shell
  crates/web/www/style.css        Styling


11. BOOT SEQUENCE
-----------------

ForthVM::<R>::new() ->
  1. R::new() - create runtime (wasmtime or browser WASM)
  2. register_primitives() in batch_mode:
     - ~40 IR primitives (DUP, +, @, etc.)
     - ~60 host functions (., .S, M*, ACCEPT, etc.)
     - ~30 special words (IF, DO, :, VARIABLE, etc.)
  3. compile_batch() - single WASM module for all IR primitives
  4. Load boot.fth - Forth replaces Rust host functions:
     Phase 1: Stack/memory (DEPTH, PICK, 2OVER, FILL, MOVE)
     Phase 2: Double-cell arithmetic (D+, DNEGATE, D<)
     Phase 3: Mixed arithmetic (SM/REM, FM/MOD, */, */MOD)
     Phase 4: HERE, ALLOT, comma, ALIGN
     Phase 5: I/O, pictured numeric output (., U., TYPE, <# # #>)
     Phase 6: DEFER support
     Phase 7: String operations (COMPARE, SOURCE, FALIGNED)