WAFER Architecture Reference (updated 2026-04-13)
===================================================

1. COMPILATION PIPELINE
-----------------------

Forth Source
     |
     v
Outer Interpreter (outer.rs)
+--------------------------------------------+
| Tokenizer: whitespace-delimited words      |
| For each token:                            |
|   1. Dictionary lookup (find)              |
|   2. If found + interpret mode: EXECUTE    |
|   3. If found + compile mode:              |
|      - Immediate? Execute now              |
|      - Normal? Append Call(WordId) to IR   |
|   4. Not found: try parse as number        |
|      - Interpret: push to data stack       |
|      - Compile: append PushI32(n) to IR    |
|   5. Neither: error "unknown word"         |
+--------------------------------------------+
     |
     |  On `;` (end of colon definition):
     v
Optimizer (optimizer.rs)
+--------------------------------------------+
| Phase 1: Simplify                          |
|   Peephole -> Constant Fold ->             |
|   Strength Reduce -> Peephole              |
| Phase 2: Inline then re-simplify           |
|   Inline(max=8) -> Peephole ->             |
|   Constant Fold -> Strength Reduce ->      |
|   Peephole                                 |
| Phase 3: Eliminate dead code               |
|   DCE -> Peephole                          |
| Phase 4: Tail calls (must be last)         |
|   Tail Call Detect                         |
+--------------------------------------------+
     |
     v
Codegen (codegen.rs)
+--------------------------------------------+
| IR -> WASM bytecode via wasm-encoder       |
| Each word = one WASM module with:          |
|   Imports: emit, memory, dsp, rsp, fsp,    |
|            table                           |
|   Types: void () -> (), i32 (i32) -> ()    |
|   One defined function (the word body)     |
|   DSP cached in local 0, writeback before  |
|   calls, reload after calls                |
|   Scratch locals start at index 1          |
+--------------------------------------------+
     |
     v
Runtime trait (runtime.rs)
+--------------------------------------------+
| ForthVM - generic over backend             |
| Runtime provides:                          |
|   - Memory r/w (mem_read_i32, etc.)        |
|   - Globals (get/set_dsp, rsp, fsp)        |
|   - Table (ensure_table_size)              |
|   - instantiate_and_install(wasm_bytes)    |
|   - call_func(fn_index)                    |
|   - register_host_func(fn_index, HostFn)   |
|                                            |
| HostAccess trait - memory/global ops for   |
|   host function callbacks                  |
| HostFn = Box<... -> Result<()>>            |
+--------------------------------------------+
     |                          |
     v                          v
NativeRuntime                WebRuntime
(runtime_native.rs)          (crates/web/runtime_web.rs)
+------------------+         +------------------+
| wasmtime Engine  |         | js_sys::WebAsm   |
| Store, Memory    |         | Memory, Table    |
| Table, Globals   |         | Global objects   |
| Func closures    |         | JS Closures      |
+------------------+         +------------------+


2. MEMORY LAYOUT (Linear Memory)
--------------------------------

Address   Region              Size     Notes
--------  ------------------  -------  -------------------------
0x0000    System Variables    64 B     STATE, BASE, >IN, HERE, LATEST,
                                       SOURCE-ID, #TIB, HLD, LEAVE-FLAG
0x0040    Input Buffer        1024 B   Source parsing
0x0440    PAD                 256 B    Scratch area
0x0540    Pictured Output     128 B    <# ... #> (grows down)
0x05C0    WORD Buffer         64 B     Transient counted string
0x0600    Data Stack          4096 B   1024 cells, grows DOWN
0x1600    (Data Stack Top)             DSP starts here
0x1540    Return Stack        4096 B   Grows DOWN
0x2540    Float Stack         2048 B   256 doubles, grows DOWN
0x2D40    Dictionary          grows UP Linked list of word entries

Total initial memory: 16 pages = 1 MiB (max 256 pages = 16 MiB)
Cell size: 4 bytes (i32)
Float size: 8 bytes (f64)


3. SYSTEM VARIABLES (offsets from 0x0000)
-----------------------------------------

Offset  Name        Purpose
------  ----------  -----------------------------------
0       STATE       0=interpreting, -1=compiling
4       BASE        Number base (default 10)
8       >IN         Parse offset into input buffer
12      HERE        Next free dictionary address
16      LATEST      Most recent dictionary entry addr
20      SOURCE-ID   0=user input, -1=string
24      #TIB        Length of current input
28      HLD         Pictured numeric output pointer
32      LEAVE-FLAG  Nonzero when LEAVE called in loop


4. DICTIONARY ENTRY FORMAT
--------------------------

+--------+-------+----------+---------+-----------+
| Link   | Flags | Name     | Padding | Code      |
| 4 bytes| 1 byte| N bytes  | 0-3 B   | 4 bytes   |
+--------+-------+----------+---------+-----------+
^                                     ^
entry_addr                            code field (fn table index)

Flags byte:
  Bit 7 (0x80):     IMMEDIATE
  Bit 6 (0x40):     HIDDEN (during compilation)
  Bits 0-4 (0x1F):  name length (max 31)

Link points to previous entry (0 = end of list).
Name stored uppercase, padded to 4-byte alignment.
Code field: index into WASM function table.
Parameter field (if any) follows immediately after code field.


5. THREE TYPES OF WORDS
-----------------------

a) IR Primitives (compiled to WASM)
   register_primitive("DUP", false, vec![IrOp::Dup])
   - Body stored as Vec<IrOp>
   - Optimized, then compiled to WASM module
   - Inlineable by optimizer
   - FAST: no function call overhead when inlined

b) Host Functions (HostFn closures)
   register_host_primitive(".", false, func)
   - HostFn = Box<... -> Result<()>>
   - Access memory/globals via HostAccess trait (runtime-agnostic)
   - NOT inlineable
   - Used for: I/O, dictionary manipulation, complex logic
   - Same closure works on NativeRuntime and WebRuntime

c) Forth-defined words
   : SQUARE DUP * ;
   - Compiled by outer interpreter
   - Goes through full optimize -> codegen pipeline
   - Stored in ir_bodies for future inlining


6. WASM MODULE STRUCTURE (per word)
-----------------------------------

Imports (6) - provided by Runtime impl:
  0. emit    (func: i32 -> void)   Character output callback
  1. memory  (memory: 16 pages)    Shared linear memory
  2. dsp     (global: mut i32)     Data stack pointer
  3. rsp     (global: mut i32)     Return stack pointer
  4. fsp     (global: mut i32)     Float stack pointer
  5. table   (table: funcref)      Shared function table

Types (2):
  0. void: () -> ()
  1. i32:  (i32) -> ()

Functions (1):
  The compiled word body

Element section:
  table[base_fn_index] = function 1

Runtime::instantiate_and_install(wasm_bytes, fn_index):
  - NativeRuntime: Module::new + Instance::new with 6 wasmtime imports
  - WebRuntime: WebAssembly.instantiate with JS import objects


7. OPTIMIZATION PASSES (detail)
-------------------------------

PEEPHOLE (runs 5x across full pipeline):
  PushI32(n), Drop     -> (removed)   Unused literal
  Dup, Drop            -> (removed)   Redundant copy
  Swap, Swap           -> (removed)   Self-inverse
  Swap, Drop           -> Nip         Combine
  PushI32(0), Add      -> (removed)   Identity
  PushI32(0), Or       -> (removed)   Identity
  PushI32(-1), And     -> (removed)   Identity
  PushI32(1), Mul      -> (removed)   Identity
  Over, Over           -> TwoDup      Combine
  Drop, Drop           -> TwoDrop     Combine
  (+ float variants: PushF64/FDrop, FDup/FDrop,
     FSwap/FSwap, FNegate/FNegate)

CONSTANT FOLD:
  Binary: PushI32(a), PushI32(b), <op> -> PushI32(result)
    Supports: Add, Sub, Mul, And, Or, Xor, Lshift, Rshift,
              ArithRshift, Eq, NotEq, Lt, Gt, LtUnsigned
  Unary: PushI32(n), <op> -> PushI32(result)
    Supports: Negate, Abs, Invert, ZeroEq, ZeroLt
  Float binary: PushF64(a), PushF64(b), <op> -> PushF64(result)
  Float unary:  PushF64(n), <op> -> PushF64(result)

STRENGTH REDUCE:
  PushI32(2^n), Mul    -> PushI32(n), Lshift
  PushI32(0), Eq       -> ZeroEq
  PushI32(0), Lt       -> ZeroLt

DCE:
  PushI32(nonzero), If{then,else}  -> then_body only
  PushI32(0), If{then,else}        -> else_body only
  Everything after Exit            -> removed

INLINE (max_size=8, single pass):
  Call(id) -> inline body if:
    - Body length <= 8 ops
    - No self-recursion
    - No Exit (would return from caller)
    - No ForthLocalGet/Set (would collide with caller's locals)
  TailCall -> Call when inlined (no longer tail position)

TAIL CALL (last pass):
  Last Call(id) -> TailCall(id) if:
    - Return stack balanced (equal ToR and FromR)
  Recurses into If branches for conditional tail calls


8. CONSOLIDATION
----------------

CONSOLIDATE word recompiles all JIT-compiled words into a single
WASM module:
  - All call_indirect -> direct call (for words in module)
  - External calls (host functions) remain call_indirect
  - Maximum performance for final program

Two-part implementation:
  codegen::compile_consolidated_module() - builds multi-function module
  outer::ForthVM::consolidate() - orchestrates collection + table update


9. EXPORT PIPELINE (wafer build)
--------------------------------

1. Evaluate source file with recording_toplevel=true
2. Collect all IR words + top-level IR
3. Determine entry: --entry flag > MAIN word > top-level execution
4. Build consolidated module with data section (memory snapshot)
5. Embed metadata in "wafer" custom section (JSON)
6. Optional: --js generates JS loader + HTML page
7. Optional: --native AOT-compiles and appends to wafer binary
   Format:  [wafer binary][precompiled WASM][metadata][trailer]
   Trailer: payload_len(8) + metadata_len(8) + "WAFEREXE"(8)


10. CRATE STRUCTURE
-------------------

crates/
  core/  wafer-core: compiler, optimizer, codegen, dictionary, Runtime trait
         Feature flags: default=["native"], "native" enables wasmtime
         Without features: pure Rust (dictionary, IR, optimizer, codegen, outer)
  cli/   wafer: CLI REPL (rustyline), wafer build/run commands
  web/   wafer-web: browser REPL (wasm-bindgen + WebRuntime + HTML/CSS/JS)

Key web files:
  crates/web/src/lib.rs          WaferRepl wasm-bindgen entry point
  crates/web/src/runtime_web.rs  WebRuntime: js_sys WebAssembly API
  crates/web/www/app.js          Frontend JS (terminal emulation)
  crates/web/www/index.html      HTML shell
  crates/web/www/style.css       Styling


11. BOOT SEQUENCE
-----------------

ForthVM::<R>::new() ->
  1. R::new() - create runtime (wasmtime or browser WASM)
  2. register_primitives() in batch_mode:
     - ~40 IR primitives (DUP, +, @, etc.)
     - ~60 host functions (., .S, M*, ACCEPT, etc.)
     - ~30 special words (IF, DO, :, VARIABLE, etc.)
  3. compile_batch() - single WASM module for all IR primitives
  4. Load boot.fth - Forth replaces Rust host functions:
     Phase 1: Stack/memory (DEPTH, PICK, 2OVER, FILL, MOVE)
     Phase 2: Double-cell arithmetic (D+, DNEGATE, D<)
     Phase 3: Mixed arithmetic (SM/REM, FM/MOD, */, */MOD)
     Phase 4: HERE, ALLOT, comma, ALIGN
     Phase 5: I/O, pictured numeric output (., U., TYPE, <# # #>)
     Phase 6: DEFER support
     Phase 7: String operations (COMPARE, SOURCE, FALIGNED)
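
To illustrate the PEEPHOLE rules listed in section 7, the sketch below
applies a subset of them to adjacent op pairs and repeats until no rule
fires. The IrOp variant names are taken from this reference. Everything
else is an assumption made for illustration: the shape of the enum, the
helper names rewrite_pair and peephole, and the fixpoint loop. The actual
code in optimizer.rs may be organized quite differently.

```rust
/// A small subset of the IR, using op names from section 7.
/// (Hypothetical shape; the real IrOp has many more variants.)
#[derive(Debug, Clone, PartialEq)]
pub enum IrOp {
    PushI32(i32),
    Dup,
    Drop,
    Swap,
    Over,
    Nip,
    TwoDup,
    TwoDrop,
    Add,
    Mul,
}

/// Try to rewrite the adjacent pair (a, b).
/// Returns Some(replacement) when a rule matches, None otherwise.
/// (Hypothetical helper, not a function from the WAFER sources.)
fn rewrite_pair(a: &IrOp, b: &IrOp) -> Option<Vec<IrOp>> {
    use IrOp as Op;
    match (a, b) {
        // Pairs with no net effect on the stack are removed outright.
        (Op::PushI32(_), Op::Drop)
        | (Op::Dup, Op::Drop)
        | (Op::Swap, Op::Swap)
        | (Op::PushI32(0), Op::Add)
        | (Op::PushI32(1), Op::Mul) => Some(vec![]),
        // Pairs with a single-op equivalent are combined.
        (Op::Swap, Op::Drop) => Some(vec![Op::Nip]),
        (Op::Over, Op::Over) => Some(vec![Op::TwoDup]),
        (Op::Drop, Op::Drop) => Some(vec![Op::TwoDrop]),
        _ => None,
    }
}

/// Scan left to right, rewriting matching pairs, and repeat the scan
/// until a full pass makes no change.
/// (Hypothetical helper; the fixpoint loop is an assumption.)
pub fn peephole(ops: &[IrOp]) -> Vec<IrOp> {
    let mut current = ops.to_vec();
    loop {
        let mut out: Vec<IrOp> = Vec::with_capacity(current.len());
        let mut changed = false;
        let mut i = 0;
        while i < current.len() {
            if i + 1 < current.len() {
                if let Some(rep) = rewrite_pair(&current[i], &current[i + 1]) {
                    out.extend(rep);
                    i += 2; // both ops of the pair are consumed
                    changed = true;
                    continue;
                }
            }
            out.push(current[i].clone());
            i += 1;
        }
        if !changed {
            return out;
        }
        current = out;
    }
}
```

Under these rules, the sequence Dup, PushI32(0), Add, Drop reduces to an
empty body: the first pass removes the identity add, which leaves Dup, Drop
adjacent for the next pass to remove. Cascades of this kind are one reason
a pipeline might run Peephole again after each of the other passes, as the
phase list in section 1 does.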