# WAFER Trace-the-Compilation Exercises For each exercise, manually trace the Forth code through the full pipeline: 1. **Outer interpreter** — tokenization, dictionary lookup, compile/interpret dispatch 2. **IR generation** — what Vec is produced 3. **Optimization** — which passes fire, what changes 4. **Codegen** — WASM instructions emitted (conceptual) 5. **Runtime** — how it executes Answers are below each exercise (scroll down or cover with paper). --- ## Exercise 1: Simple Arithmetic ```forth : SQUARE DUP * ; ```
Answer 1. `:` → enter compile mode, next token "SQUARE" = word name, dictionary.create("SQUARE") 2. `DUP` → find in dictionary → IR primitive (WordId N) → append `Call(dup_id)` 3. `*` → find → IR primitive → append `Call(mul_id)` 4. `;` → raw IR: `[Call(dup_id), Call(mul_id)]` 5. **Optimize:** - Inline: DUP body=[Dup] (1 op ≤ 8), * body=[Mul] (1 op ≤ 8) → `[Dup, Mul]` - Peephole: no patterns match Dup,Mul - Constant fold: nothing to fold - Tail call: Mul is not a Call → skip - **Final IR: `[Dup, Mul]`** 6. **Codegen:** - Dup: `local.get $dsp; i32.load; local.set $tmp; dsp_dec; local.get $dsp; local.get $tmp; i32.store` - Mul: `pop; pop; i32.mul; push_via_local` 7. **Runtime:** WASM module instantiated, function registered at table[word_id]
--- ## Exercise 2: Constant Folding ```forth : TEN 5 5 + ; ```
Answer 1. `:` → compile mode, name="TEN" 2. `5` → not in dictionary → parse as number → append `PushI32(5)` 3. `5` → append `PushI32(5)` 4. `+` → find → IR primitive → append `Call(add_id)` 5. `;` → raw IR: `[PushI32(5), PushI32(5), Call(add_id)]` 6. **Optimize:** - Inline: + body=[Add] → `[PushI32(5), PushI32(5), Add]` - Constant fold: PushI32(5), PushI32(5), Add → `PushI32(10)` - **Final IR: `[PushI32(10)]`** 7. **Codegen:** Just `push_const(f, 10)` → `dsp_dec; local.get $dsp; i32.const 10; i32.store`
--- ## Exercise 3: Peephole Elimination ```forth : NOOP DUP DROP ; ```
Answer 1. Raw IR after inlining: `[Dup, Drop]` 2. **Optimize:** - Peephole: Dup, Drop → removed (both eliminated) - **Final IR: `[]` (empty)** 3. **Codegen:** Empty function body — just DSP writeback at entry/exit
--- ## Exercise 4: Strength Reduction ```forth : DOUBLE 8 * ; ```
Answer 1. Raw IR after inlining: `[PushI32(8), Mul]` 2. **Optimize:** - Strength reduce: PushI32(8) is 2^3, so → `[PushI32(3), Lshift]` - 8 * x becomes x << 3 - **Final IR: `[PushI32(3), Lshift]`** 3. **Codegen:** push_const(3), then pop two, i32.shl, push result
--- ## Exercise 5: Tail Call Detection ```forth : FOO 1 + BAR ; ``` (Assume BAR is a previously defined word)
Answer 1. Raw IR: `[PushI32(1), Call(add_id), Call(bar_id)]` 2. **Optimize:** - Inline + (1 op): `[PushI32(1), Add, Call(bar_id)]` - Tail call: last op is Call(bar_id), return stack balanced (no >R or R>) → `TailCall(bar_id)` - **Final IR: `[PushI32(1), Add, TailCall(bar_id)]`** 3. **Codegen:** TailCall emits `dsp_writeback; call_indirect bar_id; return`
--- ## Exercise 6: Control Flow — IF/THEN ```forth : ABS DUP 0< IF NEGATE THEN ; ```
Answer 1. `DUP` → Call(dup_id), `0<` → Call(zerolt_id) 2. `IF` → push ControlEntry::If { then_body: [] }, start collecting 3. `NEGATE` → Call(negate_id) appended to then_body 4. `THEN` → pop ControlEntry::If, emit `If { then_body: [Call(negate_id)], else_body: None }` 5. Raw IR: `[Call(dup_id), Call(zerolt_id), If { then: [Call(negate_id)], else: None }]` 6. **Optimize:** - Inline all (each is 1 op): `[Dup, ZeroLt, If { then: [Negate], else: None }]` - Note: optimizer recurses into If bodies via apply_to_bodies - **Final IR: `[Dup, ZeroLt, If { then: [Negate], else: None }]`** 7. **Codegen:** pop flag → `if (block) ... end` WASM structure
--- ## Exercise 7: DO LOOP ```forth : STARS 0 DO 42 EMIT LOOP ; ```
Answer 1. `0` → PushI32(0) 2. `DO` → push ControlEntry::Do { body: [] } 3. `42` → PushI32(42) into body 4. `EMIT` → Call(emit_id) into body 5. `LOOP` → pop Do, emit `DoLoop { body: [PushI32(42), Call(emit_id)], is_plus_loop: false }` 6. Note: the 0 and the limit (already on stack from caller) are consumed by DoLoop 7. **Optimize:** - Inline EMIT (1 op): `DoLoop { body: [PushI32(42), Emit], is_plus_loop: false }` - **Final IR:** `[PushI32(0), DoLoop { body: [PushI32(42), Emit], is_plus_loop: false }]` 8. **Codegen:** Loop index+limit in WASM locals. WASM `loop { body; index++; br_if index --- ## Exercise 8: BEGIN UNTIL ```forth : COUNTDOWN BEGIN DUP . 1 - DUP 0= UNTIL DROP ; ```
Answer 1. `BEGIN` → push ControlEntry::Begin { body: [] } 2. `DUP .` → Call(dup_id), Call(dot_id) into body 3. `1 -` → PushI32(1), Call(sub_id) into body 4. `DUP 0=` → Call(dup_id), Call(zeroeq_id) into body 5. `UNTIL` → pop Begin, emit `BeginUntil { body: [Call(dup), Call(dot), PushI32(1), Call(sub), Call(dup), Call(zeroeq)] }` 6. **Optimize:** Inline small primitives. `1 -` stays as `PushI32(1), Sub` (no further fold since operand unknown). `.` is a host function → NOT inlined. 7. `DROP` after loop.
--- ## Exercise 9: Dead Code Elimination ```forth : ALWAYS-TRUE TRUE IF 42 ELSE 99 THEN ; ```
Answer 1. Raw IR after inlining TRUE (body=[PushI32(-1)]): `[PushI32(-1), If { then: [PushI32(42)], else: Some([PushI32(99)]) }]` 2. **DCE:** PushI32(-1) is nonzero → emit then_body only → `[PushI32(42)]` 3. Entire IF/ELSE/THEN eliminated. Just pushes 42.
--- ## Exercise 10: Swap Peephole Patterns ```forth : TEST SWAP SWAP DROP DROP ; ```
Answer 1. After inlining: `[Swap, Swap, Drop, Drop]` 2. **Peephole pass 1:** - Swap, Swap → removed → `[Drop, Drop]` - Drop, Drop → TwoDrop → `[TwoDrop]` 3. **Final IR: `[TwoDrop]`**
--- ## Exercise 11: Nested Control Flow ```forth : CLASSIFY DUP 0< IF DROP -1 ELSE 0> IF 1 ELSE 0 THEN THEN ; ```
Answer 1. IR structure (after inlining): ``` [Dup, ZeroLt, If { then: [Drop, PushI32(-1)], else: Some([Gt(implicit 0>), If { then: [PushI32(1)], else: Some([PushI32(0)]) }]) }] ``` 2. Optimizer recurses into both If bodies. No constant conditions → no DCE. 3. Codegen: nested WASM `if/else/end` blocks.
--- ## Exercise 12: DOES> Defining Word ```forth : CONSTANT CREATE , DOES> @ ; 5 CONSTANT FIVE FIVE . ```
Answer 1. `: CONSTANT` enters compile mode 2. `CREATE` — flagged as saw_create_in_def=true 3. `,` — compiled normally 4. `DOES>` — splits definition: - create_ir = everything before DOES> (the `,` call) - does_action = everything after DOES> (the `@` call) → compiled as separate word - Stores DoesDefinition { create_ir, does_action_id, has_create: true } 5. `5 CONSTANT FIVE`: - CONSTANT executes its defining behavior - CREATE makes dictionary entry "FIVE" - `,` stores 5 at FIVE's parameter field - DOES> patches FIVE to execute the does_action (which does `@`) 6. `FIVE .`: - FIVE executes: pushes its PFA, then calls does_action (`@`) - `@` fetches the 5 stored there - `.` prints "5 "
--- ## Exercise 13: Consolidation ```forth : A 1 ; : B 2 ; : C A B + ; CONSOLIDATE ```
Answer 1. Before CONSOLIDATE: A, B, C are separate WASM modules. C calls A and B via `call_indirect` through the function table. 2. CONSOLIDATE: - Collects all IR bodies: A=[PushI32(1)], B=[PushI32(2)], C=[Call(a_id), Call(b_id), Add(inlined)] - Builds local_fn_map: A→1, B→2, C→3 (within consolidated module) - `compile_consolidated_module()`: all three become functions in one WASM module - C's Call(a_id) → direct `call 1` (not call_indirect) - Replaces all table entries with new functions 3. Result: C calling A and B is now a direct WASM `call` — much faster than table dispatch.
--- ## Exercise 14: Host Function Execution ```forth 5 3 M* ```
Answer 1. `5` → push to data stack (dsp -= 4, mem[dsp] = 5) 2. `3` → push to data stack (dsp -= 4, mem[dsp] = 3) 3. `M*` → host function (Rust closure): - Read sp = dsp global value - Read n2 = mem[sp] = 3 (as i64) - Read n1 = mem[sp+4] = 5 (as i64) - result = 5i64 * 3i64 = 15i64 - lo = 15 as i32 = 15 - hi = (15 >> 32) as i32 = 0 - Write mem[sp+4] = 15 (lo), mem[sp] = 0 (hi) - Stack unchanged (still 2 cells, now containing double-cell 15) 4. Note: M* is a host function because it needs 64-bit multiplication (WASM i32 only)
--- ## Exercise 15: Float Operations ```forth : HYPOTENUSE FDUP F* FSWAP FDUP F* F+ FSQRT ; ```
Answer 1. After inlining: `[FDup, FMul, FSwap, FDup, FMul, FAdd, FSqrt]` 2. **Peephole:** No matching patterns (FDup+FMul not a known pair) 3. **Codegen:** All float ops use the float stack (FSP global): - FDup: `fpeek(f)` then `fpush_via_local` - FMul: `emit_float_binary` with `f64.mul` - FSqrt: `emit_float_unary` with `f64.sqrt` 4. Float stack lives at 0x2540-0x2D40 in linear memory
--- ## Exercise 16: BEGIN WHILE REPEAT ```forth : COUNTDOWN BEGIN DUP WHILE DUP . 1 - REPEAT DROP ; ```
Answer 1. `BEGIN` → ControlEntry::Begin { body: [] } 2. `DUP` → Call(dup_id) into body 3. `WHILE` → pop Begin, create ControlEntry::BeginWhile { test: [Call(dup_id)], body: [] } 4. `DUP . 1 -` → into body 5. `REPEAT` → pop BeginWhile, emit `BeginWhileRepeat { test: [Dup], body: [Dup, Call(dot_id), PushI32(1), Sub] }` 6. Semantics: evaluate test; if false exit loop; execute body; jump to BEGIN
--- ## Exercise 17: Batch Mode Compilation ```forth ( During ForthVM::new() ) ```
Answer 1. `register_primitives()` sets `batch_mode = true` 2. Each `register_primitive("DUP", ...)`: - Creates dictionary entry (dictionary.create + reveal) - Stores IR body in ir_bodies - Pushes `(word_id, ir_body)` to `deferred_ir` (no WASM compilation yet) 3. After all ~40 IR primitives registered: - `compile_batch()` compiles ALL deferred IR into a single WASM module - One `rt.instantiate_and_install()` call — single module with ~40 functions - Each function registered in the table 4. Why batch? Amortizes runtime compilation overhead. One module instead of 40. 5. Host functions bypass batch_mode — registered via `rt.register_host_func()` with HostFn closures.
--- ## Exercise 18: wafer build Pipeline ```forth ( file: hello.fth ) : MAIN ." Hello, World!" CR ; ``` ```bash wafer build hello.fth -o hello.wasm ```
Answer 1. `cmd_build()`: create ForthVM, set recording=true, evaluate source 2. `evaluate()`: compiles MAIN normally (IR → optimize → codegen) 3. `recording_toplevel=true`: but MAIN is a definition, not top-level execution, so toplevel_ir stays empty 4. `export_module()`: - Collect IR words: MAIN + all boot.fth definitions - Entry point: no --entry flag, look for MAIN → found! - Build `local_fn_map`: all words get module-internal indices - `compile_exportable_module()`: single WASM module with all functions - Data section: snapshot of linear memory (dictionary, variables, etc.) - Metadata in "wafer" custom section: version, entry index, host functions, memory size, stack pointers 5. Output: hello.wasm file
--- ## Exercise 19: Stack-to-Local Promotion ```forth : ADD3 + + ; ```
Answer 1. After inlining: `[Add, Add]` 2. **Stack-to-local promotion** (codegen pass, not optimizer): - Analyzes stack flow: first Add pops 2, pushes 1; second Add pops 2 (including that 1), pushes 1 - If stack depth is statically known at each point → can use WASM locals instead of memory stack - Result: operands stay in WASM locals/operand stack, no memory reads/writes - Much faster: avoids load/store through linear memory 3. Promotion only works for "straight-line" code (no calls that might modify the stack unpredictably)
--- ## Exercise 20: MARKER and State Restore ```forth MARKER CLEAN : FOO 1 ; : BAR 2 ; CLEAN FOO \ Error: unknown word ```
Answer 1. `MARKER CLEAN`: - Creates a MarkerState snapshot: dictionary state, user_here, next_table_index, word_pfa_map, ir_bodies, does_definitions, host_word_names, two_value_words, fvalue_words - Registers CLEAN as a word that, when executed, restores this snapshot 2. `: FOO 1 ; : BAR 2 ;` — normal compilation, adds to dictionary 3. `CLEAN`: - Executes the marker word - Restores dictionary to state before FOO/BAR were defined - Resets user_here, ir_bodies, etc. - FOO and BAR are gone — dictionary.find("FOO") returns None 4. `FOO` → "unknown word: FOO" Key: MARKER doesn't undo WASM table entries (they become unreachable but stay allocated). It restores the dictionary and Rust-side metadata.