# WAFER Trace-the-Compilation Exercises
For each exercise, manually trace the Forth code through the full pipeline:

- Outer interpreter: tokenization, dictionary lookup, compile/interpret dispatch
- IR generation: which `Vec<IrOp>` is produced
- Optimization: which passes fire and what they change
- Codegen: which WASM instructions are emitted (conceptually)
- Runtime: how the result executes

Answers follow each exercise (scroll down or cover them with paper).
## Exercise 1: Simple Arithmetic

```forth
: SQUARE DUP * ;
```

**Answer**

- `:` → enter compile mode; the next token, `SQUARE`, is the word name → `dictionary.create("SQUARE")`
- `DUP` → found in the dictionary → IR primitive (WordId N) → append `Call(dup_id)`
- `*` → found → IR primitive → append `Call(mul_id)`
- `;` → raw IR: `[Call(dup_id), Call(mul_id)]`
- Optimize:
  - Inline: DUP body = `[Dup]` (1 op ≤ 8), `*` body = `[Mul]` (1 op ≤ 8) → `[Dup, Mul]`
  - Peephole: no pattern matches `Dup, Mul`
  - Constant fold: nothing to fold
  - Tail call: `Mul` is not a `Call` → skip
  - Final IR: `[Dup, Mul]`
- Codegen:
  - `Dup`: `local.get $dsp; i32.load; local.set $tmp; dsp_dec; local.get $dsp; local.get $tmp; i32.store` (copy the top cell into `$tmp`, decrement DSP, store `$tmp` as the new top)
  - `Mul`: pop; pop; `i32.mul`; `push_via_local`
- Runtime: the WASM module is instantiated and the function is registered at `table[word_id]`
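The inlining step can be sketched as one pass over the IR. This is a minimal illustration, not WAFER's optimizer. The trimmed `IrOp` enum, the `INLINE_LIMIT` constant (taken from the "≤ 8" rule above), and `inline_calls` are hypothetical names. The sketch inlines one level only.

```rust
use std::collections::HashMap;

/// Just the IR variants this sketch needs (hypothetical subset).
#[derive(Debug, Clone, PartialEq)]
pub enum IrOp {
    Call(u32), // call a word by WordId
    Dup,
    Mul,
}

/// Bodies with at most this many ops are inlined (the "≤ 8" rule above).
pub const INLINE_LIMIT: usize = 8;

/// Replace each `Call(id)` whose IR body is known and small enough with a
/// copy of that body; every other op is copied through unchanged.
pub fn inline_calls(ir: &[IrOp], bodies: &HashMap<u32, Vec<IrOp>>) -> Vec<IrOp> {
    let mut out = Vec::with_capacity(ir.len());
    for op in ir {
        match op {
            IrOp::Call(id) => match bodies.get(id) {
                Some(body) if body.len() <= INLINE_LIMIT => out.extend(body.iter().cloned()),
                _ => out.push(op.clone()), // unknown or too large: keep the call
            },
            _ => out.push(op.clone()),
        }
    }
    out
}
```

With `bodies` mapping DUP's id to `[Dup]` and `*`'s id to `[Mul]`, `[Call(dup_id), Call(mul_id)]` becomes `[Dup, Mul]`. Calls to words with no recorded body are left alone.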
## Exercise 2: Constant Folding

```forth
: TEN 5 5 + ;
```

**Answer**

- `:` → compile mode, name = `TEN`
- `5` → not in the dictionary → parsed as a number → append `PushI32(5)`
- `5` → append `PushI32(5)`
- `+` → found → IR primitive → append `Call(add_id)`
- `;` → raw IR: `[PushI32(5), PushI32(5), Call(add_id)]`
- Optimize:
  - Inline: `+` body = `[Add]` → `[PushI32(5), PushI32(5), Add]`
  - Constant fold: `PushI32(5), PushI32(5), Add` → `PushI32(10)`
  - Final IR: `[PushI32(10)]`
- Codegen: a single `push_const(f, 10)` → `dsp_dec; local.get $dsp; i32.const 10; i32.store`
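The folding rule above can be sketched as a left-to-right scan that checks the last three emitted ops after each push. The `IrOp` subset and `constant_fold` are hypothetical, not WAFER's code. Folding uses wrapping arithmetic because WASM `i32.add` and `i32.mul` wrap on overflow, so a folded constant has to match what the runtime would compute.

```rust
/// Just the IR variants this sketch needs (hypothetical subset).
#[derive(Debug, Clone, PartialEq)]
pub enum IrOp {
    PushI32(i32),
    Add,
    Mul,
}

/// Fold `PushI32(a), PushI32(b), Add|Mul` into a single `PushI32`.
/// Folding as soon as the pattern appears lets chains like `1 2 + 3 *`
/// collapse in one pass.
pub fn constant_fold(ir: &[IrOp]) -> Vec<IrOp> {
    let mut out: Vec<IrOp> = Vec::with_capacity(ir.len());
    for op in ir {
        out.push(op.clone());
        // Look at the last three ops emitted so far.
        let folded = match out.as_slice() {
            [.., IrOp::PushI32(a), IrOp::PushI32(b), IrOp::Add] => Some(a.wrapping_add(*b)),
            [.., IrOp::PushI32(a), IrOp::PushI32(b), IrOp::Mul] => Some(a.wrapping_mul(*b)),
            _ => None,
        };
        if let Some(value) = folded {
            out.truncate(out.len() - 3);
            out.push(IrOp::PushI32(value));
        }
    }
    out
}
```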
## Exercise 3: Peephole Elimination

```forth
: NOOP DUP DROP ;
```

**Answer**

- Raw IR after inlining: `[Dup, Drop]`
- Optimize:
  - Peephole: `Dup, Drop` → removed (both ops eliminated)
  - Final IR: `[]` (empty)
- Codegen: an empty function body; only the DSP writeback at entry/exit remains
## Exercise 4: Strength Reduction

```forth
: DOUBLE 8 * ;
```

**Answer**

- Raw IR after inlining: `[PushI32(8), Mul]`
- Optimize:
  - Strength reduce: 8 is 2^3, so `[PushI32(8), Mul]` → `[PushI32(3), Lshift]` (x * 8 becomes x << 3)
  - Final IR: `[PushI32(3), Lshift]`
- Codegen: `push_const(3)`, then pop two, `i32.shl`, push the result
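A power-of-two test plus `trailing_zeros` is enough to sketch this rewrite. `strength_reduce` and the `IrOp` subset are hypothetical, not WAFER's pass. The rewrite is safe for every i32 because shifting left by k equals multiplying by 2^k modulo 2^32, which is exactly WASM's wrapping `i32.mul`.

```rust
/// Just the IR variants this sketch needs (hypothetical subset).
#[derive(Debug, Clone, PartialEq)]
pub enum IrOp {
    PushI32(i32),
    Mul,
    Lshift,
}

/// Rewrite `PushI32(n), Mul` as `PushI32(k), Lshift` when n == 2^k.
pub fn strength_reduce(ir: &[IrOp]) -> Vec<IrOp> {
    let mut out = Vec::with_capacity(ir.len());
    let mut i = 0;
    while i < ir.len() {
        match (&ir[i], ir.get(i + 1)) {
            // A positive n with exactly one bit set is a power of two;
            // `n > 0` is checked first so `n - 1` cannot overflow.
            (IrOp::PushI32(n), Some(IrOp::Mul)) if *n > 0 && (*n & (*n - 1)) == 0 => {
                out.push(IrOp::PushI32(n.trailing_zeros() as i32));
                out.push(IrOp::Lshift);
                i += 2;
            }
            (op, _) => {
                out.push(op.clone());
                i += 1;
            }
        }
    }
    out
}
```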
## Exercise 5: Tail Call Detection

```forth
: FOO 1 + BAR ;
```

(Assume BAR is a previously defined word.)

**Answer**

- Raw IR: `[PushI32(1), Call(add_id), Call(bar_id)]`
- Optimize:
  - Inline `+` (1 op): `[PushI32(1), Add, Call(bar_id)]`
  - Tail call: the last op is `Call(bar_id)` and the return stack is balanced (no `>R` or `R>`) → `TailCall(bar_id)`
  - Final IR: `[PushI32(1), Add, TailCall(bar_id)]`
- Codegen: `TailCall` emits `dsp_writeback; call_indirect bar_id; return`
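The detection rule above fits in a few lines. This sketch makes one conservative assumption for "balanced": any `>R` or `R>` in the body blocks the rewrite. WAFER's actual check may be finer-grained. `mark_tail_call` and the `IrOp` subset are hypothetical.

```rust
/// Just the IR variants this sketch needs (hypothetical subset).
#[derive(Debug, Clone, PartialEq)]
pub enum IrOp {
    PushI32(i32),
    Add,
    ToR,   // >R
    FromR, // R>
    Call(u32),
    TailCall(u32),
}

/// Turn a trailing `Call(id)` into `TailCall(id)`, unless the word touches
/// the return stack (assumed here to rule out a tail call entirely).
pub fn mark_tail_call(mut ir: Vec<IrOp>) -> Vec<IrOp> {
    let uses_rstack = ir.iter().any(|op| matches!(op, IrOp::ToR | IrOp::FromR));
    if !uses_rstack {
        if let Some(IrOp::Call(id)) = ir.last().cloned() {
            *ir.last_mut().unwrap() = IrOp::TailCall(id);
        }
    }
    ir
}
```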
## Exercise 6: Control Flow (IF/THEN)

```forth
: ABS DUP 0< IF NEGATE THEN ;
```

**Answer**

- `DUP` → `Call(dup_id)`
- `0<` → `Call(zerolt_id)`
- `IF` → push `ControlEntry::If { then_body: [] }` and start collecting into it
- `NEGATE` → `Call(negate_id)` appended to `then_body`
- `THEN` → pop `ControlEntry::If`, emit `If { then_body: [Call(negate_id)], else_body: None }`
- Raw IR: `[Call(dup_id), Call(zerolt_id), If { then: [Call(negate_id)], else: None }]`
- Optimize:
  - Inline all (each is 1 op): `[Dup, ZeroLt, If { then: [Negate], else: None }]`
  - Note: the optimizer recurses into `If` bodies via `apply_to_bodies`
  - Final IR: `[Dup, ZeroLt, If { then: [Negate], else: None }]`
- Codegen: pop the flag → WASM `if` ... `end` block
## Exercise 7: DO LOOP

```forth
: STARS 0 DO 42 EMIT LOOP ;
```

**Answer**

- `0` → `PushI32(0)`
- `DO` → push `ControlEntry::Do { body: [] }`
- `42` → `PushI32(42)` into the body
- `EMIT` → `Call(emit_id)` into the body
- `LOOP` → pop `Do`, emit `DoLoop { body: [PushI32(42), Call(emit_id)], is_plus_loop: false }`
- Note: `DoLoop` consumes both the limit (already on the stack, supplied by the caller) and the start index 0
- Optimize:
  - Inline `EMIT` (1 op): `DoLoop { body: [PushI32(42), Emit], is_plus_loop: false }`
  - Final IR: `[PushI32(0), DoLoop { body: [PushI32(42), Emit], is_plus_loop: false }]`
- Codegen: the loop index and limit live in WASM locals, and the WASM `loop` has the shape `{ body; index++; br_if index < limit }`
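The loop shape above can be emulated directly. This is an illustration, not WAFER's codegen. `stars` is a hypothetical helper, and the caller-supplied limit becomes its parameter. The test sits at the bottom (after `index++`), so this shape always runs the body at least once.

```rust
/// Emulates `loop { body; index++; br_if index < limit }` for
/// `: STARS 0 DO 42 EMIT LOOP ;`, collecting EMIT output in a String.
pub fn stars(limit: i32) -> String {
    let mut out = String::new();
    let mut index: i32 = 0; // the `0` compiled before DO
    loop {
        out.push(char::from(42u8)); // body: 42 EMIT (ASCII 42 is '*')
        index += 1;                 // index++
        if index >= limit {
            break; // `br_if index < limit` would not branch back
        }
    }
    out
}
```

Calling `stars(3)` yields `"***"`.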
## Exercise 8: BEGIN UNTIL

```forth
: COUNTDOWN BEGIN DUP . 1 - DUP 0= UNTIL DROP ;
```

**Answer**

- `BEGIN` → push `ControlEntry::Begin { body: [] }`
- `DUP .` → `Call(dup_id), Call(dot_id)` into the body
- `1 -` → `PushI32(1), Call(sub_id)` into the body
- `DUP 0=` → `Call(dup_id), Call(zeroeq_id)` into the body
- `UNTIL` → pop `Begin`, emit `BeginUntil { body: [Call(dup_id), Call(dot_id), PushI32(1), Call(sub_id), Call(dup_id), Call(zeroeq_id)] }`
- `DROP` → appended after the loop, outside its body
- Optimize: inline the small primitives:
  - `1 -` stays as `PushI32(1), Sub` (no further folding, because the other operand is unknown)
  - `.` is a host function → NOT inlined
## Exercise 9: Dead Code Elimination

```forth
: ALWAYS-TRUE TRUE IF 42 ELSE 99 THEN ;
```

**Answer**

- Raw IR after inlining `TRUE` (body = `[PushI32(-1)]`): `[PushI32(-1), If { then: [PushI32(42)], else: Some([PushI32(99)]) }]`
- DCE: `PushI32(-1)` is nonzero → keep only `then_body` → `[PushI32(42)]`
- The entire IF/ELSE/THEN is eliminated; the word just pushes 42.
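Constant-condition elimination can be sketched as follows. The sketch first recurses into `If` bodies, in the spirit of the `apply_to_bodies` recursion noted in Exercise 6. It then checks whether the op just before an `If` is a `PushI32`, since that push is exactly the flag the `If` consumes. `eliminate_dead_branches` and the `IrOp` subset are hypothetical.

```rust
/// Just the IR variants this sketch needs (hypothetical subset).
#[derive(Debug, Clone, PartialEq)]
pub enum IrOp {
    PushI32(i32),
    If { then_body: Vec<IrOp>, else_body: Option<Vec<IrOp>> },
}

/// Replace `PushI32(c), If { .. }` with the branch that `c` selects
/// (any nonzero value is true in Forth), after recursing into every
/// `If` body so nested constant conditions are handled too.
pub fn eliminate_dead_branches(ir: Vec<IrOp>) -> Vec<IrOp> {
    let mut out: Vec<IrOp> = Vec::with_capacity(ir.len());
    for op in ir {
        let op = match op {
            IrOp::If { then_body, else_body } => IrOp::If {
                then_body: eliminate_dead_branches(then_body),
                else_body: else_body.map(eliminate_dead_branches),
            },
            other => other,
        };
        // Is the flag this op would consume a compile-time constant?
        let known_flag = match out.last() {
            Some(IrOp::PushI32(c)) => Some(*c),
            _ => None,
        };
        match (known_flag, op) {
            (Some(c), IrOp::If { then_body, else_body }) => {
                out.pop(); // the flag push is no longer needed
                let taken = if c != 0 { then_body } else { else_body.unwrap_or_default() };
                out.extend(taken);
            }
            (_, op) => out.push(op),
        }
    }
    out
}
```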
## Exercise 10: Swap Peephole Patterns

```forth
: TEST SWAP SWAP DROP DROP ;
```

**Answer**

- After inlining: `[Swap, Swap, Drop, Drop]`
- Peephole pass 1:
  - `Swap, Swap` → removed → `[Drop, Drop]`
  - `Drop, Drop` → `TwoDrop` → `[TwoDrop]`
- Final IR: `[TwoDrop]`
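A peephole pass of this kind can compare each incoming op with the last op already emitted. This way a pair that becomes adjacent after a removal (here `Drop, Drop`) is caught in the same scan, and an outer loop repeats until nothing changes. This is a hypothetical sketch covering only the three patterns from Exercises 3 and 10, not WAFER's pattern table.

```rust
/// Just the IR variants this sketch needs (hypothetical subset).
#[derive(Debug, Clone, PartialEq)]
pub enum IrOp {
    Dup,
    Drop,
    Swap,
    TwoDrop,
}

/// Apply adjacent-pair rewrites until the IR stops changing:
///   Dup, Drop  -> (nothing)
///   Swap, Swap -> (nothing)
///   Drop, Drop -> TwoDrop
pub fn peephole(mut ir: Vec<IrOp>) -> Vec<IrOp> {
    loop {
        let mut out: Vec<IrOp> = Vec::with_capacity(ir.len());
        let mut changed = false;
        for op in ir {
            // Compare the incoming op with the last op already emitted.
            let prev = out.last().cloned();
            match (prev, op) {
                (Some(IrOp::Dup), IrOp::Drop) | (Some(IrOp::Swap), IrOp::Swap) => {
                    out.pop(); // the two ops cancel out
                    changed = true;
                }
                (Some(IrOp::Drop), IrOp::Drop) => {
                    out.pop();
                    out.push(IrOp::TwoDrop);
                    changed = true;
                }
                (_, op) => out.push(op),
            }
        }
        if !changed {
            return out;
        }
        ir = out;
    }
}
```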
## Exercise 11: Nested Control Flow

```forth
: CLASSIFY DUP 0< IF DROP -1 ELSE 0> IF 1 ELSE 0 THEN THEN ;
```

**Answer**

IR structure (after inlining):

```
[Dup, ZeroLt, If {
    then: [Drop, PushI32(-1)],
    else: Some([Gt /* from 0>: compare against an implicit 0 */, If {
        then: [PushI32(1)],
        else: Some([PushI32(0)])
    }])
}]
```

- The optimizer recurses into both `If` bodies. No condition is constant → no DCE.
- Codegen: nested WASM `if`/`else`/`end` blocks.
## Exercise 12: DOES> Defining Word

```forth
: CONSTANT CREATE , DOES> @ ;
5 CONSTANT FIVE
FIVE .
```

**Answer**

- `: CONSTANT` enters compile mode:
  - `CREATE` is flagged (`saw_create_in_def = true`)
  - `,` is compiled normally
  - `DOES>` splits the definition:
    - `create_ir` = everything before `DOES>` (the `,` call)
    - `does_action` = everything after `DOES>` (the `@` call), compiled as a separate word
    - Stores `DoesDefinition { create_ir, does_action_id, has_create: true }`
- `5 CONSTANT FIVE`:
  - `CONSTANT` executes its defining behavior
  - `CREATE` makes the dictionary entry `FIVE`
  - `,` stores 5 in FIVE's parameter field
  - `DOES>` patches `FIVE` to execute the `does_action` (which does `@`)
- `FIVE .`:
  - `FIVE` executes: it pushes its PFA, then calls the `does_action` (`@`)
  - `@` fetches the 5 stored there
  - `.` prints `5 `
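The runtime half of this trace can be modeled with a toy VM. It has cell-addressed memory, a data stack, and words that carry an optional DOES> action as a function pointer. All names here (`Vm`, `Word`, `fetch`, `constant`, `execute`, `dot`) are hypothetical. Memory is indexed by cell rather than by byte, and the defining word's create part (`CREATE ,`) is hard-coded in `constant` instead of being run from `create_ir`.

```rust
/// One dictionary entry made by CREATE, optionally patched by DOES>.
pub struct Word {
    pub name: String,
    pub pfa: usize,                // parameter field address (a cell index here)
    pub does: Option<fn(&mut Vm)>, // DOES> action, run after the PFA is pushed
}

pub struct Vm {
    pub memory: Vec<i32>, // cell-addressed stand-in for linear memory
    pub stack: Vec<i32>,
    pub words: Vec<Word>,
    pub output: String,
}

/// The DOES> part of CONSTANT: `@` (fetch the cell at the address on top).
fn fetch(vm: &mut Vm) {
    let addr = vm.stack.pop().expect("stack underflow") as usize;
    let value = vm.memory[addr];
    vm.stack.push(value);
}

impl Vm {
    pub fn new() -> Self {
        Vm { memory: Vec::new(), stack: Vec::new(), words: Vec::new(), output: String::new() }
    }

    /// `n CONSTANT name`: run the create part (`CREATE ,`), then let DOES>
    /// patch the new word so that executing it runs `fetch`.
    pub fn constant(&mut self, name: &str) {
        let pfa = self.memory.len(); // CREATE: new entry whose PFA is HERE
        self.words.push(Word { name: name.to_string(), pfa, does: None });
        let value = self.stack.pop().expect("stack underflow");
        self.memory.push(value); // `,`: store the value at the PFA
        self.words.last_mut().unwrap().does = Some(fetch as fn(&mut Vm)); // DOES> patch
    }

    /// Execute a CREATEd word: push its PFA, then run its DOES> action.
    pub fn execute(&mut self, name: &str) {
        let word = self.words.iter().rev().find(|w| w.name == name).expect("unknown word");
        let (pfa, does) = (word.pfa, word.does);
        self.stack.push(pfa as i32);
        if let Some(action) = does {
            action(self);
        }
    }

    /// `.`: pop a cell and print it followed by a space.
    pub fn dot(&mut self) {
        let v = self.stack.pop().expect("stack underflow");
        self.output.push_str(&format!("{} ", v));
    }
}
```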
## Exercise 13: Consolidation

```forth
: A 1 ;
: B 2 ;
: C A B + ;
CONSOLIDATE
```

**Answer**

- Before `CONSOLIDATE`: A, B, and C are separate WASM modules. C calls A and B via `call_indirect` through the function table.
- `CONSOLIDATE`:
  - Collects all IR bodies: A = `[PushI32(1)]`, B = `[PushI32(2)]`, C = `[Call(a_id), Call(b_id), Add]` (`+` already inlined as `Add`)
  - Builds `local_fn_map`: A → 1, B → 2, C → 3 (indices within the consolidated module)
  - `compile_consolidated_module()`: all three become functions in one WASM module
  - C's `Call(a_id)` → direct `call 1` (not `call_indirect`)
  - Replaces all table entries with the new functions
- Result: C now calls A and B with a direct WASM `call`, which is much faster than table dispatch.
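The role of `local_fn_map` can be sketched as a lowering step. A call to a word that lives in the consolidated module becomes a direct call to its module-local index, and any other call still goes through the table. `Emitted` and `lower_body` are hypothetical. The real codegen emits WASM instructions rather than an enum.

```rust
use std::collections::HashMap;

/// Just the IR variants this sketch needs (hypothetical subset).
#[derive(Debug, Clone, PartialEq)]
pub enum IrOp {
    PushI32(i32),
    Add,
    Call(u32), // WordId
}

/// What each op lowers to (conceptual WASM).
#[derive(Debug, PartialEq)]
pub enum Emitted {
    Op(IrOp),
    DirectCall(u32),   // `call <module-local index>`
    IndirectCall(u32), // `call_indirect` via the table slot for a WordId
}

/// Lower one body using the WordId -> module-local index map.
pub fn lower_body(ir: &[IrOp], local_fn_map: &HashMap<u32, u32>) -> Vec<Emitted> {
    ir.iter()
        .map(|op| match op {
            IrOp::Call(word_id) => match local_fn_map.get(word_id) {
                Some(&local) => Emitted::DirectCall(local),
                None => Emitted::IndirectCall(*word_id),
            },
            other => Emitted::Op(other.clone()),
        })
        .collect()
}
```

For example, with hypothetical WordIds A = 100 and B = 101 mapped to local indices 1 and 2, C's body lowers to `[DirectCall(1), DirectCall(2), Op(Add)]`.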
## Exercise 14: Host Function Execution

```forth
5 3 M*
```

**Answer**

- `5` → pushed onto the data stack (`dsp -= 4; mem[dsp] = 5`)
- `3` → pushed onto the data stack (`dsp -= 4; mem[dsp] = 3`)
- `M*` → host function (Rust closure):
  - Read `sp` = the value of the `dsp` global
  - Read `n2 = mem[sp] = 3` (as i64)
  - Read `n1 = mem[sp+4] = 5` (as i64)
  - `result = 5i64 * 3i64 = 15i64`
  - `lo = 15 as i32 = 15`
  - `hi = (15 >> 32) as i32 = 0`
  - Write `mem[sp+4] = 15` (lo) and `mem[sp] = 0` (hi), so the high cell ends up on top, as the Forth double-cell convention requires
  - Stack depth is unchanged (still 2 cells, now holding the double-cell value 15)
- Note: M* is a host function because it needs a 64-bit product, while the generated WASM works only with i32 cells.
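The host function's memory traffic can be reproduced on a byte buffer. In this sketch `load_i32`, `store_i32`, and `m_star` are hypothetical names, and linear memory is a plain `&mut [u8]` with little-endian cells, as in WASM. It mirrors the steps listed above rather than WAFER's actual closure.

```rust
/// Read a little-endian i32 cell (WASM linear memory is little-endian).
fn load_i32(mem: &[u8], addr: usize) -> i32 {
    let b = &mem[addr..addr + 4];
    i32::from_le_bytes([b[0], b[1], b[2], b[3]])
}

/// Write a little-endian i32 cell.
fn store_i32(mem: &mut [u8], addr: usize, value: i32) {
    mem[addr..addr + 4].copy_from_slice(&value.to_le_bytes());
}

/// M* ( n1 n2 -- d ): signed 32 x 32 -> 64-bit multiply.
/// `sp` points at the top cell; the stack grows downward, so n1 is at sp+4.
pub fn m_star(mem: &mut [u8], sp: usize) {
    let n2 = load_i32(mem, sp) as i64;
    let n1 = load_i32(mem, sp + 4) as i64;
    let result = n1 * n2;           // cannot overflow: |n1 * n2| <= 2^62
    let lo = result as i32;         // low 32 bits
    let hi = (result >> 32) as i32; // high 32 bits (arithmetic shift keeps the sign)
    store_i32(mem, sp + 4, lo);     // deeper cell: low half
    store_i32(mem, sp, hi);         // top cell: high half
}
```

With a negative operand the high cell carries the sign: `-2 3 M*` leaves lo = -6 and hi = -1.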
## Exercise 15: Float Operations

```forth
: HYPOTENUSE FDUP F* FSWAP FDUP F* F+ FSQRT ;
```

**Answer**

- Stack effect: `( F: a b -- r )` with r = sqrt(a² + b²)
- After inlining: `[FDup, FMul, FSwap, FDup, FMul, FAdd, FSqrt]`
- Peephole: no matching patterns (`FDup, FMul` is not a known pair)
- Codegen: all float ops use the float stack (`FSP` global):
  - `FDup`: `fpeek(f)`, then `fpush_via_local`
  - `FMul`: `emit_float_binary` with `f64.mul`
  - `FSqrt`: `emit_float_unary` with `f64.sqrt`
- The float stack lives at 0x2540-0x2D40 in linear memory (0x800 bytes, room for 256 f64 values)
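Stepping through the seven IR ops on a `Vec<f64>` shows how the float stack evolves. `hypotenuse` and `binary` are hypothetical helpers, and the comments track the stack after each op for inputs a and b.

```rust
/// Apply a binary float op to the top two entries: ( F: a b -- f(a, b) ).
fn binary(fs: &mut Vec<f64>, f: fn(f64, f64) -> f64) {
    let b = fs.pop().expect("float stack underflow");
    let a = fs.pop().expect("float stack underflow");
    fs.push(f(a, b));
}

/// HYPOTENUSE ( F: a b -- sqrt(a*a + b*b) ), one step per IR op.
pub fn hypotenuse(fs: &mut Vec<f64>) {
    let top = *fs.last().expect("float stack underflow");
    fs.push(top); // FDup  -> a b b
    binary(fs, |a, b| a * b); // FMul  -> a b²
    let n = fs.len();
    fs.swap(n - 1, n - 2); // FSwap -> b² a
    let top = *fs.last().unwrap();
    fs.push(top); // FDup  -> b² a a
    binary(fs, |a, b| a * b); // FMul  -> b² a²
    binary(fs, |a, b| a + b); // FAdd  -> a² + b²
    let x = fs.pop().unwrap();
    fs.push(x.sqrt()); // FSqrt
}
```

Starting from `[3.0, 4.0]`, the float stack ends as `[5.0]`.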
## Exercise 16: BEGIN WHILE REPEAT

```forth
: COUNTDOWN BEGIN DUP WHILE DUP . 1 - REPEAT DROP ;
```

**Answer**

- `BEGIN` → `ControlEntry::Begin { body: [] }`
- `DUP` → `Call(dup_id)` into the body
- `WHILE` → pop `Begin`, create `ControlEntry::BeginWhile { test: [Call(dup_id)], body: [] }`
- `DUP . 1 -` → into the body
- `REPEAT` → pop `BeginWhile`, emit `BeginWhileRepeat { test: [Dup], body: [Dup, Call(dot_id), PushI32(1), Sub] }` (shown after inlining)
- `DROP` → appended after the loop
- Semantics: evaluate the test; if the flag is false, exit the loop; otherwise execute the body and jump back to `BEGIN`
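The semantics line above can be emulated on a `Vec<i32>` data stack. `countdown_while` is a hypothetical helper, and each stack operation is collapsed to its net effect. The test runs at the top, so with 0 on the stack the body never runs. By contrast, the bottom-tested BEGIN ... UNTIL version in Exercise 8 always runs its body once.

```rust
/// Emulates `: COUNTDOWN BEGIN DUP WHILE DUP . 1 - REPEAT DROP ;`,
/// returning what `.` printed.
pub fn countdown_while(stack: &mut Vec<i32>) -> String {
    let mut out = String::new();
    loop {
        // test: DUP; WHILE then pops that copy as the flag
        let flag = *stack.last().expect("stack underflow");
        if flag == 0 {
            break; // WHILE: false -> leave the loop
        }
        // body: DUP . (print the top without consuming it)
        let n = *stack.last().unwrap();
        out.push_str(&format!("{} ", n));
        // body: 1 -
        let top = stack.pop().unwrap();
        stack.push(top.wrapping_sub(1));
        // REPEAT: jump back to BEGIN
    }
    stack.pop(); // DROP
    out
}
```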
## Exercise 17: Batch Mode Compilation

```forth
( During ForthVM::new() )
```

**Answer**

- `register_primitives()` sets `batch_mode = true`.
- Each `register_primitive("DUP", ...)`:
  - Creates the dictionary entry (`dictionary.create` + `reveal`)
  - Stores the IR body in `ir_bodies`
  - Pushes `(word_id, ir_body)` onto `deferred_ir` (no WASM compilation yet)
- After all ~40 IR primitives are registered:
  - `compile_batch()` compiles ALL deferred IR into a single WASM module
  - One `rt.instantiate_and_install()` call installs a single module with ~40 functions
  - Each function is registered in the table
- Why batch? It amortizes runtime compilation overhead: one module instead of 40.
- Host functions bypass `batch_mode`; they are registered via `rt.register_host_func()` with `HostFn` closures.
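The effect of batching on the number of instantiations can be modeled in a few lines. Everything here is hypothetical: `Registry` and its `instantiate` method stand in for `deferred_ir`, `compile_batch()`, and `rt.instantiate_and_install()`, and no WASM is generated. The sketch also assumes batch mode is switched off after the batch compiles. It shows only that deferral turns N instantiations into one.

```rust
/// Opaque stand-in for IR ops (hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub enum IrOp {
    Dup,
    Drop,
    Swap,
}

/// Toy model of primitive registration that counts module instantiations.
#[derive(Default)]
pub struct Registry {
    pub batch_mode: bool,
    pub deferred_ir: Vec<(u32, Vec<IrOp>)>, // (word_id, ir_body) awaiting compilation
    pub modules_instantiated: usize,
    pub table: Vec<u32>, // word ids installed in the function table
}

impl Registry {
    pub fn register_primitive(&mut self, word_id: u32, ir_body: Vec<IrOp>) {
        if self.batch_mode {
            self.deferred_ir.push((word_id, ir_body)); // defer: no WASM yet
        } else {
            self.instantiate(vec![(word_id, ir_body)]); // one module per word
        }
    }

    /// Compile everything deferred so far as a single module.
    pub fn compile_batch(&mut self) {
        let batch = std::mem::take(&mut self.deferred_ir);
        if !batch.is_empty() {
            self.instantiate(batch);
        }
        self.batch_mode = false; // assumption: later words compile immediately
    }

    /// Stand-in for "build one WASM module, instantiate it, install its
    /// functions"; a real implementation would generate code here.
    fn instantiate(&mut self, funcs: Vec<(u32, Vec<IrOp>)>) {
        self.modules_instantiated += 1;
        self.table.extend(funcs.iter().map(|(id, _)| *id));
    }
}
```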
## Exercise 18: `wafer build` Pipeline

```forth
( file: hello.fth )
: MAIN ." Hello, World!" CR ;
```

```sh
wafer build hello.fth -o hello.wasm
```

**Answer**

- `cmd_build()`: create a `ForthVM`, set `recording = true`, evaluate the source
- `evaluate()`: compiles MAIN normally (IR → optimize → codegen)
- `recording_toplevel = true`, but MAIN is a definition, not top-level execution, so `toplevel_ir` stays empty
- `export_module()`:
  - Collect IR words: MAIN plus all `boot.fth` definitions
  - Entry point: no `--entry` flag, so look for MAIN → found
  - Build `local_fn_map`: every word gets a module-internal index
  - `compile_exportable_module()`: a single WASM module containing all functions
  - Data section: snapshot of linear memory (dictionary, variables, etc.)
  - Metadata in the `wafer` custom section: version, entry index, host functions, memory size, stack pointers
- Output: the `hello.wasm` file
## Exercise 19: Stack-to-Local Promotion

```forth
: ADD3 + + ;
```

**Answer**

- After inlining: `[Add, Add]`
- Stack-to-local promotion (a codegen pass, not an optimizer pass):
  - Analyzes stack flow: the first `Add` pops 2 and pushes 1; the second `Add` pops 2 (including that result) and pushes 1
  - If the stack depth is statically known at every point → WASM locals can replace the memory stack
  - Result: operands stay in WASM locals/the operand stack, with no memory reads or writes
  - Much faster: it avoids loads and stores through linear memory
  - Promotion only works for straight-line code (no calls that might modify the stack unpredictably)
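The stack-flow analysis can be sketched as a running depth counter. This illustration covers only the analysis half; assigning each slot to a WASM local is not shown. `stack_effect` and its per-op effects table are hypothetical. Returning `None` on `Call` mirrors the straight-line restriction above.

```rust
/// Just the IR variants this sketch needs (hypothetical subset).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum IrOp {
    PushI32(i32),
    Dup,
    Add,
    Call(u32), // effect unknown at compile time
}

/// Static stack effect of a straight-line sequence as
/// Some((cells consumed from the caller, cells left behind)),
/// or None if some op's effect is not statically known.
pub fn stack_effect(ir: &[IrOp]) -> Option<(usize, usize)> {
    let mut depth: i64 = 0; // relative to the depth at entry
    let mut lowest: i64 = 0; // deepest point reached below entry
    for op in ir {
        let (pops, pushes) = match op {
            IrOp::PushI32(_) => (0, 1),
            IrOp::Dup => (1, 2),
            IrOp::Add => (2, 1),
            IrOp::Call(_) => return None, // callee could do anything to the stack
        };
        depth -= pops;
        lowest = lowest.min(depth);
        depth += pushes;
    }
    let inputs = -lowest;
    Some((inputs as usize, (depth + inputs) as usize))
}
```

For `[Add, Add]` the depth goes 0 → -2 → -1 → -3 → -2. ADD3 therefore consumes 3 cells and leaves 1, so three locals hold the inputs and the result never touches linear memory.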
## Exercise 20: MARKER and State Restore

```forth
MARKER CLEAN
: FOO 1 ;
: BAR 2 ;
CLEAN
FOO \ Error: unknown word
```

**Answer**

- `MARKER CLEAN`:
  - Creates a `MarkerState` snapshot: dictionary state, `user_here`, `next_table_index`, `word_pfa_map`, `ir_bodies`, `does_definitions`, `host_word_names`, `two_value_words`, `fvalue_words`
  - Registers `CLEAN` as a word that restores this snapshot when executed
- `: FOO 1 ; : BAR 2 ;`: normal compilation, adds entries to the dictionary
- `CLEAN`:
  - Executes the marker word
  - Restores the dictionary to its state before FOO and BAR were defined
  - Resets `user_here`, `ir_bodies`, etc.
  - FOO and BAR are gone: `dictionary.find("FOO")` returns `None`
- `FOO` → "unknown word: FOO"

Key point: MARKER does not undo WASM table entries (they become unreachable but stay allocated). It restores the dictionary and the Rust-side metadata.
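The snapshot-and-restore idea, and why table entries survive it, can be sketched with a cloned state struct. `MarkerState` here is a hypothetical three-field subset of the fields listed above. `Vm`, `define`, `find`, `marker`, and `restore` are illustrative names, and `table` stands in for the WASM function table, which the restore deliberately leaves untouched.

```rust
use std::collections::HashMap;

/// A hypothetical subset of the Rust-side state a marker snapshots.
#[derive(Clone, Default)]
pub struct MarkerState {
    pub dictionary: Vec<String>, // word names, newest last
    pub user_here: u32,
    pub ir_bodies: HashMap<String, Vec<i32>>, // stand-in for real IR bodies
}

#[derive(Default)]
pub struct Vm {
    pub state: MarkerState,
    pub table: Vec<String>, // stand-in for the WASM function table
}

impl Vm {
    /// Compile a word: record it in the Rust-side state and give it a table slot.
    pub fn define(&mut self, name: &str, body: Vec<i32>) {
        self.state.dictionary.push(name.to_string());
        self.state.ir_bodies.insert(name.to_string(), body);
        self.state.user_here += 4;
        self.table.push(name.to_string());
    }

    /// Newest definition wins, as in a Forth dictionary search.
    pub fn find(&self, name: &str) -> Option<usize> {
        self.state.dictionary.iter().rposition(|n| n.as_str() == name)
    }

    /// MARKER: snapshot the Rust-side state.
    pub fn marker(&self) -> MarkerState {
        self.state.clone()
    }

    /// Running the marker word: restore the snapshot. `table` is left as is,
    /// so newer entries stay allocated but nothing can reach them.
    pub fn restore(&mut self, snapshot: &MarkerState) {
        self.state = snapshot.clone();
    }
}
```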