Three compile-time words for unstructured control flow:
- AHEAD: unconditional forward branch (code to THEN skipped)
- CS-PICK: duplicate control-flow stack entries (enables multi-exit loops)
- CS-ROLL: rotate control-flow stack entries (reorder IF/THEN resolution)
Also adds POSTPONE support for compile-time keywords (IF, UNTIL, etc.)
via a __CTRL__ host function and unified pending_actions queue.
Key design:
- LoopRestartIfFalse IR op desugars into nested If nodes for CS-PICK'd
BEGIN+UNTIL patterns (multiple backward branches in one loop)
- Flat Block/BranchIfFalse/EndBlock IR ops for CS-ROLL'd IF/THEN
patterns where structured If nesting would consume wrong flags
- First-iteration flag local for AHEAD-into-BEGIN patterns (PT8)
Enables 12th compliance test (compliance_tools): all 11+1 now pass.
Three bugs fixed to safely enable promotion for control flow:
1. compute_stack_needs now recurses into IF/DoLoop/Begin bodies,
correctly calculating preload counts for promoted words with
nested control flow (was flat, causing stack underflow).
2. BeginDoubleWhileRepeat rejected from promotion (boot.fth's
-TRAILING uses this pattern, handler had structural bugs).
3. IF/ELSE branches must have same net stack effect for promotion
(BITSSET? has asymmetric branches: 2 items vs 1).
Performance with promotion enabled:
- Factorial: 0.50x (2x faster than gforth)
- Collatz: 0.38x (2.6x faster than gforth)
- All 427 unit tests, 10/11 compliance, 35/35 behavioral pass
Extends the promoted codegen path (StackSim) with handlers for DoLoop,
BeginWhileRepeat, BeginUntil, BeginAgain, If/Else/Then, RFetch, LoopJ,
and Exit. Includes loop-iteration fixup to copy modified locals back to
loop-top positions, and IF branch state merging.
The promotion is currently gated off for control flow (is_promotable
rejects all loops/IF) pending fix for edge cases in the Forth 2012 test
suite. The infrastructure is ready to enable incrementally.
When briefly enabled for testing, showed dramatic results:
- Factorial: 0.49x (2x faster than gforth)
- Collatz: 0.17x (6x faster than gforth)
Two-path DO/LOOP codegen based on static analysis of the loop body:
- Fast path (no calls, no >R/R> in body): index and limit live purely
in WASM locals with zero return stack traffic per iteration. RFetch (I)
and LoopJ (J) resolve to local.get instead of memory access.
- Slow path (body has calls or explicit RS ops): locals still used for
loop control, but synced to return stack for LEAVE/UNLOOP compatibility.
Also converts J from a host function (WASM→Rust roundtrip per call) to
an IR primitive (IrOp::LoopJ) that compiles to local.get of the outer
loop's index local.
Performance impact (vs gforth, all opts enabled):
- Factorial: 1.02x → 0.94x (now faster than gforth)
- NestedLoops: 717x → 543x (24% faster, still bottlenecked by data stack)
- Fibonacci, GCD, Collatz: unchanged (don't use DO/LOOP)
Major compliance push bringing WAFER from 3 to 10 passing Forth 2012
compliance test suites (Core, Core Extensions, Core Plus, Double,
Exception, Facility, Locals, Memory, Search Order, String).
Compiler/runtime fixes:
- DEFER: host function via pending_define, works inside colon defs
- COMPILE,: handle_pending_compile in execute_word for [...] sequences
- MARKER: full save/restore with pending_marker_restore mechanism
- IMMEDIATE: changed from XOR toggle to OR set per Forth 2012 spec
- ABORT": throw -2 via THROW, no message display when caught
- M*/: symmetric division to match WAFER's / behavior
- pending_define: single i32 flag → Vec<i32> queue for multi-action words
- Optimizer: prevent inlining words containing EXIT or ForthLocal ops
- +LOOP: corrected boundary check formula with AND step comparison
- REPEAT: accept bare BEGIN (unstructured IF...BEGIN...REPEAT)
- Auto-close unclosed IFs at ; for unstructured control flow
- _create_part_: use reserve_fn_index to preserve dictionary.latest()
Memory layout:
- Separate PICT_BUF and WORD_BUF regions to prevent PAD overlap
- Updated DEPTH hardcoded DATA_STACK_TOP in boot.fth
New word sets:
- [IF]/[ELSE]/[THEN]/[DEFINED]/[UNDEFINED]: conditional compilation
- UNESCAPE/SUBSTITUTE/REPLACES: string substitution (host functions)
- Locals {: syntax: parser, ForthLocalGet/Set IR ops, WASM local codegen
- ENVIRONMENT? support for #LOCALS (returns 16)
- N>R/NR>/SYNONYM: programming-tools extensions
- Search Order: ONLY, ALSO, PREVIOUS, DEFINITIONS, FORTH,
FORTH-WORDLIST, GET-ORDER, SET-ORDER, GET-CURRENT, SET-CURRENT,
WORDLIST, SEARCH-WORDLIST with full multi-wordlist dictionary support
via Arc<Mutex> shared state for immediate effect from compiled code
Remaining: 1 cascade error in Programming-Tools from CS-PICK/CS-ROLL
(unstructured control-flow stack manipulation, requires flat IR).
Six fixes for compliance test regressions introduced in Phases 7-8:
- LEAVE + +LOOP with step=0 caused infinite loop: the XOR termination
check yields 0 when index=limit and step=0. Added SYSVAR_LEAVE_FLAG
mechanism — LEAVE sets flag, +LOOP checks it, all loops clear on exit.
- DEPTH was off-by-one: `5440 SP@ -` pushed the literal before SP@
read the stack pointer, making SP@ see one extra cell. Reordered to
`SP@ 5440 SWAP -` so SP@ reads dsp before any literal push.
- */ and */MOD used FM/MOD (floored) but WAFER's / uses WASM i32.div_s
(symmetric). Changed to SM/REM for consistency.
- EVALUATE didn't sync input buffer to WASM memory, breaking SOURCE
and >IN manipulation inside evaluated strings. Added input-only sync
(without touching STATE/BASE) and >IN readback after each token.
- WORD didn't skip leading spaces when delimiter != space, causing
GN' and GS3 tests to read whitespace instead of content.
- Added ACCEPT stub returning 0 for non-interactive mode.
- Added bounds check in refresh_user_here to reject corrupted
SYSVAR_HERE values beyond WASM memory size.
Core and Facility compliance suites now pass. Other suites have
pre-existing regressions from Phases 1-8 still under investigation.
Add `wafer build` to compile Forth source files to standalone .wasm modules,
and `wafer run` to execute them. The same .wasm file works with both the
wafer runtime (via wasmtime) and in browsers (via generated JS loader).
New CLI subcommands:
- `wafer build file.fth -o file.wasm` — compile to standalone WASM
- `wafer build file.fth -o file.wasm --js` — also generate JS/HTML loader
- `wafer build file.fth --entry WORD` — custom entry point
- `wafer run file.wasm` — execute pre-compiled module
Entry point resolution: --entry flag > MAIN word > recorded top-level execution.
Memory snapshot embedded as WASM data section preserves VARIABLE/CONSTANT state.
Metadata in custom "wafer" section enables the runner to provide host functions.
New modules: export.rs (orchestration), runner.rs (wasmtime host), js_loader.rs
(browser support). Refactored codegen.rs to share logic between consolidation
and export via compile_multi_word_module(). Added ir_bodies tracking for
VARIABLE, CONSTANT, CREATE, VALUE, DEFER, BUFFER:, MARKER, 2CONSTANT,
2VARIABLE, 2VALUE, FVARIABLE defining words.
Removed dead code: dot_func field, unused wafer-web stub crate, wasmtime-wasi
dependency from CLI, orphaned --consolidate/--output CLI flags.
425 tests pass (414 original + 11 new including 7 round-trip integration tests).
Batch-compile all ~64 IR primitives into a single WASM module at startup.
Replaces 64 separate Module::new + Instance::new with 1 of each.
Reuses compile_consolidated_module() directly, removed compile_core_module() stub.
Boot time: 7.7ms -> 0.6ms (release), test suite: 5.1s -> 1.5s (debug).
13 of 14 optimizations now implemented. 392 tests passing.
Stack-to-local promotion (Phase 1: straight-line code):
- Words with no control flow/calls use WASM locals instead of memory stack
- Stack manipulation (Swap, Rot, Nip, Tuck, Dup, Drop) emits ZERO instructions
- ~7x instruction reduction for arithmetic-heavy words like DUP *
- Pre-loads consumed items from memory, writes results back at exit
Consolidation recompiler (CONSOLIDATE word):
- Recompiles all IR-based words into single WASM module
- Direct call instructions instead of call_indirect through function table
- Cranelift can inline and optimize across word boundaries
- All control flow variants support consolidated calls
342 unit tests + 11 compliance, all passing.
Inlining: store IR bodies for all words, inline Call(id) when body <= 8 ops
and non-recursive. Convert TailCall back to Call when inlining (tail position
in callee is not tail position in caller -- found via compliance test failure
where inlined TailCall caused unreachable code after the call site).
DSP global caching: cache $dsp in WASM local 0 at function entry, use
local.get/set throughout, writeback before calls and at function exit.
Reduces global access instructions by ~30-40%.
323 unit tests + 11 compliance, all passing.
- Fix HERE corruption: sync user_here before writing to shared cell
- Fix DOES> without CREATE: patch most-recent word, not read new name
- Implement >BODY via word_pfa_map tracking parameter field addresses
- Nested BEGIN...WHILE...WHILE...REPEAT...ELSE...THEN support
- DEPTH overflow protection
- Forth 2012 core.fr: 3 errors remaining (POSTPONE edge case,
double-DOES>, NOP meta-programming)
Major compliance fixes for running Gerry Jackson's core.fr tests:
- >IN synchronization: outer interpreter reads >IN back from WASM memory
after each word, enabling TESTING and other >IN-manipulating words
- RSHIFT changed to logical (unsigned) shift per Forth 2012 spec
- +LOOP uses boundary-crossing termination check for negative steps
- HEX/DECIMAL compile as WASM primitives (work inside definitions)
- BASE read from WASM memory for all number formatting
- Pictured numeric output: <# # #S #> HOLD SIGN
- New words: 2@ 2! .( ] ArithRshift
- Error recovery resets compile state on failure
- FIND reads counted strings from WASM memory
- Forth 2012 core.fr: 58 errors remaining (from unable-to-load)
- Dictionary: linked-list word headers in simulated linear memory with
create/find/reveal, case-insensitive lookup, IMMEDIATE flag support
- WASM codegen: IR-to-WASM translation via wasm-encoder with full
validation; all stack, arithmetic, comparison, logic, memory, control
flow, and return stack operations; wasmtime execution tests
- Outer interpreter: tokenizer, number parsing (decimal/$hex/#dec/%bin),
interpret/compile dispatch, control structures (IF/ELSE/THEN,
BEGIN/UNTIL, BEGIN/WHILE/REPEAT), RECURSE, comments, string output
- 40+ primitive words registered via JIT-compiled WASM modules linked
to shared memory/globals/table
- Interactive REPL with rustyline, piped input, and file execution
- 145 tests passing across dictionary, codegen, and runtime
Optimizing Forth 2012 compiler targeting WebAssembly with IR-based
compilation pipeline, multi-typed stack inference, subroutine threading,
and JIT/consolidation modes. Rust kernel with ~35 primitives and Forth
standard library for core/core-ext word sets.