Move to boot.fth: TYPE, SPACES, <#, HOLD, HOLDS, SIGN, #, #S, #>,
., U., .R, U.R, D., D.R. The Forth . now uses pictured numeric output
(standard Forth approach) instead of a Rust formatting closure.
Add M*, UM*, UM/MOD host functions to the WASM runner so that the
Forth # word (which calls UM/MOD) works in standalone mode.
Removed 660 lines of Rust closures + 5 dead helper functions.
All 426 tests pass.
HERE is now defined in boot.fth as `: HERE 12 @ ;` (reads SYSVAR_HERE
from WASM linear memory). The Rust side syncs user_here to memory[12]:
- At the start of each evaluate() call (sync_here_to_wasm)
- In each host function that modifies HERE (ALLOT, comma, C-comma, ALIGN)
This avoids per-token sync overhead — only 2 sync points per evaluate()
call plus host-function writes. Removed the HERE host function closure
(~30 lines). All 426 tests pass.
Now that the optimizer TailCall/inline bug is fixed, SM/REM, FM/MOD,
*/, and */MOD can be defined in Forth using M* and UM/MOD as primitives.
SM/REM uses DABS (which calls DNEGATE → D+) inside conditional branches
with return-stack items — exactly the pattern that triggered the bug.
Removed ~200 lines of Rust closures. All 426 tests pass.
When the tail-call pass converted a Call to TailCall inside an If branch,
and the inliner subsequently inlined that word, the TailCall was not
converted back to Call in nested control-flow bodies. The TailCall codegen
emits a Return instruction, which would exit the *caller* instead of just
the inlined callee — silently corrupting the return stack.
Root cause: the inliner only converted top-level TailCalls in the body
(line-by-line iteration), missing TailCalls nested inside If/DoLoop/Begin
structures.
Fix: add detailcall() that recursively walks the entire IR tree and
converts all TailCall ops back to Call before inlining.
This unblocks defining complex Forth words (like SM/REM, FM/MOD) that
use DABS → DNEGATE → D+ chains with return-stack operations inside
conditional branches.
426 tests pass (including new regression test).
Add 14 double-cell words to boot.fth: D+, D-, DNEGATE, DABS, D0=, D0<,
D=, D<, D2*, D2/, DMAX, DMIN, M+, DU<.
Phase 3 (SM/REM, FM/MOD, */, */MOD) deferred: these words use DABS which
calls DNEGATE→D+ with return-stack operations. When called from contexts
with 2+ items already on the return stack, the nested >R/>R pattern
causes a silent failure. Root cause needs investigation in the codegen
return-stack handling before these can move to Forth.
All 425 tests pass.
Move to boot.fth: D+, D-, DNEGATE, DABS, D0=, D0<, D=, D<, D2*, D2/,
DMAX, DMIN, M+, DU<.
D+ uses proper carry detection via unsigned comparison after low-cell
addition. All other double-cell words build on D+ and standard Forth
stack operations.
Removed 544 lines of Rust closures. Cumulative: ~1,091 Rust lines removed
across Phases 1-2, replaced by ~80 lines of Forth. All 425 tests pass.
Create boot.fth loaded at startup after IR primitives are compiled.
Forth-compiled WASM with direct calls outperforms host function dispatch
(no call_indirect overhead, Cranelift can inline across word boundaries).
Words moved to Forth: 2OVER, 2ROT, WITHIN, 2@, 2!, FILL, CMOVE, CMOVE>,
MOVE, ERASE, BLANK, /STRING, -TRAILING.
Removed 547 lines of Rust closures, replaced by 48 lines of Forth.
All 425 tests pass.
Add `wafer build --native` to produce self-contained native executables.
The approach appends AOT-precompiled WASM and metadata to a copy of the
wafer binary itself, requiring no Rust toolchain at build time.
On startup, the binary checks for an appended payload (8-byte "WAFEREXE"
magic trailer). If found, it deserializes the precompiled module and runs
it directly, skipping CLI argument parsing entirely.
Uses wasmtime's Engine::precompile_module() for AOT compilation at build
time and Module::deserialize() at runtime — instant startup with no JIT.
Binary layout: [wafer binary][precompiled wasm][metadata json][trailer]
Trailer: payload_len(u64 LE) + metadata_len(u64 LE) + "WAFEREXE"
Also refactored runner.rs: extracted shared run_module() to avoid
duplication between run_wasm_bytes() and run_precompiled_bytes().
Made serialize_metadata() public for CLI use.
Add `wafer build` to compile Forth source files to standalone .wasm modules,
and `wafer run` to execute them. The same .wasm file works with both the
wafer runtime (via wasmtime) and in browsers (via generated JS loader).
New CLI subcommands:
- `wafer build file.fth -o file.wasm` — compile to standalone WASM
- `wafer build file.fth -o file.wasm --js` — also generate JS/HTML loader
- `wafer build file.fth --entry WORD` — custom entry point
- `wafer run file.wasm` — execute pre-compiled module
Entry point resolution: --entry flag > MAIN word > recorded top-level execution.
Memory snapshot embedded as WASM data section preserves VARIABLE/CONSTANT state.
Metadata in custom "wafer" section enables the runner to provide host functions.
New modules: export.rs (orchestration), runner.rs (wasmtime host), js_loader.rs
(browser support). Refactored codegen.rs to share logic between consolidation
and export via compile_multi_word_module(). Added ir_bodies tracking for
VARIABLE, CONSTANT, CREATE, VALUE, DEFER, BUFFER:, MARKER, 2CONSTANT,
2VARIABLE, 2VALUE, FVARIABLE defining words.
Removed dead code: dot_func field, unused wafer-web stub crate, wasmtime-wasi
dependency from CLI, orphaned --consolidate/--output CLI flags.
425 tests pass (414 original + 11 new including 7 round-trip integration tests).
All were planning artifacts never imported or loaded:
- forth/ (4 .fth files): commented-out TODO stubs, never loaded at startup
- crates/core/src/words/mod.rs: empty module with commented-out submodules
- compiler.rs: placeholder, all compiler logic lives in outer.rs
- primitives.rs: placeholder, all primitives registered in outer.rs
- types.rs: StackType/StackEffect defined but never imported anywhere
Batch-compile all ~64 IR primitives into a single WASM module at startup.
Replaces 64 separate Module::new + Instance::new with 1 of each.
Reuses compile_consolidated_module() directly, removed compile_core_module() stub.
Boot time: 7.7ms -> 0.6ms (release), test suite: 5.1s -> 1.5s (debug).
13 of 14 optimizations now implemented. 392 tests passing.
Stack-to-local promotion (Phase 1: straight-line code):
- Words with no control flow/calls use WASM locals instead of memory stack
- Stack manipulation (Swap, Rot, Nip, Tuck, Dup, Drop) emits ZERO instructions
- ~7x instruction reduction for arithmetic-heavy words like DUP *
- Pre-loads consumed items from memory, writes results back at exit
Consolidation recompiler (CONSOLIDATE word):
- Recompiles all IR-based words into single WASM module
- Direct call instructions instead of call_indirect through function table
- Cranelift can inline and optimize across word boundaries
- All control flow variants support consolidated calls
342 unit tests + 11 compliance, all passing.
Inlining: store IR bodies for all words, inline Call(id) when body <= 8 ops
and non-recursive. Convert TailCall back to Call when inlining (tail position
in callee is not tail position in caller -- found via compliance test failure
where inlined TailCall caused unreachable code after the call site).
DSP global caching: cache $dsp in WASM local 0 at function entry, use
local.get/set throughout, writeback before calls and at function exit.
Reduces global access instructions by ~30-40%.
323 unit tests + 11 compliance, all passing.
Replace placeholder compliance tests with real harness that boots WAFER,
loads Gerry Jackson's test suite, and asserts 0 errors per word set.
Passing word sets (11/13):
Core, Core Plus, Core Ext, Exception, Double-Number, String,
Search-Order, Memory-Allocation, Programming-Tools, Facility, Locals
Not yet: File-Access (needs WASI), Floating-Point, Extended-Character
272 total tests (261 unit + 11 compliance)
- Add docs/OPTIMIZATIONS.md: catalog of 14 optimization passes with
status tracking and implementation roadmap
- Configure workspace-level clippy and rustc lints in Cargo.toml
- Add clippy.toml and deny.toml for clippy thresholds and dependency
auditing (licenses, advisories, bans)
- Set up pre-commit hook: cargo fmt, dprint, clippy, cargo deny,
cargo machete
- Update Justfile with deny/machete targets, dprint in fmt checks
Core word set: 0 errors on Gerry Jackson's forth2012-test-suite/core.fr
- Fix POSTPONE for non-immediate words via COMPILE, mechanism
- Fix double-DOES> (WEIRD: pattern) with does-body scanning and
runtime patching via _DOES_PATCH_
- Implement CATCH/THROW exception handling using wasmtime trap
mechanism with stack pointer save/restore
- 232 tests passing
Rename ABOUT_FORTH.md to FORTH.md and rewrite to cover Forth's unique
position as simultaneously low-level and high-level, where Forth is used
today (Philae lander, Open Firmware, embedded systems), and why Forth
maps naturally onto WebAssembly's stack machine architecture.
- Fix HERE corruption: sync user_here before writing to shared cell
- Fix DOES> without CREATE: patch most-recent word, not read new name
- Implement >BODY via word_pfa_map tracking parameter field addresses
- Nested BEGIN...WHILE...WHILE...REPEAT...ELSE...THEN support
- DEPTH overflow protection
- Forth 2012 core.fr: 3 errors remaining (POSTPONE edge case,
double-DOES>, NOP meta-programming)
Major compliance fixes for running Gerry Jackson's core.fr tests:
- >IN synchronization: outer interpreter reads >IN back from WASM memory
after each word, enabling TESTING and other >IN-manipulating words
- RSHIFT changed to logical (unsigned) shift per Forth 2012 spec
- +LOOP uses boundary-crossing termination check for negative steps
- HEX/DECIMAL compile as WASM primitives (work inside definitions)
- BASE read from WASM memory for all number formatting
- Pictured numeric output: <# # #S #> HOLD SIGN
- New words: 2@ 2! .( ] ArithRshift
- Error recovery resets compile state on failure
- FIND reads counted strings from WASM memory
- Forth 2012 core.fr: 58 errors remaining (from unable-to-load)
README now documents all 70+ implemented words, working examples,
architecture overview, and accurate compliance status.
CLAUDE.md updated with actual file descriptions, patterns for adding
new words, and current test count.
- Dictionary: linked-list word headers in simulated linear memory with
create/find/reveal, case-insensitive lookup, IMMEDIATE flag support
- WASM codegen: IR-to-WASM translation via wasm-encoder with full
validation; all stack, arithmetic, comparison, logic, memory, control
flow, and return stack operations; wasmtime execution tests
- Outer interpreter: tokenizer, number parsing (decimal/$hex/#dec/%bin),
interpret/compile dispatch, control structures (IF/ELSE/THEN,
BEGIN/UNTIL, BEGIN/WHILE/REPEAT), RECURSE, comments, string output
- 40+ primitive words registered via JIT-compiled WASM modules linked
to shared memory/globals/table
- Interactive REPL with rustyline, piped input, and file execution
- 145 tests passing across dictionary, codegen, and runtime
Optimizing Forth 2012 compiler targeting WebAssembly with IR-based
compilation pipeline, multi-typed stack inference, subroutine threading,
and JIT/consolidation modes. Rust kernel with ~35 primitives and Forth
standard library for core/core-ext word sets.