Update all docs to reflect current state

README: 392 tests, 200+ words, 12 word sets, optimization pipeline described
CLAUDE.md: 200+ words, 12 word sets, 392 tests, added optimizer/config/consolidate to key files
OPTIMIZATIONS.md: update all 14 section statuses (12 done, 2 not started)
WAFER.md: correct line counts, add optimizer/config/consolidate/types to project layout, add FSP global
2026-04-02 12:47:50 +02:00
parent 94f6cb6941
commit 8c53afa28a
4 changed files with 34 additions and 23 deletions
+5 -2
@@ -2,7 +2,7 @@
## What is WAFER?
WAFER (WebAssembly Forth Engine in Rust) is an optimizing Forth 2012 compiler targeting WebAssembly. Currently a working Forth system with 130+ words, JIT compilation, and 11 word sets at 100% compliance.
WAFER (WebAssembly Forth Engine in Rust) is an optimizing Forth 2012 compiler targeting WebAssembly. Currently a working Forth system with 200+ words, JIT compilation, 12 word sets at 100% compliance, and a full optimization pipeline (peephole, constant folding, inlining, strength reduction, DCE, tail calls, stack-to-local promotion, consolidation).
## Architecture
@@ -19,6 +19,9 @@ WAFER (WebAssembly Forth Engine in Rust) is an optimizing Forth 2012 compiler ta
- `crates/core/src/dictionary.rs` -- Dictionary data structure with create/find/reveal
- `crates/core/src/ir.rs` -- IrOp enum (the intermediate representation)
- `crates/core/src/memory.rs` -- Memory layout constants (stack regions, dictionary base, etc.)
- `crates/core/src/optimizer.rs` -- IR optimization passes (peephole, fold, inline, DCE, etc.)
- `crates/core/src/config.rs` -- WaferConfig: unified optimization configuration
- `crates/core/src/consolidate.rs` -- Consolidation recompiler (single-module direct calls)
- `crates/cli/src/main.rs` -- CLI REPL with rustyline
## Adding a New Word
@@ -51,7 +54,7 @@ Handle in `interpret_token_immediate()` or `compile_token()` as a special case.
## Testing
- Run `cargo test --workspace` before committing (currently 261 unit + 11 compliance tests)
- Run `cargo test --workspace` before committing (currently 392 tests: 380 unit + 1 benchmark + 11 compliance)
- Forth 2012 compliance: `cargo test -p wafer-core --test compliance`
- Test helper in outer.rs: `eval_output("forth code")` returns printed output as String
- Test helper: `eval_stack("forth code")` returns data stack as Vec<i32>
+7 -4
@@ -6,7 +6,7 @@ An optimizing Forth 2012 compiler targeting WebAssembly.
## Status
WAFER is a working Forth system. It JIT-compiles each word definition to a separate WASM module and executes via `wasmtime`. 310 tests passing (299 unit + 11 compliance), **0 errors on all 12 tested Forth 2012 word sets** including Floating-Point.
WAFER is a working Forth system with an optimizing compiler. It JIT-compiles each word definition to a separate WASM module and executes via `wasmtime`. 392 tests passing (380 unit + 1 benchmark + 11 compliance), **0 errors on all 12 tested Forth 2012 word sets** including Floating-Point.
**Working features:**
@@ -50,7 +50,7 @@ Forth Source -> Outer Interpreter -> IR -> [Optimize] -> WASM Codegen (wasm-enco
- **Subroutine threading** via WASM function tables and `call_indirect`
- **JIT mode**: each new word compiles to a separate WASM module linked to shared memory/globals/table
- **IR-based pipeline** enables future optimization passes before WASM emission
- **IR-based pipeline** with 6 optimization passes (peephole, constant folding, strength reduction, DCE, tail call detection, inlining) plus stack-to-local promotion and consolidation
- **Dictionary**: linked-list word headers in simulated linear memory
## Building
@@ -75,12 +75,15 @@ echo ': SQUARE DUP * ; 7 SQUARE .' | cargo run -p wafer
## Testing
```bash
# All tests (185 currently passing)
# All tests (392 currently passing)
cargo test --workspace
# Forth 2012 compliance dashboard
cargo test -p wafer-core --test compliance
# Optimization benchmark report
cargo test -p wafer-core --test benchmark_report -- --nocapture --ignored
# Lints
cargo clippy --workspace
```
@@ -117,7 +120,7 @@ tests/ Forth 2012 compliance suite (gerryjackson/forth2012-test-suite sub
### Not Yet Implemented
11 word sets at 100% compliance: Core, Core Ext, Core Plus, Exception, Double-Number, String, Search-Order, Memory-Allocation, Programming-Tools, Facility, Locals. 130+ words including VALUE, DEFER, CASE, DOES>, CATCH/THROW, double-cell arithmetic, string operations.
12 word sets at 100% compliance: Core, Core Ext, Core Plus, Exception, Double-Number, String, Search-Order, Memory-Allocation, Programming-Tools, Facility, Locals, Floating-Point. 200+ words including VALUE, DEFER, CASE, DOES>, CATCH/THROW, double-cell arithmetic, string operations, and 70+ floating-point words.
## Compliance Status
+14 -14
@@ -31,7 +31,7 @@ This document describes every optimization that makes sense for WAFER, why it ma
## 1. Stack-to-Local Promotion
**Status: Not implemented.** Type infrastructure exists (`crates/core/src/types.rs`) but is not wired into codegen.
**Status: Phase 1 done.** Straight-line words (no control flow, calls, or I/O) use WASM locals instead of memory stack. Stack manipulation ops (Swap, Rot, Nip, Tuck, Dup, Drop) emit zero WASM instructions. Switchable via `WaferConfig::codegen.stack_to_local_promotion`.
### The Problem
@@ -105,7 +105,7 @@ When the compiler can statically determine the types and lifetimes of values on
## 2. Peephole Optimization
**Status: Not implemented.**
**Status: Done.** Implemented in `optimizer.rs::peephole()`. Runs as fixpoint loop.
A peephole optimizer scans adjacent IR operations and replaces recognized patterns with cheaper equivalents. This is the lowest-effort, highest-return IR pass because Forth's postfix style generates many redundant sequences.
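A minimal sketch of such a fixpoint peephole pass, for illustration only. The `IrOp` enum here is a cut-down, hypothetical stand-in for the real one in `ir.rs`, and the pattern set is an assumption rather than the list actually implemented in `optimizer.rs`:

```rust
// Hypothetical, reduced IrOp; the real enum has many more variants.
#[derive(Debug, Clone, PartialEq)]
enum IrOp {
    PushI32(i32),
    Dup,
    Drop,
    Swap,
    Over,
    TwoDup,
    TwoDrop,
    Add,
}

/// One left-to-right scan over adjacent pairs.
/// Returns the rewritten ops and whether any pattern fired.
fn peephole_once(ops: &[IrOp]) -> (Vec<IrOp>, bool) {
    let mut out = Vec::with_capacity(ops.len());
    let mut changed = false;
    let mut i = 0;
    while i < ops.len() {
        match (&ops[i], ops.get(i + 1)) {
            // Pairs that cancel out entirely.
            (IrOp::Swap, Some(IrOp::Swap))
            | (IrOp::Dup, Some(IrOp::Drop))
            | (IrOp::PushI32(_), Some(IrOp::Drop)) => {
                changed = true;
                i += 2;
            }
            // Pairs that collapse into a compound op.
            (IrOp::Over, Some(IrOp::Over)) => {
                out.push(IrOp::TwoDup);
                changed = true;
                i += 2;
            }
            (IrOp::Drop, Some(IrOp::Drop)) => {
                out.push(IrOp::TwoDrop);
                changed = true;
                i += 2;
            }
            // No pattern: copy the op through unchanged.
            (op, _) => {
                out.push(op.clone());
                i += 1;
            }
        }
    }
    (out, changed)
}

/// Repeat the scan until nothing changes, since removing one pair
/// can make two previously separated ops adjacent.
fn peephole(mut ops: Vec<IrOp>) -> Vec<IrOp> {
    loop {
        let (next, changed) = peephole_once(&ops);
        ops = next;
        if !changed {
            return ops;
        }
    }
}
```

The fixpoint loop matters for inputs such as `Swap, Dup, Drop, Swap, Add`: the first scan removes `Dup, Drop`, which leaves a new `Swap, Swap` pair that only the second scan can remove.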
@@ -134,7 +134,7 @@ A single function `fn peephole(ops: Vec<IrOp>) -> Vec<IrOp>` that makes repeated
## 3. Constant Folding
**Status: Not implemented.**
**Status: Done.** Implemented in `optimizer.rs::constant_fold()`. Folds binary and unary ops on known constants.
When both operands of an operation are compile-time constants, compute the result at compile time.
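As a rough illustration of the idea, the sketch below folds binary and unary ops whose operands are the constants just emitted. The `IrOp` variants and the helper names `fold_binary` and `fold_unary` are hypothetical, and wrapping arithmetic is an assumption about how 32-bit cells should overflow:

```rust
// Hypothetical, reduced IrOp; the real enum in ir.rs is larger.
#[derive(Debug, Clone, PartialEq)]
enum IrOp {
    PushI32(i32),
    Add,
    Sub,
    Mul,
    Negate,
    Dup,
}

/// Result of `a op b` if `op` is a foldable binary op.
fn fold_binary(op: &IrOp, a: i32, b: i32) -> Option<i32> {
    match op {
        IrOp::Add => Some(a.wrapping_add(b)),
        IrOp::Sub => Some(a.wrapping_sub(b)),
        IrOp::Mul => Some(a.wrapping_mul(b)),
        _ => None,
    }
}

/// Result of `op a` if `op` is a foldable unary op.
fn fold_unary(op: &IrOp, a: i32) -> Option<i32> {
    match op {
        IrOp::Negate => Some(a.wrapping_neg()),
        _ => None,
    }
}

fn constant_fold(ops: Vec<IrOp>) -> Vec<IrOp> {
    let mut out: Vec<IrOp> = Vec::with_capacity(ops.len());
    for op in ops {
        // Try a binary fold against the last two emitted constants.
        let binary = match out.as_slice() {
            [.., IrOp::PushI32(a), IrOp::PushI32(b)] => {
                fold_binary(&op, *a, *b).map(|v| (2, v))
            }
            _ => None,
        };
        // Otherwise try a unary fold against the last emitted constant.
        let folded = binary.or_else(|| match out.last() {
            Some(IrOp::PushI32(a)) => fold_unary(&op, *a).map(|v| (1, v)),
            _ => None,
        });
        match folded {
            Some((consumed, value)) => {
                // Replace the consumed constants with the computed result.
                out.truncate(out.len() - consumed);
                out.push(IrOp::PushI32(value));
            }
            None => out.push(op),
        }
    }
    out
}
```

Because the result is pushed back as a constant, chains such as `2 3 + 4 *` collapse in a single pass: `PushI32(2), PushI32(3), Add` becomes `PushI32(5)`, which then folds with `PushI32(4), Mul`.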
@@ -163,7 +163,7 @@ A function `fn constant_fold(ops: Vec<IrOp>) -> Vec<IrOp>` that simulates a comp
## 4. Inlining
**Status: Not implemented.**
**Status: Done.** Implemented in `optimizer.rs::inline()`. Inlines word bodies <= 8 ops, non-recursive. TailCall converted back to Call when inlining. IR bodies stored in `ForthVM::ir_bodies`.
Replace `Call(WordId)` with the callee's IR body, avoiding the `call_indirect` overhead and enabling further optimization of the combined code.
@@ -212,7 +212,7 @@ PushI32(34)
## 5. Strength Reduction
**Status: Not implemented.**
**Status: Done.** Implemented in `optimizer.rs::strength_reduce()`. Power-of-2 multiply to shift, zero comparisons to ZeroEq/ZeroLt.
Replace expensive operations with cheaper equivalents when one operand is a known constant.
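The two rewrites named in the status line could look roughly like the sketch below. The `IrOp` variants, in particular `Shl`, `Eq`, and `Lt`, are assumed names for illustration and may not match the real enum:

```rust
// Hypothetical, reduced IrOp; variant names are assumptions.
#[derive(Debug, Clone, PartialEq)]
enum IrOp {
    PushI32(i32),
    Mul,
    Shl,
    Eq,
    Lt,
    ZeroEq,
    ZeroLt,
    Dup,
}

fn strength_reduce(ops: Vec<IrOp>) -> Vec<IrOp> {
    let mut out = Vec::with_capacity(ops.len());
    let mut i = 0;
    while i < ops.len() {
        match (&ops[i], ops.get(i + 1)) {
            // x * 2^k  ->  x << k (identical under wrapping 32-bit arithmetic).
            (IrOp::PushI32(n), Some(IrOp::Mul))
                if *n > 1 && (*n as u32).is_power_of_two() =>
            {
                out.push(IrOp::PushI32(n.trailing_zeros() as i32));
                out.push(IrOp::Shl);
                i += 2;
            }
            // x 0 =  ->  0=
            (IrOp::PushI32(0), Some(IrOp::Eq)) => {
                out.push(IrOp::ZeroEq);
                i += 2;
            }
            // x 0 <  ->  0<
            (IrOp::PushI32(0), Some(IrOp::Lt)) => {
                out.push(IrOp::ZeroLt);
                i += 2;
            }
            // Anything else passes through unchanged.
            (op, _) => {
                out.push(op.clone());
                i += 1;
            }
        }
    }
    out
}
```

With this sketch, a `CELLS`-style body of `PushI32(4), Mul` becomes `PushI32(2), Shl`, while a multiply by a non-power-of-2 such as 3 is left alone.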
@@ -231,7 +231,7 @@ The most common case is `CELLS` which is defined as `PushI32(4), Mul`. Strength
## 6. Dead Code Elimination
**Status: Not implemented.**
**Status: Done.** Implemented in `optimizer.rs::dce()`. Truncates after Exit, eliminates constant-conditional branches.
Remove IR operations that can never execute or whose results are never used.
@@ -246,7 +246,7 @@ DCE should run after constant folding, since folding can create new constant con
## 7. Tail Call Optimization
**Status: Partial.** `IrOp::TailCall(WordId)` exists in `ir.rs` and codegen handles it in `codegen.rs`, but the compiler never generates it.
**Status: Done.** `optimizer.rs::tail_call_detect()` converts the last `Call` to `TailCall` when return stack is balanced. Codegen emits `call_indirect + return`.
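A simplified sketch of the detection step, covering only straight-line bodies (no `If` branches). `ToR` and `FromR` are assumed names for the return-stack ops, and "balanced" is approximated here as an equal number of pushes and pops:

```rust
type WordId = u32;

// Hypothetical, reduced IrOp; ToR/FromR stand in for >R and R>.
#[derive(Debug, Clone, PartialEq)]
enum IrOp {
    PushI32(i32),
    Call(WordId),
    TailCall(WordId),
    ToR,
    FromR,
    Add,
}

fn tail_call_detect(mut ops: Vec<IrOp>) -> Vec<IrOp> {
    // Net return-stack depth contributed by this body.
    let mut depth: i32 = 0;
    for op in &ops {
        match op {
            IrOp::ToR => depth += 1,
            IrOp::FromR => depth -= 1,
            _ => {}
        }
    }
    // Only rewrite when nothing of ours is left on the return stack.
    if depth == 0 {
        if let Some(last) = ops.last_mut() {
            if let IrOp::Call(id) = *last {
                *last = IrOp::TailCall(id);
            }
        }
    }
    ops
}
```

A body ending in `Call` followed by any other op is left untouched, since the call is then not in tail position.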
### What Exists
@@ -272,7 +272,7 @@ Detection rule: if the last IR op in a word body (or in a branch of an `If`) is
## 8. Consolidation
**Status: Not implemented.** Stub exists at `crates/core/src/consolidate.rs`.
**Status: Done.** `CONSOLIDATE` word recompiles all IR-based words into a single WASM module with direct `call` instructions. Implemented in `codegen.rs::compile_consolidated_module()` and `outer.rs::consolidate()`.
### The Idea
@@ -300,7 +300,7 @@ After interactive development, `CONSOLIDATE` recompiles all defined words into a
## 9. Compound IR Operations
**Status: Not implemented.**
**Status: Done.** `TwoDup` and `TwoDrop` IrOp variants with optimized codegen. Peephole converts `Over, Over -> TwoDup` and `Drop, Drop -> TwoDrop`.
Some common multi-op sequences have more efficient WASM implementations than emitting each op individually.
@@ -342,7 +342,7 @@ These can be added as new `IrOp` variants recognized by peephole and emitted by
## 10. Codegen Improvements
**Status: Not implemented.**
**Status: Done (DSP caching).** `$dsp` cached in WASM local 0, written back before calls and at function exit. Commutative optimization and loop index in local are future work.
These are improvements within `codegen.rs` `emit_op()` that do not require new IR operations.
@@ -394,7 +394,7 @@ i32.add ;; result on wasm stack
## 11. wasmtime Configuration
**Status: Not implemented.** Currently using `Engine::default()`.
**Status: Done.** NaN canonicalization disabled, module caching enabled via `cache_config_load_default()`.
### Available Knobs
@@ -410,7 +410,7 @@ Module caching is the most impactful: `wasmtime::Config::cache_config_load_defau
## 12. Dictionary Hash Index
**Status: Not implemented.**
**Status: Done.** `HashMap<String, (u32, u32, bool)>` in Dictionary struct. `find()` uses O(1) hash lookup with linked-list fallback. Updated on `reveal()` and `toggle_immediate()`.
The dictionary lookup (`dictionary.rs` `find()`) walks a linked list from the most recent entry backward, comparing names character by character. After registering 80+ primitives plus user words, every lookup during compilation scans the full list.
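To make the hash-index idea concrete, here is a self-contained sketch. It is not the real `Dictionary`: the `Entry` struct, the use of a `Vec` index as the "address", the meaning of the tuple as (address, word id, immediate flag), and the uppercase normalization of names are all assumptions made for illustration:

```rust
use std::collections::HashMap;

/// Simplified word header; the real headers live in linear memory.
struct Entry {
    name: String,
    word_id: u32,
    immediate: bool,
    hidden: bool,
}

struct Dictionary {
    /// Stand-in for the linked list, oldest first.
    entries: Vec<Entry>,
    /// name -> (address, word id, immediate flag); tuple meaning is assumed.
    index: HashMap<String, (u32, u32, bool)>,
}

impl Dictionary {
    fn new() -> Self {
        Dictionary { entries: Vec::new(), index: HashMap::new() }
    }

    /// Adds a hidden entry; it becomes findable only after `reveal`.
    fn create(&mut self, name: &str, word_id: u32) -> u32 {
        self.entries.push(Entry {
            name: name.to_uppercase(),
            word_id,
            immediate: false,
            hidden: true,
        });
        (self.entries.len() - 1) as u32
    }

    /// Makes the entry visible and points the index at it,
    /// shadowing any older word with the same name.
    fn reveal(&mut self, addr: u32) {
        let e = &mut self.entries[addr as usize];
        e.hidden = false;
        self.index.insert(e.name.clone(), (addr, e.word_id, e.immediate));
    }

    /// Flips the immediate flag and keeps the index in sync.
    fn toggle_immediate(&mut self, addr: u32) {
        let e = &mut self.entries[addr as usize];
        e.immediate = !e.immediate;
        if let Some(slot) = self.index.get_mut(&e.name) {
            if slot.0 == addr {
                slot.2 = e.immediate;
            }
        }
    }

    /// Hash lookup first; falls back to a newest-to-oldest walk.
    fn find(&self, name: &str) -> Option<(u32, u32, bool)> {
        let key = name.to_uppercase();
        if let Some(hit) = self.index.get(&key) {
            return Some(*hit);
        }
        self.entries
            .iter()
            .enumerate()
            .rev()
            .find(|(_, e)| !e.hidden && e.name == key)
            .map(|(i, e)| (i as u32, e.word_id, e.immediate))
    }
}
```

Updating the index only in `reveal` and `toggle_immediate` preserves the usual Forth behavior that a word under definition cannot be found until it is revealed, while a redefinition shadows the older word from that point on.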
@@ -422,7 +422,7 @@ This affects **compile time** (word lookup during parsing), not runtime (compile
## 13. Startup Batching
**Status: Not implemented.** `compile_core_module()` stub exists in `codegen.rs`.
**Status: Not started.** `compile_core_module()` stub exists in `codegen.rs`.
Currently, each of the 80+ primitives registered at boot creates a separate WASM module: `wasm-encoder` builds it, `wasmparser` validates it, Cranelift compiles it, and wasmtime instantiates it. This happens 80+ times sequentially.
@@ -432,7 +432,7 @@ Batch all IR-based primitives into a single WASM module with multiple exported f
## 14. Float and Double-Cell Stack
**Status: Not implemented.** `PushI64` and `PushF64` exist as IR ops but are stubs in codegen.
**Status: Not started.** `PushI64` and `PushF64` exist as IR ops but are stubs in codegen. Float stack operations are currently all host functions.
The float stack lives in its own memory region (0x2540--0x2D40). Float operations will have the same memory-based overhead as integer operations, but worse: `f64` values are 8 bytes, doubling the memory traffic per push/pop. Stack-to-local promotion (section 1) is even more impactful for floats because WASM has native `f64` locals and operand stack support.
+8 -3
@@ -8,10 +8,14 @@ WAFER (WebAssembly Forth Engine in Rust) is a Forth 2012 compiler that JIT-compi
crates/
core/src/
outer.rs ForthVM: outer interpreter, compiler, all primitives
codegen.rs IR-to-WASM translation, module generation
dictionary.rs Dictionary (linked list in Vec<u8>)
codegen.rs IR-to-WASM translation, module generation, stack-to-local promotion
dictionary.rs Dictionary (linked list in Vec<u8>, hash index for O(1) lookup)
ir.rs IrOp enum -- the intermediate representation
optimizer.rs IR optimization passes (peephole, fold, inline, DCE, etc.)
config.rs WaferConfig: unified optimization configuration
consolidate.rs Consolidation recompiler (single-module direct calls)
memory.rs Memory layout constants (addresses, sizes)
types.rs Stack type inference infrastructure
error.rs Error types
cli/src/
main.rs CLI REPL (rustyline), file execution
@@ -23,7 +27,7 @@ tests/
forth2012-test-suite/ Forth 2012 compliance test suite (submodule)
```
The entire compiler and runtime lives in `outer.rs` (~5200 lines). Codegen is in `codegen.rs` (~1500 lines). Everything else is supporting infrastructure.
The compiler and runtime live in `outer.rs` (~10,400 lines). Codegen is in `codegen.rs` (~2,800 lines). The optimizer is in `optimizer.rs` (~800 lines). Everything else is supporting infrastructure.
## What Happens When You Start WAFER
@@ -39,6 +43,7 @@ wasmtime Store Runtime state container
Linear Memory 16 pages (1 MiB), expandable to 256 pages (16 MiB)
Global: DSP Data stack pointer, initialized to 0x1540
Global: RSP Return stack pointer, initialized to 0x2540
Global: FSP Float stack pointer, initialized to 0x2D40
Function Table 256 funcref entries (grows as needed)
```