Update all docs to reflect current state

README: 392 tests, 200+ words, 12 word sets, optimization pipeline described
CLAUDE.md: 200+ words, 12 word sets, 392 tests, added optimizer/config/consolidate to key files
OPTIMIZATIONS.md: update all 14 section statuses (12 done, 2 not started)
WAFER.md: correct line counts, add optimizer/config/consolidate/types to project layout, add FSP global
2026-04-02 12:47:50 +02:00
parent 94f6cb6941
commit 8c53afa28a
4 changed files with 34 additions and 23 deletions
+5 -2
@@ -2,7 +2,7 @@
 ## What is WAFER?
-WAFER (WebAssembly Forth Engine in Rust) is an optimizing Forth 2012 compiler targeting WebAssembly. Currently a working Forth system with 130+ words, JIT compilation, and 11 word sets at 100% compliance.
+WAFER (WebAssembly Forth Engine in Rust) is an optimizing Forth 2012 compiler targeting WebAssembly. Currently a working Forth system with 200+ words, JIT compilation, 12 word sets at 100% compliance, and a full optimization pipeline (peephole, constant folding, inlining, strength reduction, DCE, tail calls, stack-to-local promotion, consolidation).
 ## Architecture
@@ -19,6 +19,9 @@ WAFER (WebAssembly Forth Engine in Rust) is an optimizing Forth 2012 compiler ta
 - `crates/core/src/dictionary.rs` -- Dictionary data structure with create/find/reveal
 - `crates/core/src/ir.rs` -- IrOp enum (the intermediate representation)
 - `crates/core/src/memory.rs` -- Memory layout constants (stack regions, dictionary base, etc.)
+- `crates/core/src/optimizer.rs` -- IR optimization passes (peephole, fold, inline, DCE, etc.)
+- `crates/core/src/config.rs` -- WaferConfig: unified optimization configuration
+- `crates/core/src/consolidate.rs` -- Consolidation recompiler (single-module direct calls)
 - `crates/cli/src/main.rs` -- CLI REPL with rustyline
 ## Adding a New Word
@@ -51,7 +54,7 @@ Handle in `interpret_token_immediate()` or `compile_token()` as a special case.
 ## Testing
-- Run `cargo test --workspace` before committing (currently 261 unit + 11 compliance tests)
+- Run `cargo test --workspace` before committing (currently 392 tests: 380 unit + 1 benchmark + 11 compliance)
 - Forth 2012 compliance: `cargo test -p wafer-core --test compliance`
 - Test helper in outer.rs: `eval_output("forth code")` returns printed output as String
 - Test helper: `eval_stack("forth code")` returns data stack as Vec<i32>
+7 -4
@@ -6,7 +6,7 @@ An optimizing Forth 2012 compiler targeting WebAssembly.
 ## Status
-WAFER is a working Forth system. It JIT-compiles each word definition to a separate WASM module and executes via `wasmtime`. 310 tests passing (299 unit + 11 compliance), **0 errors on all 12 tested Forth 2012 word sets** including Floating-Point.
+WAFER is a working Forth system with an optimizing compiler. It JIT-compiles each word definition to a separate WASM module and executes via `wasmtime`. 392 tests passing (380 unit + 1 benchmark + 11 compliance), **0 errors on all 12 tested Forth 2012 word sets** including Floating-Point.
 **Working features:**
@@ -50,7 +50,7 @@ Forth Source -> Outer Interpreter -> IR -> [Optimize] -> WASM Codegen (wasm-enco
 - **Subroutine threading** via WASM function tables and `call_indirect`
 - **JIT mode**: each new word compiles to a separate WASM module linked to shared memory/globals/table
-- **IR-based pipeline** enables future optimization passes before WASM emission
+- **IR-based pipeline** with 6 optimization passes (peephole, constant folding, strength reduction, DCE, tail call detection, inlining) plus stack-to-local promotion and consolidation
 - **Dictionary**: linked-list word headers in simulated linear memory
 ## Building
@@ -75,12 +75,15 @@ echo ': SQUARE DUP * ; 7 SQUARE .' | cargo run -p wafer
 ## Testing
 ```bash
-# All tests (185 currently passing)
+# All tests (392 currently passing)
 cargo test --workspace
 # Forth 2012 compliance dashboard
 cargo test -p wafer-core --test compliance
+# Optimization benchmark report
+cargo test -p wafer-core --test benchmark_report -- --nocapture --ignored
 # Lints
 cargo clippy --workspace
 ```
@@ -117,7 +120,7 @@ tests/ Forth 2012 compliance suite (gerryjackson/forth2012-test-suite sub
 ### Not Yet Implemented
-11 word sets at 100% compliance: Core, Core Ext, Core Plus, Exception, Double-Number, String, Search-Order, Memory-Allocation, Programming-Tools, Facility, Locals. 130+ words including VALUE, DEFER, CASE, DOES>, CATCH/THROW, double-cell arithmetic, string operations.
+12 word sets at 100% compliance: Core, Core Ext, Core Plus, Exception, Double-Number, String, Search-Order, Memory-Allocation, Programming-Tools, Facility, Locals, Floating-Point. 200+ words including VALUE, DEFER, CASE, DOES>, CATCH/THROW, double-cell arithmetic, string operations, and 70+ floating-point words.
 ## Compliance Status
+14 -14
@@ -31,7 +31,7 @@ This document describes every optimization that makes sense for WAFER, why it ma
 ## 1. Stack-to-Local Promotion
-**Status: Not implemented.** Type infrastructure exists (`crates/core/src/types.rs`) but is not wired into codegen.
+**Status: Phase 1 done.** Straight-line words (no control flow, calls, or I/O) use WASM locals instead of memory stack. Stack manipulation ops (Swap, Rot, Nip, Tuck, Dup, Drop) emit zero WASM instructions. Switchable via `WaferConfig::codegen.stack_to_local_promotion`.
 ### The Problem
@@ -105,7 +105,7 @@ When the compiler can statically determine the types and lifetimes of values on
 ## 2. Peephole Optimization
-**Status: Not implemented.**
+**Status: Done.** Implemented in `optimizer.rs::peephole()`. Runs as fixpoint loop.
 A peephole optimizer scans adjacent IR operations and replaces recognized patterns with cheaper equivalents. This is the lowest-effort, highest-return IR pass because Forth's postfix style generates many redundant sequences.
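For readers of this hunk, a fixpoint peephole pass of the kind described here could look roughly like the sketch below. It is not the code in `optimizer.rs`: the `IrOp` enum is a hypothetical subset, and apart from the `Over, Over -> TwoDup` and `Drop, Drop -> TwoDrop` rewrites named elsewhere in this diff, the rule set is an assumption chosen for illustration.

```rust
// Hypothetical subset of the IR; the real `IrOp` in ir.rs has more variants.
#[derive(Debug, Clone, PartialEq)]
enum IrOp {
    PushI32(i32),
    Dup,
    Drop,
    Swap,
    Over,
    TwoDup,
    TwoDrop,
    Add,
}

/// One left-to-right scan over adjacent pairs.
/// Returns the rewritten ops and whether any rule fired.
fn peephole_pass(ops: &[IrOp]) -> (Vec<IrOp>, bool) {
    let mut out = Vec::with_capacity(ops.len());
    let mut changed = false;
    let mut i = 0;
    while i < ops.len() {
        match (ops.get(i), ops.get(i + 1)) {
            // Two swaps cancel each other out.
            (Some(IrOp::Swap), Some(IrOp::Swap)) => {
                i += 2;
                changed = true;
            }
            // A value that is produced and dropped right away is dead.
            (Some(IrOp::PushI32(_)), Some(IrOp::Drop))
            | (Some(IrOp::Dup), Some(IrOp::Drop)) => {
                i += 2;
                changed = true;
            }
            // Fuse pairs into the compound ops.
            (Some(IrOp::Over), Some(IrOp::Over)) => {
                out.push(IrOp::TwoDup);
                i += 2;
                changed = true;
            }
            (Some(IrOp::Drop), Some(IrOp::Drop)) => {
                out.push(IrOp::TwoDrop);
                i += 2;
                changed = true;
            }
            // No rule matched: copy the op through unchanged.
            (Some(op), _) => {
                out.push(op.clone());
                i += 1;
            }
            (None, _) => unreachable!("i < ops.len() guarantees an op"),
        }
    }
    (out, changed)
}

/// Repeat single passes until nothing changes (the fixpoint loop).
fn peephole(mut ops: Vec<IrOp>) -> Vec<IrOp> {
    loop {
        let (next, changed) = peephole_pass(&ops);
        ops = next;
        if !changed {
            return ops;
        }
    }
}

fn main() {
    let ops = vec![
        IrOp::PushI32(1),
        IrOp::Swap,
        IrOp::Swap,
        IrOp::Drop,
        IrOp::Over,
        IrOp::Over,
        IrOp::Add,
    ];
    println!("{:?}", peephole(ops));
}
```

The second pass matters in this input: removing `Swap, Swap` makes `PushI32(1)` and `Drop` adjacent, which only the next scan can see. That is the reason for looping to a fixpoint rather than scanning once.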
@@ -134,7 +134,7 @@ A single function `fn peephole(ops: Vec<IrOp>) -> Vec<IrOp>` that makes repeated
 ## 3. Constant Folding
-**Status: Not implemented.**
+**Status: Done.** Implemented in `optimizer.rs::constant_fold()`. Folds binary and unary ops on known constants.
 When both operands of an operation are compile-time constants, compute the result at compile time.
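As an illustration of the folding described in this hunk, the sketch below uses the output vector itself as the compile-time stack: a foldable op that sees constants on top of it replaces them with the result. The `IrOp` subset, the helper names `fold_binary` and `fold_unary`, and the choice of wrapping 32-bit arithmetic are assumptions, not the actual `optimizer.rs` code.

```rust
// Hypothetical subset of the IR, for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum IrOp {
    PushI32(i32),
    Add,
    Sub,
    Mul,
    Negate,
    Dup,
}

/// Result of a binary op on two constants, if the op is foldable.
/// Wrapping arithmetic is assumed, to match 32-bit cell behaviour.
fn fold_binary(op: &IrOp, a: i32, b: i32) -> Option<i32> {
    match op {
        IrOp::Add => Some(a.wrapping_add(b)),
        IrOp::Sub => Some(a.wrapping_sub(b)),
        IrOp::Mul => Some(a.wrapping_mul(b)),
        _ => None,
    }
}

/// Result of a unary op on one constant, if the op is foldable.
fn fold_unary(op: &IrOp, a: i32) -> Option<i32> {
    match op {
        IrOp::Negate => Some(a.wrapping_neg()),
        _ => None,
    }
}

fn constant_fold(ops: Vec<IrOp>) -> Vec<IrOp> {
    let mut out: Vec<IrOp> = Vec::with_capacity(ops.len());
    for op in ops {
        // Constants currently sitting on top of the simulated stack.
        let top2 = match out.as_slice() {
            [.., IrOp::PushI32(a), IrOp::PushI32(b)] => Some((*a, *b)),
            _ => None,
        };
        let top1 = match out.last() {
            Some(IrOp::PushI32(a)) => Some(*a),
            _ => None,
        };

        if let Some(r) = top2.and_then(|(a, b)| fold_binary(&op, a, b)) {
            // Replace `a b op` with the computed constant.
            out.truncate(out.len() - 2);
            out.push(IrOp::PushI32(r));
        } else if let Some(r) = top1.and_then(|a| fold_unary(&op, a)) {
            // Replace `a op` with the computed constant.
            out.pop();
            out.push(IrOp::PushI32(r));
        } else {
            out.push(op);
        }
    }
    out
}

fn main() {
    // Forth: 2 3 + 4 *
    let ops = vec![
        IrOp::PushI32(2),
        IrOp::PushI32(3),
        IrOp::Add,
        IrOp::PushI32(4),
        IrOp::Mul,
    ];
    println!("{:?}", constant_fold(ops));
}
```

Because each folded result is pushed back as a `PushI32`, chains such as `2 3 + 4 *` collapse in a single pass without an outer loop.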
@@ -163,7 +163,7 @@ A function `fn constant_fold(ops: Vec<IrOp>) -> Vec<IrOp>` that simulates a comp
 ## 4. Inlining
-**Status: Not implemented.**
+**Status: Done.** Implemented in `optimizer.rs::inline()`. Inlines word bodies <= 8 ops, non-recursive. TailCall converted back to Call when inlining. IR bodies stored in `ForthVM::ir_bodies`.
 Replace `Call(WordId)` with the callee's IR body, avoiding the `call_indirect` overhead and enabling further optimization of the combined code.
@@ -212,7 +212,7 @@ PushI32(34)
 ## 5. Strength Reduction
-**Status: Not implemented.**
+**Status: Done.** Implemented in `optimizer.rs::strength_reduce()`. Power-of-2 multiply to shift, zero comparisons to ZeroEq/ZeroLt.
 Replace expensive operations with cheaper equivalents when one operand is a known constant.
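The two rewrites named in the new status line (power-of-2 multiply to shift, zero comparisons to `ZeroEq`/`ZeroLt`) can be sketched as below. This is a minimal illustration under assumptions: the variant names `Lshift`, `Eq`, and `Lt` are hypothetical, and the real `strength_reduce()` may recognize more patterns.

```rust
// Hypothetical subset of the IR; `Lshift`, `Eq`, and `Lt` are assumed names.
#[derive(Debug, Clone, PartialEq)]
enum IrOp {
    PushI32(i32),
    Mul,
    Lshift,
    Eq,
    Lt,
    ZeroEq,
    ZeroLt,
    Dup,
}

fn strength_reduce(ops: Vec<IrOp>) -> Vec<IrOp> {
    let mut out = Vec::with_capacity(ops.len());
    let mut i = 0;
    while i < ops.len() {
        match (&ops[i], ops.get(i + 1)) {
            // x * 2^k  ->  x << k (equal under wrapping 32-bit arithmetic).
            (IrOp::PushI32(n), Some(IrOp::Mul))
                if *n > 1 && (*n as u32).is_power_of_two() =>
            {
                out.push(IrOp::PushI32(n.trailing_zeros() as i32));
                out.push(IrOp::Lshift);
                i += 2;
            }
            // x 0 =  ->  0=
            (IrOp::PushI32(0), Some(IrOp::Eq)) => {
                out.push(IrOp::ZeroEq);
                i += 2;
            }
            // x 0 <  ->  0<
            (IrOp::PushI32(0), Some(IrOp::Lt)) => {
                out.push(IrOp::ZeroLt);
                i += 2;
            }
            // Anything else is copied through unchanged.
            (op, _) => {
                out.push(op.clone());
                i += 1;
            }
        }
    }
    out
}

fn main() {
    // CELLS is described in this document as `PushI32(4), Mul`.
    let cells = vec![IrOp::Dup, IrOp::PushI32(4), IrOp::Mul];
    println!("{:?}", strength_reduce(cells));
}
```

Multipliers that are not powers of two, such as `PushI32(6), Mul`, are left alone in this sketch.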
@@ -231,7 +231,7 @@ The most common case is `CELLS` which is defined as `PushI32(4), Mul`. Strength
 ## 6. Dead Code Elimination
-**Status: Not implemented.**
+**Status: Done.** Implemented in `optimizer.rs::dce()`. Truncates after Exit, eliminates constant-conditional branches.
 Remove IR operations that can never execute or whose results are never used.
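A rough sketch of the two behaviours in the new status line (truncate after `Exit`, resolve branches whose flag is a constant) follows. The shape of the `If` variant with `then_body` and `else_body` fields is an assumption made for this example; the real IR may represent conditionals differently.

```rust
// Hypothetical IR subset; the field layout of `If` is assumed.
#[derive(Debug, Clone, PartialEq)]
enum IrOp {
    PushI32(i32),
    Dup,
    Exit,
    If {
        then_body: Vec<IrOp>,
        else_body: Vec<IrOp>,
    },
}

fn dce(ops: Vec<IrOp>) -> Vec<IrOp> {
    let mut out: Vec<IrOp> = Vec::new();
    for op in ops {
        match op {
            IrOp::If { then_body, else_body } => {
                // Is the flag a constant pushed immediately before the If?
                let const_flag = match out.last() {
                    Some(IrOp::PushI32(f)) => Some(*f),
                    _ => None,
                };
                match const_flag {
                    Some(flag) => {
                        // Forth treats any nonzero flag as true.
                        let taken = if flag != 0 { then_body } else { else_body };
                        out.pop(); // the flag is consumed at compile time
                        out.extend(dce(taken));
                    }
                    None => out.push(IrOp::If {
                        then_body: dce(then_body),
                        else_body: dce(else_body),
                    }),
                }
            }
            other => out.push(other),
        }
        // Nothing after an unconditional Exit can execute.
        if matches!(out.last(), Some(IrOp::Exit)) {
            break;
        }
    }
    out
}

fn main() {
    let ops = vec![
        IrOp::PushI32(0),
        IrOp::If {
            then_body: vec![IrOp::Dup],
            else_body: vec![IrOp::PushI32(9)],
        },
        IrOp::Dup,
    ];
    println!("{:?}", dce(ops));
}
```

An `Exit` inside a branch that is not resolved at compile time does not truncate the enclosing body, because the other branch may still fall through.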
@@ -246,7 +246,7 @@ DCE should run after constant folding, since folding can create new constant con
 ## 7. Tail Call Optimization
-**Status: Partial.** `IrOp::TailCall(WordId)` exists in `ir.rs` and codegen handles it in `codegen.rs`, but the compiler never generates it.
+**Status: Done.** `optimizer.rs::tail_call_detect()` converts the last `Call` to `TailCall` when return stack is balanced. Codegen emits `call_indirect + return`.
 ### What Exists
@@ -272,7 +272,7 @@ Detection rule: if the last IR op in a word body (or in a branch of an `If`) is
 ## 8. Consolidation
-**Status: Not implemented.** Stub exists at `crates/core/src/consolidate.rs`.
+**Status: Done.** `CONSOLIDATE` word recompiles all IR-based words into a single WASM module with direct `call` instructions. Implemented in `codegen.rs::compile_consolidated_module()` and `outer.rs::consolidate()`.
 ### The Idea
@@ -300,7 +300,7 @@ After interactive development, `CONSOLIDATE` recompiles all defined words into a
 ## 9. Compound IR Operations
-**Status: Not implemented.**
+**Status: Done.** `TwoDup` and `TwoDrop` IrOp variants with optimized codegen. Peephole converts `Over, Over -> TwoDup` and `Drop, Drop -> TwoDrop`.
 Some common multi-op sequences have more efficient WASM implementations than emitting each op individually.
@@ -342,7 +342,7 @@ These can be added as new `IrOp` variants recognized by peephole and emitted by
 ## 10. Codegen Improvements
-**Status: Not implemented.**
+**Status: Done (DSP caching).** `$dsp` cached in WASM local 0, written back before calls and at function exit. Commutative optimization and loop index in local are future work.
 These are improvements within `codegen.rs` `emit_op()` that do not require new IR operations.
@@ -394,7 +394,7 @@ i32.add ;; result on wasm stack
 ## 11. wasmtime Configuration
-**Status: Not implemented.** Currently using `Engine::default()`.
+**Status: Done.** NaN canonicalization disabled, module caching enabled via `cache_config_load_default()`.
 ### Available Knobs
@@ -410,7 +410,7 @@ Module caching is the most impactful: `wasmtime::Config::cache_config_load_defau
 ## 12. Dictionary Hash Index
-**Status: Not implemented.**
+**Status: Done.** `HashMap<String, (u32, u32, bool)>` in Dictionary struct. `find()` uses O(1) hash lookup with linked-list fallback. Updated on `reveal()` and `toggle_immediate()`.
 The dictionary lookup (`dictionary.rs` `find()`) walks a linked list from the most recent entry backward, comparing names character by character. After registering 80+ primitives plus user words, every lookup during compilation scans the full list.
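The hash-index design in the new status line can be pictured with the sketch below. Several details are assumptions and not taken from `dictionary.rs`: the tuple is read here as (address, word id, immediate flag), the linked list is modelled as a `Vec<Entry>` with `prev` links instead of headers in a `Vec<u8>`, names are matched case-insensitively by uppercasing, and `toggle_immediate()` acts on the most recent definition.

```rust
use std::collections::HashMap;

// Simplified stand-in for a word header; the real dictionary stores
// headers as bytes in a Vec<u8>.
struct Entry {
    name: String,
    addr: u32,
    word_id: u32,
    immediate: bool,
    revealed: bool,
    prev: Option<usize>, // link to the previously defined word
}

struct Dictionary {
    entries: Vec<Entry>,
    latest: Option<usize>,
    // Assumed meaning of the tuple: (address, word id, immediate flag).
    index: HashMap<String, (u32, u32, bool)>,
}

impl Dictionary {
    fn new() -> Self {
        Dictionary { entries: Vec::new(), latest: None, index: HashMap::new() }
    }

    /// Add a header. The word stays hidden until `reveal` is called.
    fn create(&mut self, name: &str, addr: u32, word_id: u32) -> usize {
        self.entries.push(Entry {
            name: name.to_ascii_uppercase(),
            addr,
            word_id,
            immediate: false,
            revealed: false,
            prev: self.latest,
        });
        let slot = self.entries.len() - 1;
        self.latest = Some(slot);
        slot
    }

    /// Make the word findable and record it in the hash index.
    fn reveal(&mut self, slot: usize) {
        let e = &mut self.entries[slot];
        e.revealed = true;
        self.index.insert(e.name.clone(), (e.addr, e.word_id, e.immediate));
    }

    /// Flip the immediate flag of the latest word and keep the index in sync.
    fn toggle_immediate(&mut self) {
        if let Some(slot) = self.latest {
            let e = &mut self.entries[slot];
            e.immediate = !e.immediate;
            if e.revealed {
                self.index.insert(e.name.clone(), (e.addr, e.word_id, e.immediate));
            }
        }
    }

    /// Hash lookup first; walk the list only if the index has no entry.
    fn find(&self, name: &str) -> Option<(u32, u32, bool)> {
        let key = name.to_ascii_uppercase();
        if let Some(hit) = self.index.get(&key) {
            return Some(*hit);
        }
        let mut cur = self.latest;
        while let Some(slot) = cur {
            let e = &self.entries[slot];
            if e.revealed && e.name == key {
                return Some((e.addr, e.word_id, e.immediate));
            }
            cur = e.prev;
        }
        None
    }
}

fn main() {
    let mut dict = Dictionary::new();
    let slot = dict.create("SQUARE", 0x3000, 42);
    dict.reveal(slot);
    println!("{:?}", dict.find("square"));
}
```

Inserting on `reveal()` rather than on `create()` keeps a word invisible while it is still being compiled, and a later definition with the same name simply overwrites the map entry, which gives the usual shadowing behaviour.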
@@ -422,7 +422,7 @@ This affects **compile time** (word lookup during parsing), not runtime (compile
 ## 13. Startup Batching
-**Status: Not implemented.** `compile_core_module()` stub exists in `codegen.rs`.
+**Status: Not started.** `compile_core_module()` stub exists in `codegen.rs`.
 Currently, each of the 80+ primitives registered at boot creates a separate WASM module: `wasm-encoder` builds it, `wasmparser` validates it, Cranelift compiles it, and wasmtime instantiates it. This happens 80+ times sequentially.
@@ -432,7 +432,7 @@ Batch all IR-based primitives into a single WASM module with multiple exported f
 ## 14. Float and Double-Cell Stack
-**Status: Not implemented.** `PushI64` and `PushF64` exist as IR ops but are stubs in codegen.
+**Status: Not started.** `PushI64` and `PushF64` exist as IR ops but are stubs in codegen. Float stack operations are currently all host functions.
 The float stack lives in its own memory region (0x2540--0x2D40). Float operations will have the same memory-based overhead as integer operations, but worse: `f64` values are 8 bytes, doubling the memory traffic per push/pop. Stack-to-local promotion (section 1) is even more impactful for floats because WASM has native `f64` locals and operand stack support.
+8 -3
@@ -8,10 +8,14 @@ WAFER (WebAssembly Forth Engine in Rust) is a Forth 2012 compiler that JIT-compi
 crates/
 core/src/
 outer.rs ForthVM: outer interpreter, compiler, all primitives
-codegen.rs IR-to-WASM translation, module generation
-dictionary.rs Dictionary (linked list in Vec<u8>)
+codegen.rs IR-to-WASM translation, module generation, stack-to-local promotion
+dictionary.rs Dictionary (linked list in Vec<u8>, hash index for O(1) lookup)
 ir.rs IrOp enum -- the intermediate representation
+optimizer.rs IR optimization passes (peephole, fold, inline, DCE, etc.)
+config.rs WaferConfig: unified optimization configuration
+consolidate.rs Consolidation recompiler (single-module direct calls)
 memory.rs Memory layout constants (addresses, sizes)
+types.rs Stack type inference infrastructure
 error.rs Error types
 cli/src/
 main.rs CLI REPL (rustyline), file execution
@@ -23,7 +27,7 @@ tests/
 forth2012-test-suite/ Forth 2012 compliance test suite (submodule)
 ```
-The entire compiler and runtime lives in `outer.rs` (~5200 lines). Codegen is in `codegen.rs` (~1500 lines). Everything else is supporting infrastructure.
+The compiler and runtime lives in `outer.rs` (~10,400 lines). Codegen is in `codegen.rs` (~2,800 lines). The optimizer is in `optimizer.rs` (~800 lines). Everything else is supporting infrastructure.
 ## What Happens When You Start WAFER
@@ -39,6 +43,7 @@ wasmtime Store Runtime state container
 Linear Memory 16 pages (1 MiB), expandable to 256 pages (16 MiB)
 Global: DSP Data stack pointer, initialized to 0x1540
 Global: RSP Return stack pointer, initialized to 0x2540
+Global: FSP Float stack pointer, initialized to 0x2D40
 Function Table 256 funcref entries (grows as needed)
 ```