Update all docs to reflect current state
README: 392 tests, 200+ words, 12 word sets, optimization pipeline described
CLAUDE.md: 200+ words, 12 word sets, 392 tests, added optimizer/config/consolidate to key files
OPTIMIZATIONS.md: update all 14 section statuses (12 done, 2 not started)
WAFER.md: correct line counts, add optimizer/config/consolidate/types to project layout, add FSP global
@@ -2,7 +2,7 @@
|
|||||||
|
|
||||||
## What is WAFER?
|
## What is WAFER?
|
||||||
|
|
||||||
WAFER (WebAssembly Forth Engine in Rust) is an optimizing Forth 2012 compiler targeting WebAssembly. Currently a working Forth system with 130+ words, JIT compilation, and 11 word sets at 100% compliance.
|
WAFER (WebAssembly Forth Engine in Rust) is an optimizing Forth 2012 compiler targeting WebAssembly. Currently a working Forth system with 200+ words, JIT compilation, 12 word sets at 100% compliance, and a full optimization pipeline (peephole, constant folding, inlining, strength reduction, DCE, tail calls, stack-to-local promotion, consolidation).
|
||||||
|
|
||||||
## Architecture
|
## Architecture
|
||||||
|
|
||||||
@@ -19,6 +19,9 @@ WAFER (WebAssembly Forth Engine in Rust) is an optimizing Forth 2012 compiler ta
|
|||||||
- `crates/core/src/dictionary.rs` -- Dictionary data structure with create/find/reveal
|
- `crates/core/src/dictionary.rs` -- Dictionary data structure with create/find/reveal
|
||||||
- `crates/core/src/ir.rs` -- IrOp enum (the intermediate representation)
|
- `crates/core/src/ir.rs` -- IrOp enum (the intermediate representation)
|
||||||
- `crates/core/src/memory.rs` -- Memory layout constants (stack regions, dictionary base, etc.)
|
- `crates/core/src/memory.rs` -- Memory layout constants (stack regions, dictionary base, etc.)
|
||||||
|
- `crates/core/src/optimizer.rs` -- IR optimization passes (peephole, fold, inline, DCE, etc.)
|
||||||
|
- `crates/core/src/config.rs` -- WaferConfig: unified optimization configuration
|
||||||
|
- `crates/core/src/consolidate.rs` -- Consolidation recompiler (single-module direct calls)
|
||||||
- `crates/cli/src/main.rs` -- CLI REPL with rustyline
|
- `crates/cli/src/main.rs` -- CLI REPL with rustyline
|
||||||
|
|
||||||
## Adding a New Word
|
## Adding a New Word
|
||||||
@@ -51,7 +54,7 @@ Handle in `interpret_token_immediate()` or `compile_token()` as a special case.
|
|||||||
|
|
||||||
## Testing
|
## Testing
|
||||||
|
|
||||||
- Run `cargo test --workspace` before committing (currently 261 unit + 11 compliance tests)
|
- Run `cargo test --workspace` before committing (currently 392 tests: 380 unit + 1 benchmark + 11 compliance)
|
||||||
- Forth 2012 compliance: `cargo test -p wafer-core --test compliance`
|
- Forth 2012 compliance: `cargo test -p wafer-core --test compliance`
|
||||||
- Test helper in outer.rs: `eval_output("forth code")` returns printed output as String
|
- Test helper in outer.rs: `eval_output("forth code")` returns printed output as String
|
||||||
- Test helper: `eval_stack("forth code")` returns data stack as Vec<i32>
|
- Test helper: `eval_stack("forth code")` returns data stack as Vec<i32>
|
||||||
|
|||||||
@@ -6,7 +6,7 @@ An optimizing Forth 2012 compiler targeting WebAssembly.
|
|||||||
|
|
||||||
## Status
|
## Status
|
||||||
|
|
||||||
WAFER is a working Forth system. It JIT-compiles each word definition to a separate WASM module and executes via `wasmtime`. 310 tests passing (299 unit + 11 compliance), **0 errors on all 12 tested Forth 2012 word sets** including Floating-Point.
|
WAFER is a working Forth system with an optimizing compiler. It JIT-compiles each word definition to a separate WASM module and executes via `wasmtime`. 392 tests passing (380 unit + 1 benchmark + 11 compliance), **0 errors on all 12 tested Forth 2012 word sets** including Floating-Point.
|
||||||
|
|
||||||
**Working features:**
|
**Working features:**
|
||||||
|
|
||||||
@@ -50,7 +50,7 @@ Forth Source -> Outer Interpreter -> IR -> [Optimize] -> WASM Codegen (wasm-enco
|
|||||||
|
|
||||||
- **Subroutine threading** via WASM function tables and `call_indirect`
|
- **Subroutine threading** via WASM function tables and `call_indirect`
|
||||||
- **JIT mode**: each new word compiles to a separate WASM module linked to shared memory/globals/table
|
- **JIT mode**: each new word compiles to a separate WASM module linked to shared memory/globals/table
|
||||||
- **IR-based pipeline** enables future optimization passes before WASM emission
|
- **IR-based pipeline** with 6 optimization passes (peephole, constant folding, strength reduction, DCE, tail call detection, inlining) plus stack-to-local promotion and consolidation
|
||||||
- **Dictionary**: linked-list word headers in simulated linear memory
|
- **Dictionary**: linked-list word headers in simulated linear memory
|
||||||
|
|
||||||
## Building
|
## Building
|
||||||
@@ -75,12 +75,15 @@ echo ': SQUARE DUP * ; 7 SQUARE .' | cargo run -p wafer
|
|||||||
## Testing
|
## Testing
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
# All tests (185 currently passing)
|
# All tests (392 currently passing)
|
||||||
cargo test --workspace
|
cargo test --workspace
|
||||||
|
|
||||||
# Forth 2012 compliance dashboard
|
# Forth 2012 compliance dashboard
|
||||||
cargo test -p wafer-core --test compliance
|
cargo test -p wafer-core --test compliance
|
||||||
|
|
||||||
|
# Optimization benchmark report
|
||||||
|
cargo test -p wafer-core --test benchmark_report -- --nocapture --ignored
|
||||||
|
|
||||||
# Lints
|
# Lints
|
||||||
cargo clippy --workspace
|
cargo clippy --workspace
|
||||||
```
|
```
|
||||||
@@ -117,7 +120,7 @@ tests/ Forth 2012 compliance suite (gerryjackson/forth2012-test-suite sub
|
|||||||
|
|
||||||
### Not Yet Implemented
|
### Not Yet Implemented
|
||||||
|
|
||||||
11 word sets at 100% compliance: Core, Core Ext, Core Plus, Exception, Double-Number, String, Search-Order, Memory-Allocation, Programming-Tools, Facility, Locals. 130+ words including VALUE, DEFER, CASE, DOES>, CATCH/THROW, double-cell arithmetic, string operations.
|
12 word sets at 100% compliance: Core, Core Ext, Core Plus, Exception, Double-Number, String, Search-Order, Memory-Allocation, Programming-Tools, Facility, Locals, Floating-Point. 200+ words including VALUE, DEFER, CASE, DOES>, CATCH/THROW, double-cell arithmetic, string operations, and 70+ floating-point words.
|
||||||
|
|
||||||
## Compliance Status
|
## Compliance Status
|
||||||
|
|
||||||
|
|||||||
+14
-14
@@ -31,7 +31,7 @@ This document describes every optimization that makes sense for WAFER, why it ma
|
|||||||
|
|
||||||
## 1. Stack-to-Local Promotion
|
## 1. Stack-to-Local Promotion
|
||||||
|
|
||||||
**Status: Not implemented.** Type infrastructure exists (`crates/core/src/types.rs`) but is not wired into codegen.
|
**Status: Phase 1 done.** Straight-line words (no control flow, calls, or I/O) use WASM locals instead of memory stack. Stack manipulation ops (Swap, Rot, Nip, Tuck, Dup, Drop) emit zero WASM instructions. Switchable via `WaferConfig::codegen.stack_to_local_promotion`.
|
||||||
|
|
||||||
### The Problem
|
### The Problem
|
||||||
|
|
||||||
@@ -105,7 +105,7 @@ When the compiler can statically determine the types and lifetimes of values on
|
|||||||
|
|
||||||
## 2. Peephole Optimization
|
## 2. Peephole Optimization
|
||||||
|
|
||||||
**Status: Not implemented.**
|
**Status: Done.** Implemented in `optimizer.rs::peephole()`. Runs as fixpoint loop.
|
||||||
|
|
||||||
A peephole optimizer scans adjacent IR operations and replaces recognized patterns with cheaper equivalents. This is the lowest-effort, highest-return IR pass because Forth's postfix style generates many redundant sequences.
|
A peephole optimizer scans adjacent IR operations and replaces recognized patterns with cheaper equivalents. This is the lowest-effort, highest-return IR pass because Forth's postfix style generates many redundant sequences.
|
||||||
|
|
||||||
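The fixpoint behaviour mentioned in the status line can be illustrated with a minimal sketch. The `IrOp` enum here is a trimmed-down, hypothetical stand-in for the real one in `ir.rs`, and `peephole_once` is a helper name invented for this example. Only the `Over, Over -> TwoDup` and `Drop, Drop -> TwoDrop` rewrites are taken from this document; the other patterns are illustrative assumptions.

```rust
// Hypothetical, reduced IrOp for illustration; the real enum lives in ir.rs.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum IrOp {
    PushI32(i32),
    Dup,
    Drop,
    Over,
    Swap,
    TwoDup,
    TwoDrop,
}

/// One left-to-right scan over adjacent pairs.
/// Returns the rewritten ops and whether any rewrite fired.
fn peephole_once(ops: &[IrOp]) -> (Vec<IrOp>, bool) {
    let mut out = Vec::with_capacity(ops.len());
    let mut changed = false;
    let mut i = 0;
    while i < ops.len() {
        match (ops.get(i), ops.get(i + 1)) {
            // Rewrites named in section 9 of this document.
            (Some(IrOp::Over), Some(IrOp::Over)) => {
                out.push(IrOp::TwoDup);
                i += 2;
                changed = true;
            }
            (Some(IrOp::Drop), Some(IrOp::Drop)) => {
                out.push(IrOp::TwoDrop);
                i += 2;
                changed = true;
            }
            // Assumed extra patterns: pairs that cancel out entirely.
            (Some(IrOp::Swap), Some(IrOp::Swap))
            | (Some(IrOp::Dup), Some(IrOp::Drop))
            | (Some(IrOp::PushI32(_)), Some(IrOp::Drop)) => {
                i += 2;
                changed = true;
            }
            // No pattern matched: copy the op through unchanged.
            _ => {
                out.push(ops[i].clone());
                i += 1;
            }
        }
    }
    (out, changed)
}

/// Repeat the scan until a pass makes no change (the fixpoint).
fn peephole(mut ops: Vec<IrOp>) -> Vec<IrOp> {
    loop {
        let (next, changed) = peephole_once(&ops);
        ops = next;
        if !changed {
            return ops;
        }
    }
}
```

The loop matters because one rewrite can expose another: in `Swap, Dup, Drop, Swap`, removing `Dup, Drop` leaves `Swap, Swap`, which only a second pass can remove.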
@@ -134,7 +134,7 @@ A single function `fn peephole(ops: Vec<IrOp>) -> Vec<IrOp>` that makes repeated
|
|||||||
|
|
||||||
## 3. Constant Folding
|
## 3. Constant Folding
|
||||||
|
|
||||||
**Status: Not implemented.**
|
**Status: Done.** Implemented in `optimizer.rs::constant_fold()`. Folds binary and unary ops on known constants.
|
||||||
|
|
||||||
When both operands of an operation are compile-time constants, compute the result at compile time.
|
When both operands of an operation are compile-time constants, compute the result at compile time.
|
||||||
|
|
||||||
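As a rough illustration of folding binary and unary ops on known constants, the sketch below treats the tail of the output vector as the compile-time stack. The reduced `IrOp` enum and the variant names `Sub`, `Negate`, and `Fetch` are assumptions made for this example. Wrapping arithmetic is used on the assumption that cells are 32-bit, as the `Vec<i32>` test helper suggests.

```rust
// Hypothetical, reduced IrOp for illustration; the real enum lives in ir.rs.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum IrOp {
    PushI32(i32),
    Add,
    Sub,
    Mul,
    Negate,
    Fetch, // stands in for any op whose result is unknown at compile time
}

fn constant_fold(ops: Vec<IrOp>) -> Vec<IrOp> {
    let mut out: Vec<IrOp> = Vec::with_capacity(ops.len());
    for op in ops {
        let n = out.len();
        // Constants sitting at the end of `out` are known operands.
        let top = match out.last() {
            Some(IrOp::PushI32(v)) => Some(*v),
            _ => None,
        };
        let below = if n >= 2 {
            match &out[n - 2] {
                IrOp::PushI32(v) => Some(*v),
                _ => None,
            }
        } else {
            None
        };
        // (number of pushes consumed, folded value)
        let folded = match (&op, below, top) {
            (IrOp::Add, Some(a), Some(b)) => Some((2, a.wrapping_add(b))),
            (IrOp::Sub, Some(a), Some(b)) => Some((2, a.wrapping_sub(b))),
            (IrOp::Mul, Some(a), Some(b)) => Some((2, a.wrapping_mul(b))),
            (IrOp::Negate, _, Some(b)) => Some((1, b.wrapping_neg())),
            _ => None,
        };
        match folded {
            Some((consumed, value)) => {
                // Replace the operand pushes with a single push of the result.
                out.truncate(n - consumed);
                out.push(IrOp::PushI32(value));
            }
            None => out.push(op),
        }
    }
    out
}
```

Because each folded result is pushed back as a constant, chains such as `2 3 + 4 *` collapse in a single pass.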
@@ -163,7 +163,7 @@ A function `fn constant_fold(ops: Vec<IrOp>) -> Vec<IrOp>` that simulates a comp
|
|||||||
|
|
||||||
## 4. Inlining
|
## 4. Inlining
|
||||||
|
|
||||||
**Status: Not implemented.**
|
**Status: Done.** Implemented in `optimizer.rs::inline()`. Inlines word bodies <= 8 ops, non-recursive. TailCall converted back to Call when inlining. IR bodies stored in `ForthVM::ir_bodies`.
|
||||||
|
|
||||||
Replace `Call(WordId)` with the callee's IR body, avoiding the `call_indirect` overhead and enabling further optimization of the combined code.
|
Replace `Call(WordId)` with the callee's IR body, avoiding the `call_indirect` overhead and enabling further optimization of the combined code.
|
||||||
|
|
||||||
@@ -212,7 +212,7 @@ PushI32(34)
|
|||||||
|
|
||||||
## 5. Strength Reduction
|
## 5. Strength Reduction
|
||||||
|
|
||||||
**Status: Not implemented.**
|
**Status: Done.** Implemented in `optimizer.rs::strength_reduce()`. Power-of-2 multiply to shift, zero comparisons to ZeroEq/ZeroLt.
|
||||||
|
|
||||||
Replace expensive operations with cheaper equivalents when one operand is a known constant.
|
Replace expensive operations with cheaper equivalents when one operand is a known constant.
|
||||||
|
|
||||||
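The two rewrites named in the status line could look roughly like the sketch below. `ZeroEq` and `ZeroLt` come from this document; the reduced `IrOp` enum and the variant names `Lshift`, `Eq`, and `Lt` are assumptions for the example.

```rust
// Hypothetical, reduced IrOp for illustration; the real enum lives in ir.rs.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum IrOp {
    PushI32(i32),
    Mul,
    Lshift,
    Eq,
    Lt,
    ZeroEq,
    ZeroLt,
}

fn strength_reduce(ops: Vec<IrOp>) -> Vec<IrOp> {
    let mut out = Vec::with_capacity(ops.len());
    let mut i = 0;
    while i < ops.len() {
        match (&ops[i], ops.get(i + 1)) {
            // n * 2^k  ->  n << k (only for positive powers of two above 1)
            (IrOp::PushI32(n), Some(IrOp::Mul)) if *n > 1 && (*n & (*n - 1)) == 0 => {
                out.push(IrOp::PushI32(n.trailing_zeros() as i32));
                out.push(IrOp::Lshift);
                i += 2;
            }
            // x 0 =  ->  0=
            (IrOp::PushI32(0), Some(IrOp::Eq)) => {
                out.push(IrOp::ZeroEq);
                i += 2;
            }
            // x 0 <  ->  0<
            (IrOp::PushI32(0), Some(IrOp::Lt)) => {
                out.push(IrOp::ZeroLt);
                i += 2;
            }
            _ => {
                out.push(ops[i].clone());
                i += 1;
            }
        }
    }
    out
}
```

With this sketch, the `CELLS` body `PushI32(4), Mul` becomes `PushI32(2), Lshift`, while a multiply by a non-power-of-two such as 3 is left alone.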
@@ -231,7 +231,7 @@ The most common case is `CELLS` which is defined as `PushI32(4), Mul`. Strength
|
|||||||
|
|
||||||
## 6. Dead Code Elimination
|
## 6. Dead Code Elimination
|
||||||
|
|
||||||
**Status: Not implemented.**
|
**Status: Done.** Implemented in `optimizer.rs::dce()`. Truncates after Exit, eliminates constant-conditional branches.
|
||||||
|
|
||||||
Remove IR operations that can never execute or whose results are never used.
|
Remove IR operations that can never execute or whose results are never used.
|
||||||
|
|
||||||
@@ -246,7 +246,7 @@ DCE should run after constant folding, since folding can create new constant con
|
|||||||
|
|
||||||
## 7. Tail Call Optimization
|
## 7. Tail Call Optimization
|
||||||
|
|
||||||
**Status: Partial.** `IrOp::TailCall(WordId)` exists in `ir.rs` and codegen handles it in `codegen.rs`, but the compiler never generates it.
|
**Status: Done.** `optimizer.rs::tail_call_detect()` converts the last `Call` to `TailCall` when return stack is balanced. Codegen emits `call_indirect + return`.
|
||||||
|
|
||||||
### What Exists
|
### What Exists
|
||||||
|
|
||||||
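A simplified sketch of the detection step, for straight-line bodies only: it rewrites a trailing `Call` to `TailCall` when pushes to and pops from the return stack cancel out. The `ToR` and `FromR` variant names and the `u32` alias for `WordId` are assumptions, and the handling of `If` branches described in this section is deliberately left out.

```rust
// Assumption: WordId is modelled as a plain u32 for this example.
type WordId = u32;

// Hypothetical, reduced IrOp for illustration; the real enum lives in ir.rs.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum IrOp {
    PushI32(i32),
    Add,
    ToR,   // >R
    FromR, // R>
    Call(WordId),
    TailCall(WordId),
}

fn tail_call_detect(mut ops: Vec<IrOp>) -> Vec<IrOp> {
    // Net return-stack depth contributed by this body.
    let mut depth: i32 = 0;
    for op in &ops {
        match op {
            IrOp::ToR => depth += 1,
            IrOp::FromR => depth -= 1,
            _ => {}
        }
    }
    // Only a balanced body may hand its frame over to the callee.
    if depth == 0 {
        if let Some(last) = ops.last_mut() {
            if let IrOp::Call(id) = *last {
                *last = IrOp::TailCall(id);
            }
        }
    }
    ops
}
```

A body that ends in anything other than `Call`, or that leaves an extra item on the return stack, is returned unchanged.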
@@ -272,7 +272,7 @@ Detection rule: if the last IR op in a word body (or in a branch of an `If`) is
|
|||||||
|
|
||||||
## 8. Consolidation
|
## 8. Consolidation
|
||||||
|
|
||||||
**Status: Not implemented.** Stub exists at `crates/core/src/consolidate.rs`.
|
**Status: Done.** `CONSOLIDATE` word recompiles all IR-based words into a single WASM module with direct `call` instructions. Implemented in `codegen.rs::compile_consolidated_module()` and `outer.rs::consolidate()`.
|
||||||
|
|
||||||
### The Idea
|
### The Idea
|
||||||
|
|
||||||
@@ -300,7 +300,7 @@ After interactive development, `CONSOLIDATE` recompiles all defined words into a
|
|||||||
|
|
||||||
## 9. Compound IR Operations
|
## 9. Compound IR Operations
|
||||||
|
|
||||||
**Status: Not implemented.**
|
**Status: Done.** `TwoDup` and `TwoDrop` IrOp variants with optimized codegen. Peephole converts `Over, Over -> TwoDup` and `Drop, Drop -> TwoDrop`.
|
||||||
|
|
||||||
Some common multi-op sequences have more efficient WASM implementations than emitting each op individually.
|
Some common multi-op sequences have more efficient WASM implementations than emitting each op individually.
|
||||||
|
|
||||||
@@ -342,7 +342,7 @@ These can be added as new `IrOp` variants recognized by peephole and emitted by
|
|||||||
|
|
||||||
## 10. Codegen Improvements
|
## 10. Codegen Improvements
|
||||||
|
|
||||||
**Status: Not implemented.**
|
**Status: Done (DSP caching).** `$dsp` cached in WASM local 0, written back before calls and at function exit. Commutative optimization and loop index in local are future work.
|
||||||
|
|
||||||
These are improvements within `codegen.rs` `emit_op()` that do not require new IR operations.
|
These are improvements within `codegen.rs` `emit_op()` that do not require new IR operations.
|
||||||
|
|
||||||
@@ -394,7 +394,7 @@ i32.add ;; result on wasm stack
|
|||||||
|
|
||||||
## 11. wasmtime Configuration
|
## 11. wasmtime Configuration
|
||||||
|
|
||||||
**Status: Not implemented.** Currently using `Engine::default()`.
|
**Status: Done.** NaN canonicalization disabled, module caching enabled via `cache_config_load_default()`.
|
||||||
|
|
||||||
### Available Knobs
|
### Available Knobs
|
||||||
|
|
||||||
@@ -410,7 +410,7 @@ Module caching is the most impactful: `wasmtime::Config::cache_config_load_defau
|
|||||||
|
|
||||||
## 12. Dictionary Hash Index
|
## 12. Dictionary Hash Index
|
||||||
|
|
||||||
**Status: Not implemented.**
|
**Status: Done.** `HashMap<String, (u32, u32, bool)>` in Dictionary struct. `find()` uses O(1) hash lookup with linked-list fallback. Updated on `reveal()` and `toggle_immediate()`.
|
||||||
|
|
||||||
The dictionary lookup (`dictionary.rs` `find()`) walks a linked list from the most recent entry backward, comparing names character by character. After registering 80+ primitives plus user words, every lookup during compilation scans the full list.
|
The dictionary lookup (`dictionary.rs` `find()`) walks a linked list from the most recent entry backward, comparing names character by character. After registering 80+ primitives plus user words, every lookup during compilation scans the full list.
|
||||||
|
|
||||||
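The hash index with linked-list fallback might be structured along the lines of the sketch below. Only the `HashMap<String, (u32, u32, bool)>` shape and the `find()`, `reveal()`, and `toggle_immediate()` names come from this document. The meaning of the tuple fields (address, word id, immediate flag), the `Entry` struct, and the `Vec<Entry>` that stands in for the linked list in `Vec<u8>` are all assumptions.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for one linked-list header.
struct Entry {
    name: String,
    addr: u32,
    word_id: u32,
    immediate: bool,
    revealed: bool,
}

#[derive(Default)]
struct Dictionary {
    entries: Vec<Entry>, // newest last; stands in for the linked list
    index: HashMap<String, (u32, u32, bool)>,
}

impl Dictionary {
    /// Add a hidden entry; it is not findable until `reveal()`.
    fn create(&mut self, name: &str, addr: u32, word_id: u32) {
        self.entries.push(Entry {
            name: name.to_string(),
            addr,
            word_id,
            immediate: false,
            revealed: false,
        });
    }

    /// Make the newest entry visible and record it in the index.
    fn reveal(&mut self) {
        if let Some(e) = self.entries.last_mut() {
            e.revealed = true;
            // A redefinition overwrites the older entry of the same name.
            self.index.insert(e.name.clone(), (e.addr, e.word_id, e.immediate));
        }
    }

    /// Flip the immediate flag and keep the index in sync.
    fn toggle_immediate(&mut self) {
        if let Some(e) = self.entries.last_mut() {
            e.immediate = !e.immediate;
            if e.revealed {
                self.index.insert(e.name.clone(), (e.addr, e.word_id, e.immediate));
            }
        }
    }

    /// O(1) lookup first, then a newest-first scan as the fallback.
    fn find(&self, name: &str) -> Option<(u32, u32, bool)> {
        if let Some(hit) = self.index.get(name) {
            return Some(*hit);
        }
        self.entries
            .iter()
            .rev()
            .find(|e| e.revealed && e.name == name)
            .map(|e| (e.addr, e.word_id, e.immediate))
    }
}
```

Updating the index only in `reveal()` and `toggle_immediate()` keeps a word invisible while it is still being compiled, so a new definition cannot find itself by name before it is revealed.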
@@ -422,7 +422,7 @@ This affects **compile time** (word lookup during parsing), not runtime (compile
|
|||||||
|
|
||||||
## 13. Startup Batching
|
## 13. Startup Batching
|
||||||
|
|
||||||
**Status: Not implemented.** `compile_core_module()` stub exists in `codegen.rs`.
|
**Status: Not started.** `compile_core_module()` stub exists in `codegen.rs`.
|
||||||
|
|
||||||
Currently, each of the 80+ primitives registered at boot creates a separate WASM module: `wasm-encoder` builds it, `wasmparser` validates it, Cranelift compiles it, and wasmtime instantiates it. This happens 80+ times sequentially.
|
Currently, each of the 80+ primitives registered at boot creates a separate WASM module: `wasm-encoder` builds it, `wasmparser` validates it, Cranelift compiles it, and wasmtime instantiates it. This happens 80+ times sequentially.
|
||||||
|
|
||||||
@@ -432,7 +432,7 @@ Batch all IR-based primitives into a single WASM module with multiple exported f
|
|||||||
|
|
||||||
## 14. Float and Double-Cell Stack
|
## 14. Float and Double-Cell Stack
|
||||||
|
|
||||||
**Status: Not implemented.** `PushI64` and `PushF64` exist as IR ops but are stubs in codegen.
|
**Status: Not started.** `PushI64` and `PushF64` exist as IR ops but are stubs in codegen. Float stack operations are currently all host functions.
|
||||||
|
|
||||||
The float stack lives in its own memory region (0x2540--0x2D40). Float operations will have the same memory-based overhead as integer operations, but worse: `f64` values are 8 bytes, doubling the memory traffic per push/pop. Stack-to-local promotion (section 1) is even more impactful for floats because WASM has native `f64` locals and operand stack support.
|
The float stack lives in its own memory region (0x2540--0x2D40). Float operations will have the same memory-based overhead as integer operations, but worse: `f64` values are 8 bytes, doubling the memory traffic per push/pop. Stack-to-local promotion (section 1) is even more impactful for floats because WASM has native `f64` locals and operand stack support.
|
||||||
|
|
||||||
|
|||||||
+8
-3
@@ -8,10 +8,14 @@ WAFER (WebAssembly Forth Engine in Rust) is a Forth 2012 compiler that JIT-compi
|
|||||||
crates/
|
crates/
|
||||||
core/src/
|
core/src/
|
||||||
outer.rs ForthVM: outer interpreter, compiler, all primitives
|
outer.rs ForthVM: outer interpreter, compiler, all primitives
|
||||||
codegen.rs IR-to-WASM translation, module generation
|
codegen.rs IR-to-WASM translation, module generation, stack-to-local promotion
|
||||||
dictionary.rs Dictionary (linked list in Vec<u8>)
|
dictionary.rs Dictionary (linked list in Vec<u8>, hash index for O(1) lookup)
|
||||||
ir.rs IrOp enum -- the intermediate representation
|
ir.rs IrOp enum -- the intermediate representation
|
||||||
|
optimizer.rs IR optimization passes (peephole, fold, inline, DCE, etc.)
|
||||||
|
config.rs WaferConfig: unified optimization configuration
|
||||||
|
consolidate.rs Consolidation recompiler (single-module direct calls)
|
||||||
memory.rs Memory layout constants (addresses, sizes)
|
memory.rs Memory layout constants (addresses, sizes)
|
||||||
|
types.rs Stack type inference infrastructure
|
||||||
error.rs Error types
|
error.rs Error types
|
||||||
cli/src/
|
cli/src/
|
||||||
main.rs CLI REPL (rustyline), file execution
|
main.rs CLI REPL (rustyline), file execution
|
||||||
@@ -23,7 +27,7 @@ tests/
|
|||||||
forth2012-test-suite/ Forth 2012 compliance test suite (submodule)
|
forth2012-test-suite/ Forth 2012 compliance test suite (submodule)
|
||||||
```
|
```
|
||||||
|
|
||||||
The entire compiler and runtime lives in `outer.rs` (~5200 lines). Codegen is in `codegen.rs` (~1500 lines). Everything else is supporting infrastructure.
|
The compiler and runtime live in `outer.rs` (~10,400 lines). Codegen is in `codegen.rs` (~2,800 lines). The optimizer is in `optimizer.rs` (~800 lines). Everything else is supporting infrastructure.
|
||||||
|
|
||||||
## What Happens When You Start WAFER
|
## What Happens When You Start WAFER
|
||||||
|
|
||||||
@@ -39,6 +43,7 @@ wasmtime Store Runtime state container
|
|||||||
Linear Memory 16 pages (1 MiB), expandable to 256 pages (16 MiB)
|
Linear Memory 16 pages (1 MiB), expandable to 256 pages (16 MiB)
|
||||||
Global: DSP Data stack pointer, initialized to 0x1540
|
Global: DSP Data stack pointer, initialized to 0x1540
|
||||||
Global: RSP Return stack pointer, initialized to 0x2540
|
Global: RSP Return stack pointer, initialized to 0x2540
|
||||||
|
Global: FSP Float stack pointer, initialized to 0x2D40
|
||||||
Function Table 256 funcref entries (grows as needed)
|
Function Table 256 funcref entries (grows as needed)
|
||||||
```
|
```
|
||||||
|
|
||||||