Update docs: performance results, new optimizations, test counts

- README: add performance section (beats gforth 2-10x), update test commands, note self-recursive direct calls and loop promotion
- CLAUDE.md: update test counts (427 unit + comparison tests)
- OPTIMIZATIONS.md: stack-to-local Phase 1→Phase 2 (loops + IF), DO/LOOP locals done, J as IR done, add section 14 (self-recursive direct call), add current performance table vs gforth
- WAFER.md: document self-recursive call optimization, CONSOLIDATE, update test commands and line counts
- FORTH.md: expanded space history, add FORTH-IN-SPACE.md reference
- FORTH-IN-SPACE.md: new document with verified spacecraft history
2026-04-09 20:00:55 +02:00
parent 7344d3a8d7
commit 08b2eced2d
6 changed files with 176 additions and 50 deletions
@@ -7,8 +7,10 @@ An optimizing Forth 2012 compiler targeting WebAssembly. WAFER JIT-compiles each
 ## Highlights

 - **200+ words** across 12 Forth 2012 word sets, all at **100% compliance**
-- **Optimizing compiler** with 6 IR passes + stack-to-local promotion + consolidation
+- **Optimizing compiler** with 6 IR passes + stack-to-local promotion (loops + IF) + consolidation
+- **Faster than gforth** on all benchmarks in release mode (roughly 2-10x faster)
 - **JIT compilation** — each `:` definition compiles to its own WASM module
+- **Self-recursive direct calls** — RECURSE compiles to native `call` instead of `call_indirect`
 - **Consolidation mode** — recompile all words into a single optimized WASM module
 - **Interactive REPL** with line editing (rustyline)

@@ -73,16 +75,34 @@ If you already cloned without `--recurse-submodules`, fetch the Forth 2012 test
 git submodule update --init
 ```

+## Performance
+
+WAFER beats gforth (the GNU Forth reference implementation) on all benchmarks in release mode:
+
+```
+Benchmark                   WAFER     CONSOL     gforth     CONSOL/gf
+Fibonacci(25)                1629       1535       3422        0.45x
+Factorial(12)x10K             340        339        638        0.53x
+GCD-bench(500)                 18         15         30        0.50x
+NestedLoops(50)                84         73        720        0.10x
+Collatz(2K)                  1212       1202       3914        0.31x
+```
+
+Times in microseconds (lower is better). CONSOL = after `CONSOLIDATE`. CONSOL/gf = CONSOL time divided by gforth time; values below 1.0 mean WAFER is faster.
+
 ## Testing

 ```bash
-# All tests (392 currently passing)
+# All tests (~450 currently passing)
 cargo test --workspace

 # Forth 2012 compliance suite
 cargo test -p wafer-core --test compliance

-# Optimization benchmark report
+# Cross-engine comparison (WAFER vs gforth, requires gforth)
+cargo test -p wafer-core --test comparison -- --nocapture --ignored
+
+# Optimization benchmark report (WAFER-internal)
 cargo test -p wafer-core --test benchmark_report -- --nocapture --ignored

 # Lints
@@ -98,9 +118,9 @@ Forth Source -> Outer Interpreter -> IR -> [Optimize] -> WASM Codegen (wasm-enco
                                                    (shared memory + table)
 ```

-- **Subroutine threading** via WASM function tables and `call_indirect`
+- **Subroutine threading** via WASM function tables (`call_indirect` for calls between words, direct `call` for self-recursion)
 - **JIT mode**: each new word compiles to a separate WASM module linked to shared memory/globals/table
-- **IR-based pipeline** with 6 optimization passes (peephole, constant folding, strength reduction, DCE, tail call detection, inlining) plus stack-to-local promotion and consolidation
+- **IR-based pipeline** with 6 optimization passes (peephole, constant folding, strength reduction, DCE, tail call detection, inlining) plus stack-to-local promotion (with loop and IF/ELSE support), DO/LOOP index locals, and consolidation
 - **Dictionary**: linked-list word headers in simulated linear memory

 ## Project Structure
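The self-recursive direct-call change (RECURSE lowering to `call` instead of `call_indirect`) can be illustrated with a minimal Rust sketch. This is not WAFER's actual codegen. `IrOp`, `WasmOp`, `WordCtx`, and `lower` are hypothetical names. WASM instructions are modeled as a plain enum rather than built with wasm-encoder, and literals are pushed on the WASM operand stack for simplicity. Type index 0 and table index 0 are assumed to be the shared word signature and the shared function table.

```rust
// Minimal sketch (hypothetical names, not WAFER's codegen): lowering word
// calls to WASM call instructions, with RECURSE special-cased.

/// Simplified IR ops for a word body (hypothetical).
#[derive(Debug)]
enum IrOp {
    /// Push a literal (modeled on the WASM operand stack for simplicity).
    Lit(i32),
    /// Call another, previously compiled word via its table slot.
    CallWord(u32),
    /// RECURSE: call the word currently being compiled.
    Recurse,
}

/// Simplified model of the WASM instructions emitted for calls.
#[derive(Debug, PartialEq)]
enum WasmOp {
    I32Const(i32),
    /// Direct `call` to a function index inside the current module.
    Call(u32),
    /// `call_indirect` through a table; the slot index must already be on the stack.
    CallIndirect { type_index: u32, table_index: u32 },
}

/// Per-word compilation context (hypothetical).
struct WordCtx {
    /// Index of this word's own function in its JIT module
    /// (imported functions, if any, come first).
    self_func_index: u32,
}

// Assumed: type 0 is the shared word signature, table 0 the shared function table.
const WORD_TYPE: u32 = 0;
const SHARED_TABLE: u32 = 0;

fn lower(body: &[IrOp], ctx: &WordCtx) -> Vec<WasmOp> {
    let mut out = Vec::new();
    for op in body {
        match op {
            IrOp::Lit(n) => out.push(WasmOp::I32Const(*n)),
            // Other words live in separately compiled modules, so they are
            // reached through the shared table at run time.
            IrOp::CallWord(slot) => {
                out.push(WasmOp::I32Const(*slot as i32));
                out.push(WasmOp::CallIndirect {
                    type_index: WORD_TYPE,
                    table_index: SHARED_TABLE,
                });
            }
            // The word being compiled is a function in this very module, so
            // its index is known now and a direct call is enough.
            IrOp::Recurse => out.push(WasmOp::Call(ctx.self_func_index)),
        }
    }
    out
}

fn main() {
    // Hypothetical body: push 1, call the word in table slot 7, then RECURSE.
    let body = [IrOp::Lit(1), IrOp::CallWord(7), IrOp::Recurse];
    let ctx = WordCtx { self_func_index: 2 };
    for op in lower(&body, &ctx) {
        println!("{:?}", op);
    }
}
```

Only RECURSE gets the direct path here, because in Forth a word's own name is normally not visible until `;`, so a self-call can only come from RECURSE. A direct `call` needs neither the table lookup nor the run-time signature check that `call_indirect` performs.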