diff --git a/README.md b/README.md index 7a7e774..9b2b453 100644 --- a/README.md +++ b/README.md @@ -2,24 +2,52 @@ **WebAssembly Forth Engine in Rust** -An optimizing Forth 2012 compiler targeting WebAssembly. +An optimizing Forth 2012 compiler targeting WebAssembly. WAFER JIT-compiles each word definition to a separate WASM module and executes it via [wasmtime](https://wasmtime.dev/). -## Status +## Highlights -WAFER is a working Forth system with an optimizing compiler. It JIT-compiles each word definition to a separate WASM module and executes via `wasmtime`. 392 tests passing (380 unit + 1 benchmark + 11 compliance), **0 errors on all 12 tested Forth 2012 word sets** including Floating-Point. +- **200+ words** across 12 Forth 2012 word sets, all at **100% compliance** +- **Optimizing compiler** with 6 IR passes + stack-to-local promotion + consolidation +- **JIT compilation** — each `:` definition compiles to its own WASM module +- **Consolidation mode** — recompile all words into a single optimized WASM module +- **Interactive REPL** with line editing (rustyline) -**Working features:** +## Installation -- Colon definitions with full control flow (IF/ELSE/THEN, DO/LOOP/+LOOP, BEGIN/UNTIL, BEGIN/WHILE/REPEAT) -- 200+ words: stack, arithmetic, comparison, logic, memory, I/O, defining words, system, exceptions, double-cell, strings, floating-point (70+ float words) -- Recursion (RECURSE), nested control structures, loop counters (I, J) -- VARIABLE, CONSTANT, CREATE, DOES> -- Number bases (HEX, DECIMAL), number prefixes ($hex, #dec, %bin) -- Pictured numeric output (<# # #S #> HOLD SIGN) -- Comments (backslash, parentheses), string output (." ...) -- Interactive REPL with line editing +Requires [Rust](https://www.rust-lang.org/tools/install) 1.85+ (edition 2024). -**Example session:** +```bash +cargo install --git https://github.com/ok2/wafer.git wafer +``` + +This installs the `wafer` binary to `~/.cargo/bin/`. 
+ +To install from a local checkout: + +```bash +cargo install --path crates/cli +``` + +## Usage + +```bash +# Interactive REPL (type BYE to exit) +wafer + +# Run a Forth file +wafer program.fth + +# Pipe input +echo ': SQUARE DUP * ; 7 SQUARE .' | wafer + +# Consolidation: recompile all words into a single optimized WASM module +wafer --consolidate program.fth + +# Consolidation with WASM output +wafer --consolidate -o output.wasm program.fth +``` + +**Example REPL session:** ```forth : FIB DUP 2 < IF DROP 1 ELSE DUP 1 - RECURSE SWAP 2 - RECURSE + THEN ; @@ -31,13 +59,35 @@ VARIABLE COUNTER 0 COUNTER ! BUMP BUMP BUMP COUNTER @ . \ prints: 3 ``` -## Goals +## Building from source -- **Full Forth 2012 compliance** -- all word sets, 100% test suite pass rate -- **Optimizing compiler** -- constant folding, inlining, peephole optimization, stack-to-local promotion -- **Multi-typed stack** -- type inference uses WASM's native typed stack when possible -- **Self-hosting** -- minimal Rust kernel (~35 primitives), everything else in WAFER Forth -- **Consolidation mode** -- recompile all JIT words into a single optimized WASM module +```bash +git clone --recurse-submodules https://github.com/ok2/wafer.git +cd wafer +cargo build --workspace --release +``` + +If you already cloned without `--recurse-submodules`, fetch the Forth 2012 test suite with: + +```bash +git submodule update --init +``` + +## Testing + +```bash +# All tests (392 currently passing) +cargo test --workspace + +# Forth 2012 compliance suite +cargo test -p wafer-core --test compliance + +# Optimization benchmark report +cargo test -p wafer-core --test benchmark_report -- --nocapture --ignored + +# Lints +cargo clippy --workspace +``` ## Architecture @@ -53,55 +103,40 @@ Forth Source -> Outer Interpreter -> IR -> [Optimize] -> WASM Codegen (wasm-enco - **IR-based pipeline** with 6 optimization passes (peephole, constant folding, strength reduction, DCE, tail call detection, inlining) plus stack-to-local 
promotion and consolidation - **Dictionary**: linked-list word headers in simulated linear memory -## Building - -```bash -cargo build --workspace -``` - -## Running - -```bash -# Interactive REPL -cargo run -p wafer - -# Run a Forth file -cargo run -p wafer -- file.fth - -# Pipe input -echo ': SQUARE DUP * ; 7 SQUARE .' | cargo run -p wafer -``` - -## Testing - -```bash -# All tests (392 currently passing) -cargo test --workspace - -# Forth 2012 compliance dashboard -cargo test -p wafer-core --test compliance - -# Optimization benchmark report -cargo test -p wafer-core --test benchmark_report -- --nocapture --ignored - -# Lints -cargo clippy --workspace -``` - ## Project Structure ``` crates/ - core/ wafer-core: dictionary, IR, codegen (wasm-encoder), outer interpreter - cli/ wafer: CLI REPL and file execution (wasmtime, rustyline) + core/ wafer-core: dictionary, IR, codegen, optimizer, outer interpreter + cli/ wafer: CLI REPL, file execution, consolidation web/ wafer-web: browser bindings (planned) -forth/ Standard library in WAFER Forth (planned, currently stubs) -tests/ Forth 2012 compliance suite (gerryjackson/forth2012-test-suite submodule) +forth/ Bootstrap definitions loaded at startup +tests/ Forth 2012 compliance suite (git submodule) ``` +## Forth 2012 Compliance + +Tested against [Gerry Jackson's Forth 2012 test suite](https://github.com/gerryjackson/forth2012-test-suite). 12 of 14 word sets pass at 100%. 
+ +| Word Set | Status | +| ------------------ | --------------------------------- | +| Core | **100%** (0 errors) | +| Core Extensions | **100%** (0 errors) | +| Double-Number | **100%** (0 errors) | +| Exception | **100%** (0 errors) | +| Facility | **100%** (0 errors) | +| Floating-Point | **100%** (0 errors) | +| Locals | **100%** (0 errors) | +| Memory-Allocation | **100%** (0 errors) | +| Programming-Tools | **100%** (0 errors) | +| Search-Order | **100%** (0 errors) | +| String | **100%** (0 errors) | +| File-Access | Not started (requires WASI integration) | +| Extended-Character | Not started | + ## Implemented Words -### Core (Forth 2012 Section 6.1) -- In Progress +Over 200 words are implemented across the following categories: | Category | Words | | ------------ | --------------------------------------------------------------------------------------------------------------- | @@ -111,36 +146,24 @@ tests/ Forth 2012 compliance suite (gerryjackson/forth2012-test-suite sub | Logic | `AND OR XOR INVERT LSHIFT RSHIFT` | | Memory | `@ ! C@ C! +! 2@ 2! HERE ALLOT , C, CELLS CELL+ CHARS CHAR+ ALIGNED ALIGN MOVE FILL CMOVE CMOVE>` | | Control | `IF ELSE THEN DO LOOP +LOOP I J UNLOOP LEAVE BEGIN UNTIL WHILE REPEAT RECURSE EXIT` | -| Defining | `: ; VARIABLE CONSTANT CREATE DOES> IMMEDIATE` | +| Defining | `: ; VARIABLE CONSTANT VALUE CREATE DOES> IMMEDIATE DEFER` | | I/O | `. U. .S CR EMIT SPACE SPACES TYPE ." S" ACCEPT` | | Return stack | `>R R> R@` | | System | `EXECUTE ' CHAR [CHAR] ['] DECIMAL HEX BASE STATE >IN >BODY ENVIRONMENT? SOURCE ABORT TRUE FALSE BL` | | Compiler | `LITERAL POSTPONE [ ] EVALUATE ABORT"` | | Parsing | `WORD FIND COUNT >NUMBER` | +| Exceptions | `CATCH THROW` | +| Double-cell | `D+ D- D. 
D.R DNEGATE DABS D= D< D0= D0< D>S 2CONSTANT 2VARIABLE 2LITERAL M+ M*/` | +| Strings | `COMPARE SEARCH SLITERAL REPLACES SUBSTITUTE UNESCAPE` | +| Floating-Pt | `F+ F- F* F/ FABS FNEGATE FSQRT FSIN FCOS FTAN FEXP FLOG FMIN FMAX` and 55+ more | +| Case | `CASE OF ENDOF ENDCASE` | -### Not Yet Implemented +## Roadmap -12 word sets at 100% compliance: Core, Core Ext, Core Plus, Exception, Double-Number, String, Search-Order, Memory-Allocation, Programming-Tools, Facility, Locals, Floating-Point. 200+ words including VALUE, DEFER, CASE, DOES>, CATCH/THROW, double-cell arithmetic, string operations, and 70+ floating-point words. - -## Compliance Status - -Targeting 100% Forth 2012 compliance via [Gerry Jackson's test suite](https://github.com/gerryjackson/forth2012-test-suite). - -| Word Set | Status | -| ------------------ | --------------------------------- | -| Core | **100%** (0 errors on test suite) | -| Core Extensions | **100%** (0 errors on test suite) | -| Double-Number | **100%** (0 errors on test suite) | -| Exception | **100%** (0 errors on test suite) | -| Facility | **100%** (0 errors on test suite) | -| File-Access | Pending (requires WASI) | -| Floating-Point | **100%** (0 errors on ak-fp-test) | -| Locals | **100%** (0 errors on test suite) | -| Memory-Allocation | **100%** (0 errors on test suite) | -| Programming-Tools | **100%** (0 errors on test suite) | -| Search-Order | **100%** (0 errors on test suite) | -| String | **100%** (0 errors on test suite) | -| Extended-Character | Pending | +- **File-Access word set** — requires WASI integration for file I/O +- **Extended-Character word set** — Unicode support +- **Browser target** — `wafer-web` crate with wasm-bindgen for a web REPL +- **Self-hosting** — minimal Rust kernel (~35 primitives), everything else in Forth ## License diff --git a/crates/core/src/compiler.rs b/crates/core/src/compiler.rs deleted file mode 100644 index c783c9d..0000000 --- a/crates/core/src/compiler.rs +++ /dev/null @@ -1,21 
+0,0 @@ -//! Forth compile mode: builds IR from word definitions. -//! -//! When the outer interpreter encounters `:`, it switches to compile mode. -//! The compiler collects tokens and builds an IR representation until `;`. -//! IMMEDIATE words are executed during compilation (e.g., IF, ELSE, THEN). - -// TODO: Step 7 - Compiler implementation -// - : (colon) starts compilation, ; (semicolon) ends it -// - Build Vec for the word body -// - Handle IMMEDIATE words -// - Handle control structures (IF/ELSE/THEN, DO/LOOP, BEGIN/UNTIL) -// - LITERAL, POSTPONE, ['], [CHAR] -// - Defining words: VARIABLE, CONSTANT, CREATE, DOES> - -#[cfg(test)] -mod tests { - #[test] - fn placeholder() { - // Compiler tests will be added in Step 7 - } -} diff --git a/crates/core/src/lib.rs b/crates/core/src/lib.rs index 09f288b..5726f21 100644 --- a/crates/core/src/lib.rs +++ b/crates/core/src/lib.rs @@ -11,13 +11,10 @@ //! //! The compilation pipeline: //! 1. **Outer interpreter** tokenizes input and dispatches to interpret/compile mode -//! 2. **Compiler** builds an intermediate representation (IR) for each word definition -//! 3. **Type inference** annotates the IR with stack types -//! 4. **Optimizer** applies transformation passes (constant folding, inlining, etc.) -//! 5. **Codegen** translates optimized IR to WASM bytecode via `wasm-encoder` +//! 2. **Optimizer** applies transformation passes (constant folding, inlining, etc.) +//! 3. **Codegen** translates optimized IR to WASM bytecode via `wasm-encoder` pub mod codegen; -pub mod compiler; pub mod config; pub mod consolidate; pub mod dictionary; @@ -26,6 +23,3 @@ pub mod ir; pub mod memory; pub mod optimizer; pub mod outer; -pub mod primitives; -pub mod types; -pub mod words; diff --git a/crates/core/src/primitives.rs b/crates/core/src/primitives.rs deleted file mode 100644 index 39fa0a7..0000000 --- a/crates/core/src/primitives.rs +++ /dev/null @@ -1,19 +0,0 @@ -//! Built-in primitive words for WAFER. -//! -//! 
Primitives are the ~35 words that must be implemented in Rust because -//! they require direct WASM instructions or host interaction. -//! Everything else is defined in Forth (loaded from .fth files). - -// TODO: Step 6 - Primitive word implementations -// Each primitive provides: -// - Its StackEffect (type signature) -// - Its IR representation (for inlining by the optimizer) -// - Direct WASM instruction generation - -#[cfg(test)] -mod tests { - #[test] - fn placeholder() { - // Primitive tests will be added in Step 6 - } -} diff --git a/crates/core/src/types.rs b/crates/core/src/types.rs deleted file mode 100644 index cf59730..0000000 --- a/crates/core/src/types.rs +++ /dev/null @@ -1,106 +0,0 @@ -//! Type inference engine for WAFER's multi-typed stack. -//! -//! WAFER uses type inference to determine when values on the stack have -//! statically known types. When types are known, codegen uses WASM's native -//! typed operand stack and locals instead of simulating stacks in linear memory. - -/// Types that can appear on WAFER's stack. -#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)] -pub enum StackType { - /// 32-bit integer (default Forth cell). - I32, - /// 64-bit integer (double-cell). - I64, - /// 32-bit float. - F32, - /// 64-bit float (Forth floating-point). - F64, - /// Boolean (result of comparisons). Represented as i32 at WASM level. - Bool, - /// Memory address. Represented as i32 at WASM level. - Addr, - /// Type is unknown or cannot be determined statically. - Unknown, -} - -impl StackType { - /// Returns the WASM value type for this stack type. 
- pub fn wasm_type(self) -> wasm_encoder::ValType { - match self { - StackType::I32 | StackType::Bool | StackType::Addr => wasm_encoder::ValType::I32, - StackType::I64 => wasm_encoder::ValType::I64, - StackType::F32 => wasm_encoder::ValType::F32, - StackType::F64 => wasm_encoder::ValType::F64, - StackType::Unknown => wasm_encoder::ValType::I32, // default to i32 - } - } - - /// Returns true if this type's WASM representation is i32. - pub fn is_i32_compatible(self) -> bool { - matches!( - self, - StackType::I32 | StackType::Bool | StackType::Addr | StackType::Unknown - ) - } -} - -/// Describes the stack effect of a Forth word. -/// -/// For example, `+` has effect `( I32 I32 -- I32 )`. -#[derive(Debug, Clone, PartialEq, Eq)] -pub struct StackEffect { - /// Types consumed from the stack (bottom to top). - pub inputs: Vec<StackType>, - /// Types produced on the stack (bottom to top). - pub outputs: Vec<StackType>, -} - -impl StackEffect { - /// Create a new stack effect. - pub fn new(inputs: Vec<StackType>, outputs: Vec<StackType>) -> Self { - Self { inputs, outputs } - } - - /// Number of items consumed. - pub fn input_count(&self) -> usize { - self.inputs.len() - } - - /// Number of items produced. - pub fn output_count(&self) -> usize { - self.outputs.len() - } - - /// Net stack depth change. 
- pub fn depth_change(&self) -> i32 { - self.outputs.len() as i32 - self.inputs.len() as i32 - } -} - -#[cfg(test)] -mod tests { - use super::*; - - #[test] - fn stack_type_wasm_mapping() { - assert_eq!(StackType::I32.wasm_type(), wasm_encoder::ValType::I32); - assert_eq!(StackType::F64.wasm_type(), wasm_encoder::ValType::F64); - assert_eq!(StackType::Bool.wasm_type(), wasm_encoder::ValType::I32); - assert_eq!(StackType::Addr.wasm_type(), wasm_encoder::ValType::I32); - } - - #[test] - fn stack_effect_depth() { - // DUP ( x -- x x ) - let dup = StackEffect::new(vec![StackType::I32], vec![StackType::I32, StackType::I32]); - assert_eq!(dup.depth_change(), 1); - - // + ( x y -- z ) - let add = StackEffect::new(vec![StackType::I32, StackType::I32], vec![StackType::I32]); - assert_eq!(add.depth_change(), -1); - - // DROP ( x -- ) - let drop_e = StackEffect::new(vec![StackType::I32], vec![]); - assert_eq!(drop_e.depth_change(), -1); - } -} diff --git a/crates/core/src/words/mod.rs b/crates/core/src/words/mod.rs deleted file mode 100644 index 9f8828e..0000000 --- a/crates/core/src/words/mod.rs +++ /dev/null @@ -1,19 +0,0 @@ -//! Forth 2012 word set implementations. -//! -//! Each submodule implements one word set from the Forth 2012 standard. -//! Words are implemented in Rust only when they require direct WASM instructions; -//! most words are defined in Forth source files under `forth/`. 
- -// Word set modules will be added as each set is implemented: -// pub mod core; -// pub mod core_ext; -// pub mod double; -// pub mod exception; -// pub mod floating; -// pub mod locals; -// pub mod string; -// pub mod tools; -// pub mod memory_alloc; -// pub mod search_order; -// pub mod file; -// pub mod facility; diff --git a/forth/boot.fth b/forth/boot.fth deleted file mode 100644 index c3e15d5..0000000 --- a/forth/boot.fth +++ /dev/null @@ -1,23 +0,0 @@ -\ WAFER Boot - Minimal bootstrap loaded first after primitives -\ This file defines the most fundamental derived words needed -\ before the rest of the standard library can load. - -\ These words are defined in terms of the ~35 Rust primitives. -\ They form the foundation for core.fth and all subsequent files. - -\ TODO: Step 7/8 - Populate with bootstrap definitions once -\ the compiler and outer interpreter are working. -\ For now this file documents what will go here: - -\ Derived stack operations: -\ : NIP ( x1 x2 -- x2 ) SWAP DROP ; -\ : TUCK ( x1 x2 -- x2 x1 x2 ) SWAP OVER ; -\ : 2DUP ( x1 x2 -- x1 x2 x1 x2 ) OVER OVER ; -\ : 2DROP ( x1 x2 -- ) DROP DROP ; -\ : ?DUP ( x -- 0 | x x ) DUP IF DUP THEN ; - -\ Basic arithmetic derived from primitives: -\ : 1+ ( n -- n+1 ) 1 + ; -\ : 1- ( n -- n-1 ) 1 - ; -\ : NEGATE ( n -- -n ) 0 SWAP - ; -\ : ABS ( n -- |n| ) DUP 0< IF NEGATE THEN ; diff --git a/forth/core.fth b/forth/core.fth deleted file mode 100644 index 44ddd6d..0000000 --- a/forth/core.fth +++ /dev/null @@ -1,48 +0,0 @@ -\ WAFER Core Word Set - High-level words defined in Forth -\ These implement Forth 2012 Core words that can be expressed -\ in terms of primitives and boot words. - -\ TODO: Step 10 - Populate as Core compliance tests are run. 
-\ Each word here will be tested against forth2012-test-suite/src/core.fr - -\ -- Derived stack operations -- -\ : NIP ( x1 x2 -- x2 ) SWAP DROP ; -\ : TUCK ( x1 x2 -- x2 x1 x2 ) SWAP OVER ; -\ : 2DUP ( x1 x2 -- x1 x2 x1 x2 ) OVER OVER ; -\ : 2DROP ( x1 x2 -- ) DROP DROP ; -\ : 2SWAP ( x1 x2 x3 x4 -- x3 x4 x1 x2 ) ROT >R ROT R> ; -\ : 2OVER ( x1 x2 x3 x4 -- x1 x2 x3 x4 x1 x2 ) >R >R 2DUP R> R> 2SWAP ; - -\ -- Arithmetic -- -\ : 1+ 1 + ; -\ : 1- 1 - ; -\ : 2* 1 LSHIFT ; -\ : 2/ 1 RSHIFT ; -\ : NEGATE 0 SWAP - ; -\ : ABS DUP 0< IF NEGATE THEN ; -\ : MIN 2DUP > IF SWAP THEN DROP ; -\ : MAX 2DUP < IF SWAP THEN DROP ; -\ : MOD /MOD DROP ; -\ : / /MOD NIP ; - -\ -- Comparison -- -\ : 0= 0 = ; -\ : 0<> 0= 0= ; -\ : 0< 0 < ; -\ : 0> 0 SWAP < ; -\ : <> = 0= ; - -\ -- Memory -- -\ : +! DUP @ ROT + SWAP ! ; -\ : CELLS 4 * ; -\ : CELL+ 4 + ; -\ : CHARS ; -\ : CHAR+ 1+ ; - -\ -- Defining words -- -\ : VARIABLE CREATE 0 , ; -\ : CONSTANT CREATE , DOES> @ ; - -\ -- I/O -- -\ : SPACE BL EMIT ; -\ : SPACES 0 ?DO SPACE LOOP ; diff --git a/forth/core_ext.fth b/forth/core_ext.fth deleted file mode 100644 index 0a46310..0000000 --- a/forth/core_ext.fth +++ /dev/null @@ -1,11 +0,0 @@ -\ WAFER Core Extensions Word Set -\ Forth 2012 Section 6.2 - -\ TODO: Step 13 - Implement as compliance tests are enabled -\ : VALUE CREATE , DOES> @ ; -\ : TO ' >BODY ! ; -\ : DEFER CREATE ['] ABORT , DOES> @ EXECUTE ; -\ : DEFER! >BODY ! ; -\ : DEFER@ >BODY @ ; -\ : IS STATE @ IF POSTPONE ['] POSTPONE DEFER! ELSE ' DEFER! THEN ; IMMEDIATE -\ : ACTION-OF STATE @ IF POSTPONE ['] POSTPONE DEFER@ ELSE ' DEFER@ THEN ; IMMEDIATE diff --git a/forth/prelude.fth b/forth/prelude.fth deleted file mode 100644 index 4b6ef0f..0000000 --- a/forth/prelude.fth +++ /dev/null @@ -1,8 +0,0 @@ -\ WAFER Prelude - Master loader for the standard library -\ This file is loaded at boot after primitives are registered. -\ It includes all word set files in dependency order. 
- -\ TODO: Enable includes as each word set is implemented -\ INCLUDE boot.fth -\ INCLUDE core.fth -\ INCLUDE core_ext.fth
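Review note: the README hunk lists constant folding among the six IR optimization passes, but the optimizer itself is not part of this diff. As a rough illustration of what such a pass does, here is a minimal sketch over a toy IR. The `Op` enum and `fold_constants` function are hypothetical names invented for this note; they are not WAFER's actual IR or optimizer API, which may look quite different.

```rust
/// Hypothetical toy IR for illustration only; not WAFER's real IR type.
#[derive(Debug, Clone, PartialEq)]
enum Op {
    Lit(i32), // push a literal cell
    Add,      // ( a b -- a+b )
    Mul,      // ( a b -- a*b )
    Dup,      // ( x -- x x )
}

/// Folds `Lit a, Lit b, Add|Mul` sequences into a single literal.
/// Wrapping arithmetic is an assumption, chosen to mirror 32-bit cell overflow.
fn fold_constants(ops: &[Op]) -> Vec<Op> {
    let mut out: Vec<Op> = Vec::with_capacity(ops.len());
    for op in ops {
        // Check whether the two most recently emitted ops are both literals.
        let folded = match (op, out.as_slice()) {
            (Op::Add, [.., Op::Lit(a), Op::Lit(b)]) => Some(a.wrapping_add(*b)),
            (Op::Mul, [.., Op::Lit(a), Op::Lit(b)]) => Some(a.wrapping_mul(*b)),
            _ => None,
        };
        match folded {
            Some(value) => {
                // Replace the two literals and the operator with one literal.
                out.truncate(out.len() - 2);
                out.push(Op::Lit(value));
            }
            None => out.push(op.clone()),
        }
    }
    out
}

fn main() {
    // Body of a hypothetical `: TWENTY 2 3 + 4 * ;`
    let body = vec![Op::Lit(2), Op::Lit(3), Op::Add, Op::Lit(4), Op::Mul];
    println!("{:?}", fold_constants(&body)); // prints: [Lit(20)]

    // Operands that are not both literals are left untouched.
    let body = vec![Op::Dup, Op::Lit(1), Op::Add];
    println!("{:?}", fold_constants(&body)); // prints: [Dup, Lit(1), Add]
}
```

Because the pass inspects its own output rather than the input, a folded literal can immediately take part in the next fold, which is why `2 3 + 4 *` collapses to a single literal in one sweep.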