Commit Graph

82 Commits

Author SHA1 Message Date
ok 1482d7513e Replace 13 Rust host functions with Forth bootstrap (Phase 1)
Create boot.fth loaded at startup after IR primitives are compiled.
Forth-compiled WASM with direct calls outperforms host function dispatch
(no call_indirect overhead, Cranelift can inline across word boundaries).

Words moved to Forth: 2OVER, 2ROT, WITHIN, 2@, 2!, FILL, CMOVE, CMOVE>,
MOVE, ERASE, BLANK, /STRING, -TRAILING.

Removed 547 lines of Rust closures, replaced by 48 lines of Forth.
All 425 tests pass.
2026-04-04 13:47:47 +02:00
ok db6292add6 Implement --native flag for standalone executables
Add `wafer build --native` to produce self-contained native executables.
The approach appends AOT-precompiled WASM and metadata to a copy of the
wafer binary itself, requiring no Rust toolchain at build time.

On startup, the binary checks for an appended payload (8-byte "WAFEREXE"
magic trailer). If found, it deserializes the precompiled module and runs
it directly, skipping CLI argument parsing entirely.

Uses wasmtime's Engine::precompile_module() for AOT compilation at build
time and Module::deserialize() at runtime — instant startup with no JIT.

Binary layout: [wafer binary][precompiled wasm][metadata json][trailer]
Trailer: payload_len(u64 LE) + metadata_len(u64 LE) + "WAFEREXE"

Also refactored runner.rs: extracted shared run_module() to avoid
duplication between run_wasm_bytes() and run_precompiled_bytes().
Made serialize_metadata() public for CLI use.
2026-04-04 12:10:13 +02:00
ok 3a0f328f90 Implement WASM export and standalone execution
Add `wafer build` to compile Forth source files to standalone .wasm modules,
and `wafer run` to execute them. The same .wasm file works with both the
wafer runtime (via wasmtime) and in browsers (via generated JS loader).

New CLI subcommands:
- `wafer build file.fth -o file.wasm` — compile to standalone WASM
- `wafer build file.fth -o file.wasm --js` — also generate JS/HTML loader
- `wafer build file.fth --entry WORD` — custom entry point
- `wafer run file.wasm` — execute pre-compiled module

Entry point resolution: --entry flag > MAIN word > recorded top-level execution.
Memory snapshot embedded as WASM data section preserves VARIABLE/CONSTANT state.
Metadata in custom "wafer" section enables the runner to provide host functions.

New modules: export.rs (orchestration), runner.rs (wasmtime host), js_loader.rs
(browser support). Refactored codegen.rs to share logic between consolidation
and export via compile_multi_word_module(). Added ir_bodies tracking for
VARIABLE, CONSTANT, CREATE, VALUE, DEFER, BUFFER:, MARKER, 2CONSTANT,
2VARIABLE, 2VALUE, FVARIABLE defining words.

Removed dead code: dot_func field, unused wafer-web stub crate, wasmtime-wasi
dependency from CLI, orphaned --consolidate/--output CLI flags.

425 tests pass (414 original + 11 new including 7 round-trip integration tests).
2026-04-04 11:33:11 +02:00
ok 321903831d Add Forth 2012 + WAFER Anki flashcard deck 2026-04-02 14:11:26 +02:00
ok 22373d89af Fix dprint markdown formatting in README 2026-04-02 14:00:19 +02:00
ok c9bf61aeec Remove unused stub files: forth/, words/, compiler.rs, primitives.rs, types.rs
All were planning artifacts never imported or loaded:
- forth/ (4 .fth files): commented-out TODO stubs, never loaded at startup
- crates/core/src/words/mod.rs: empty module with commented-out submodules
- compiler.rs: placeholder, all compiler logic lives in outer.rs
- primitives.rs: placeholder, all primitives registered in outer.rs
- types.rs: StackType/StackEffect defined but never imported anywhere
2026-04-02 13:52:45 +02:00
ok 6c60cbb741 Implement float IR operations: 25 words compiled to native WASM f64
Convert 25 float words from host functions to IR primitives:
- Stack: FDROP FDUP FSWAP FOVER FNIP FTUCK
- Arithmetic: F+ F- F* F/ FNEGATE FABS FSQRT FMIN FMAX FLOOR FROUND
- Comparisons: F0= F0< F= F<
- Memory: F@ F!
- Conversions: S>F F>S

24 new IrOp variants compiled to native WASM f64 instructions.
EmitCtx struct threads f64 scratch locals through all emit functions.
Float constant folding: 1.5E0 2.5E0 F+ folds to PushF64(4.0).
Float peephole: PushF64+FDrop, FDup+FDrop, FSwap+FSwap eliminated.
Float literals now compile as PushF64 IR ops instead of anonymous host calls.

~420 lines of Rust closure code removed from outer.rs.
All 14 optimizations now implemented. 430 tests passing.
2026-04-02 13:47:28 +02:00
ok ef79b28e45 Implement startup batching: 12x faster boot
Batch-compile all ~64 IR primitives into a single WASM module at startup.
Replaces 64 separate Module::new + Instance::new with 1 of each.
Reuses compile_consolidated_module() directly, removed compile_core_module() stub.

Boot time: 7.7ms -> 0.6ms (release), test suite: 5.1s -> 1.5s (debug).
13 of 14 optimizations now implemented. 392 tests passing.
2026-04-02 13:05:53 +02:00
ok f3bc270904 Update all docs to reflect current state
README: 392 tests, 200+ words, 12 word sets, optimization pipeline described
CLAUDE.md: 200+ words, 12 word sets, 392 tests, added optimizer/config/consolidate to key files
OPTIMIZATIONS.md: update all 14 section statuses (12 done, 2 not started)
WAFER.md: correct line counts, add optimizer/config/consolidate/types to project layout, add FSP global
2026-04-02 12:47:50 +02:00
ok dea3a32c33 Add switchable optimization config and benchmark framework
WaferConfig: unified config controlling all optimizations individually.
ForthVM::new_with_config(config) to create VMs with custom optimization settings.
All 8 switchable optimizations: peephole, constant_fold, strength_reduce, dce,
tail_call, inline (IR passes) + stack_to_local_promotion (codegen).

Benchmark framework (crates/core/tests/benchmark_report.rs):
- 7 Forth benchmarks: Fibonacci, Factorial, SumRecurse, NestedLoops, GCD, MemFill, Collatz
- Correctness verification across all configs (runs in CI)
- Full report with 128 optimization combinations (cargo test --ignored)
- Measures execution time, compilation time, WASM module bytes
- CONSOLIDATE impact comparison

Key findings from benchmark report:
- Inlining: -77% exec time on Fibonacci, -92% on Collatz
- Stack-to-local promotion: -5.5% WASM module size
- CONSOLIDATE: -72% exec time on Fibonacci (call_indirect -> direct call)
- All optimizations combined: best overall performance
2026-04-02 12:24:57 +02:00
ok 759142ea75 Add stack-to-local promotion, verify all optimizations end-to-end
Stack-to-local promotion (Phase 1):
- is_promotable() identifies straight-line words (no control flow/calls/I/O)
- StackSim maps stack slots to WASM locals
- Stack manipulation (Swap, Rot, Nip, Tuck, Dup, Drop) emits ZERO instructions
- Prologue loads items from memory, epilogue writes back
- ~7x instruction reduction for DUP * and similar patterns

End-to-end verification (16 tests proving each optimization is active):
- verify_peephole_active: 0+ elimination
- verify_constant_folding_active: 3 4 + folded to 7
- verify_strength_reduction_active: 4* becomes shift
- verify_dce_active: code after EXIT eliminated
- verify_tail_call_active: recursive RECURSE works
- verify_inlining_active: small word inlined and folded
- verify_compound_ops_active: 2DUP works
- verify_dsp_caching_active: factorial via RECURSE
- verify_consolidation_active: CONSOLIDATE word
- verify_stack_promotion_*: 7 tests for promoted codegen

22 additional codegen promotion tests (wasmtime execution).
Fix F~ stack overflow panic (checked_sub instead of unchecked).
380 unit tests + 11 compliance tests, all passing.
2026-04-01 23:51:15 +02:00
ok 2b43a36a83 Update OPTIMIZATIONS.md: 12 of 14 done, stack-to-local Phase 1 complete 2026-04-01 22:59:23 +02:00
ok 0a9be743a1 Implement stack-to-local promotion and consolidation recompiler
Stack-to-local promotion (Phase 1: straight-line code):
- Words with no control flow/calls use WASM locals instead of memory stack
- Stack manipulation (Swap, Rot, Nip, Tuck, Dup, Drop) emits ZERO instructions
- ~7x instruction reduction for arithmetic-heavy words like DUP *
- Pre-loads consumed items from memory, writes results back at exit

Consolidation recompiler (CONSOLIDATE word):
- Recompiles all IR-based words into single WASM module
- Direct call instructions instead of call_indirect through function table
- Cranelift can inline and optimize across word boundaries
- All control flow variants support consolidated calls

342 unit tests + 11 compliance, all passing.
2026-04-01 22:56:00 +02:00
ok 35830fd986 Update OPTIMIZATIONS.md: 10 of 14 optimizations implemented 2026-04-01 22:35:18 +02:00
ok b2cf289c36 Add inlining, DSP caching, fix TailCall-in-inline bug
Inlining: store IR bodies for all words, inline Call(id) when body <= 8 ops
and non-recursive. Convert TailCall back to Call when inlining (tail position
in callee is not tail position in caller -- found via compliance test failure
where inlined TailCall caused unreachable code after the call site).

DSP global caching: cache $dsp in WASM local 0 at function entry, use
local.get/set throughout, writeback before calls and at function exit.
Reduces global access instructions by ~30-40%.

323 unit tests + 11 compliance, all passing.
2026-04-01 22:34:51 +02:00
ok 282f884a3d Implement optimization pipeline: peephole, constant folding, strength reduction, DCE, tail calls
IR optimizer with 6 composable passes:
- Peephole: PushI32+Drop, Dup+Drop, Swap+Swap, Swap+Drop→Nip, identity ops
- Constant folding: binary (Add/Sub/Mul/And/Or/Xor/shifts/comparisons) + unary (Negate/Abs/Invert/ZeroEq/ZeroLt)
- Strength reduction: power-of-2 multiply→shift, PushI32(0)+Eq→ZeroEq
- Dead code elimination: truncate after Exit, constant-conditional If
- Tail call detection: last Call→TailCall when return stack balanced
- Compound ops: Over+Over→TwoDup, Drop+Drop→TwoDrop with optimized codegen

Dictionary hash index for O(1) word lookup during compilation.
wasmtime config: disable NaN canonicalization, enable module caching.
319 unit tests + 11 compliance, all passing.
2026-04-01 21:50:08 +02:00
ok 2c1f7fb3af Update README: 12 word sets at 100%, 200+ words, floating-point complete 2026-04-01 20:40:50 +02:00
ok eb79c40c69 Implement complete Floating-Point word set, 70+ float words
Separate float stack with fsp global, IEEE 754 double precision.
Stack ops: FDROP FDUP FSWAP FOVER FROT FDEPTH
Arithmetic: F+ F- F* F/ FNEGATE FABS FMAX FMIN FSQRT FLOOR FROUND F**
Comparisons: F0= F0< F= F< F~
Memory: F@ F! SF@ SF! DF@ DF! FLOAT+ FLOATS FALIGNED FALIGN
Conversions: D>F F>D S>F F>S
Trig: FSIN FCOS FTAN FASIN FACOS FATAN FATAN2 FSINCOS
Exp/Log: FEXP FEXPM1 FLN FLNP1 FLOG FALOG
Hyperbolic: FSINH FCOSH FTANH FASINH FACOSH FATANH
I/O: F. FE. FS. REPRESENT >FLOAT PRECISION SET-PRECISION
Defining: FVARIABLE FCONSTANT FVALUE FLITERAL
Float literal parsing (1E, 1.5E2, -3.14E0 format)
299 unit tests + 11 compliance tests, 0 errors on float test suite
2026-04-01 20:38:48 +02:00
ok 3e7f92b7ef Add working compliance test harness, 11 word sets at 100%
Replace placeholder compliance tests with real harness that boots WAFER,
loads Gerry Jackson's test suite, and asserts 0 errors per word set.

Passing word sets (11/13):
  Core, Core Plus, Core Ext, Exception, Double-Number, String,
  Search-Order, Memory-Allocation, Programming-Tools, Facility, Locals

Not yet: File-Access (needs WASI), Floating-Point, Extended-Character
272 total tests (261 unit + 11 compliance)
2026-03-31 15:25:02 +02:00
ok f80c612835 Implement Double-Number and String word sets, fix memory panics
Double-Number (19 words): D+ D- DNEGATE DABS D2* D2/ D0= D0< D= D< DU<
  DMAX DMIN D>S M+ M*/ D. D.R 2ROT 2CONSTANT 2VARIABLE 2VALUE 2LITERAL
  Double-number literal parsing (tokens ending with '.')
String (5 words): COMPARE SEARCH /STRING BLANK -TRAILING SLITERAL
Fix all memory access panics with bounds checking throughout host functions.

8 word sets at 100%: Core, Core Ext, Exception, Double, String,
  Search-Order, Memory-Allocation, Programming-Tools
2026-03-31 14:43:30 +02:00
ok 8bfdd966ea Add optimization docs, workspace lints, and pre-commit hooks
- Add docs/OPTIMIZATIONS.md: catalog of 14 optimization passes with
  status tracking and implementation roadmap
- Configure workspace-level clippy and rustc lints in Cargo.toml
- Add clippy.toml and deny.toml for clippy thresholds and dependency
  auditing (licenses, advisories, bans)
- Set up pre-commit hook: cargo fmt, dprint, clippy, cargo deny,
  cargo machete
- Update Justfile with deny/machete targets, dprint in fmt checks
2026-03-30 23:01:35 +02:00
ok f99f9d5290 Achieve 100% Core Extensions compliance, 261 tests
Implement 25+ Core Extension words:
- VALUE/TO, DEFER/IS/ACTION-OF, :NONAME
- CASE/OF/ENDOF/ENDCASE, ?DO, AGAIN
- PARSE, PARSE-NAME, S\", C", HOLDS, BUFFER:
- 2>R, 2R>, 2R@, U>, .R, U.R, PAD, ERASE, UNUSED
- REFILL, SOURCE-ID, MARKER (stub)

Fix panic on invalid memory access (bounds check in FIND).
Rewrite FIND/WORD host functions for inline operation.
Add BeginAgain IR variant and codegen.

Three word sets at 100%: Core, Core Extensions, Exception.
2026-03-30 22:19:49 +02:00
ok 2c74222193 Achieve 100% Core compliance, implement CATCH/THROW
Core word set: 0 errors on Gerry Jackson's forth2012-test-suite/core.fr
- Fix POSTPONE for non-immediate words via COMPILE, mechanism
- Fix double-DOES> (WEIRD: pattern) with does-body scanning and
  runtime patching via _DOES_PATCH_
- Implement CATCH/THROW exception handling using wasmtime trap
  mechanism with stack pointer save/restore
- 232 tests passing
2026-03-30 21:26:21 +02:00
ok 6d3b7c5a89 Add docs/FORTH.md: rewrite Forth documentation with philosophical framing
Rename ABOUT_FORTH.md to FORTH.md and rewrite to cover Forth's unique
position as simultaneously low-level and high-level, where Forth is used
today (Philae lander, Open Firmware, embedded systems), and why Forth
maps naturally onto WebAssembly's stack machine architecture.
2026-03-30 21:03:59 +02:00
ok cb270c8765 Reach 97% Core compliance: 58 errors down to 3
- Fix HERE corruption: sync user_here before writing to shared cell
- Fix DOES> without CREATE: patch most-recent word, not read new name
- Implement >BODY via word_pfa_map tracking parameter field addresses
- Nested BEGIN...WHILE...WHILE...REPEAT...ELSE...THEN support
- DEPTH overflow protection
- Forth 2012 core.fr: 3 errors remaining (POSTPONE edge case,
  double-DOES>, NOP meta-programming)
2026-03-30 21:02:00 +02:00
ok 1d204c0a86 Fix Core test suite compliance: >IN sync, RSHIFT, +LOOP, pictured output
Major compliance fixes for running Gerry Jackson's core.fr tests:
- >IN synchronization: outer interpreter reads >IN back from WASM memory
  after each word, enabling TESTING and other >IN-manipulating words
- RSHIFT changed to logical (unsigned) shift per Forth 2012 spec
- +LOOP uses boundary-crossing termination check for negative steps
- HEX/DECIMAL compile as WASM primitives (work inside definitions)
- BASE read from WASM memory for all number formatting
- Pictured numeric output: <# # #S #> HOLD SIGN
- New words: 2@ 2! .( ] ArithRshift
- Error recovery resets compile state on failure
- FIND reads counted strings from WASM memory
- Forth 2012 core.fr: 58 errors remaining (from unable-to-load)
2026-03-30 18:17:59 +02:00
ok fb1395c740 Add DOES>, EVALUATE, double-cell arithmetic, and 20+ more Core words
- DOES> with split-compilation for defining words (CREATE , DOES> @ pattern)
- EVALUATE for string interpretation
- Double-cell: M* UM* UM/MOD FM/MOD SM/REM S>D */ */MOD
- Parsing: WORD FIND COUNT >NUMBER >IN STATE
- Memory: CMOVE CMOVE>
- Compile-time: ABORT" S" (compile mode)
- 219 tests passing, ~90% Core word set coverage
- Update docs to reflect current implementation
2026-03-29 23:40:37 +02:00
ok 1fd8f7196e Update documentation to reflect current implementation state
README now documents all 70+ implemented words, working examples,
architecture overview, and accurate compliance status.
CLAUDE.md updated with actual file descriptions, patterns for adding
new words, and current test count.
2026-03-29 23:14:54 +02:00
ok 5eee0d1810 Add 50+ Core words: loops, defining words, memory, system primitives
- Loop support: I, J, UNLOOP, LEAVE
- Defining words: VARIABLE, CONSTANT, CREATE
- Memory: HERE, ALLOT, comma, C-comma, CELLS, CELL+, CHARS, CHAR+,
  ALIGNED, ALIGN, MOVE, FILL
- Stack: 2DUP, 2DROP, 2SWAP, 2OVER, ?DUP, PICK, MIN, MAX, WITHIN
- Comparison: 0<>, 0>
- System: EXECUTE, IMMEDIATE, DECIMAL, HEX, TYPE, SPACES, tick,
  CHAR, [CHAR], ['], >BODY, ENVIRONMENT?, SOURCE, ABORT
- Number output now respects BASE (HEX FF DECIMAL . prints 255)
- 185 tests passing
2026-03-29 23:10:51 +02:00
ok d22a0a5756 Implement core Forth runtime: dictionary, codegen, outer interpreter, REPL
- Dictionary: linked-list word headers in simulated linear memory with
  create/find/reveal, case-insensitive lookup, IMMEDIATE flag support
- WASM codegen: IR-to-WASM translation via wasm-encoder with full
  validation; all stack, arithmetic, comparison, logic, memory, control
  flow, and return stack operations; wasmtime execution tests
- Outer interpreter: tokenizer, number parsing (decimal/$hex/#dec/%bin),
  interpret/compile dispatch, control structures (IF/ELSE/THEN,
  BEGIN/UNTIL, BEGIN/WHILE/REPEAT), RECURSE, comments, string output
- 40+ primitive words registered via JIT-compiled WASM modules linked
  to shared memory/globals/table
- Interactive REPL with rustyline, piped input, and file execution
- 145 tests passing across dictionary, codegen, and runtime
2026-03-29 22:48:37 +02:00
ok b8993f556e Switch to dual MIT/Apache-2.0 licensing, fix repository URL 2026-03-29 22:30:18 +02:00
ok 683281363d Initial commit: WAFER (WebAssembly Forth Engine in Rust)
Optimizing Forth 2012 compiler targeting WebAssembly with IR-based
compilation pipeline, multi-typed stack inference, subroutine threading,
and JIT/consolidation modes. Rust kernel with ~35 primitives and Forth
standard library for core/core-ext word sets.
2026-03-29 22:30:18 +02:00