Optimize DO/LOOP: index/limit in WASM locals, J as IR primitive

Two-path DO/LOOP codegen based on static analysis of the loop body:

- Fast path (no calls, no >R/R> in body): index and limit live purely in WASM locals with zero return stack traffic per iteration. RFetch (I) and LoopJ (J) resolve to local.get instead of memory access.
- Slow path (body has calls or explicit RS ops): locals still used for loop control, but synced to return stack for LEAVE/UNLOOP compatibility.

Also converts J from a host function (WASM→Rust roundtrip per call) to an IR primitive (IrOp::LoopJ) that compiles to local.get of the outer loop's index local.

Performance impact (vs gforth, all opts enabled):

- Factorial: 1.02x → 0.94x (now faster than gforth)
- NestedLoops: 717x → 543x (24% faster, still bottlenecked by data stack)
- Fibonacci, GCD, Collatz: unchanged (don't use DO/LOOP)
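A minimal sketch of the two mechanisms described above, under a simplified IR: a body check that decides whether a loop may take the fast path, and resolution of I/J to the index local of the innermost or next-outer loop by nesting depth. Only `RFetch`, `FromR`, and `LoopJ` come from this commit; the `Push`, `Add`, `Call`, `ToR`, and `DoLoop` variants, the `is_fast_path` and `index_local` helpers, and the local numbering are hypothetical. Treating a call inside a nested loop as disqualifying the outer loop is also an assumption, chosen here because it is the conservative option.

```rust
// Sketch only: a simplified, hypothetical subset of the compiler IR.
// `Push`, `Add`, `Call`, `ToR`, and `DoLoop` are assumed shapes; the real IrOp differs.
#[derive(Debug, Clone, PartialEq)]
enum IrOp {
    Push(i32),
    Add,
    Call(String), // call to another word (assumed variant)
    ToR,          // >R (assumed variant)
    FromR,        // R>
    RFetch,       // I: innermost loop index
    LoopJ,        // J: next-outer loop index
    DoLoop(Vec<IrOp>), // hypothetical: a DO ... LOOP with its body
}

/// Fast path only when neither this body nor any nested loop body
/// contains a call or an explicit >R / R> (conservative assumption).
fn is_fast_path(body: &[IrOp]) -> bool {
    body.iter().all(|op| match op {
        IrOp::Call(_) | IrOp::ToR | IrOp::FromR => false,
        IrOp::DoLoop(inner) => is_fast_path(inner),
        IrOp::Push(_) | IrOp::Add | IrOp::RFetch | IrOp::LoopJ => true,
    })
}

/// Map I / J to the WASM local holding the right loop index, so codegen
/// can emit `local.get <n>` instead of a return-stack memory access.
/// `index_locals` lists index locals from outermost to innermost loop.
fn index_local(op: &IrOp, index_locals: &[u32]) -> Option<u32> {
    let depth = index_locals.len();
    match op {
        IrOp::RFetch if depth >= 1 => Some(index_locals[depth - 1]),
        IrOp::LoopJ if depth >= 2 => Some(index_locals[depth - 2]),
        _ => None,
    }
}

fn main() {
    // Roughly: 10 0 DO 10 0 DO I J + LOOP LOOP
    let inner = vec![IrOp::RFetch, IrOp::LoopJ, IrOp::Add];
    let outer = vec![IrOp::Push(10), IrOp::Push(0), IrOp::DoLoop(inner.clone())];
    println!("fast path: {}", is_fast_path(&outer));

    // A body containing a call falls back to the slow path.
    let with_call = vec![IrOp::Push(1), IrOp::Call("EMIT".to_string())];
    println!("fast path with call: {}", is_fast_path(&with_call));

    // Hypothetical layout: outer index in local 0, inner index in local 2
    // (limits would sit in locals 1 and 3).
    let locals: [u32; 2] = [0, 2];
    for op in &inner {
        if let Some(l) = index_local(op, &locals) {
            println!("{:?} -> local.get {}", op, l);
        }
    }
}
```

Resolving J by depth is what lets it become a plain IR primitive: once the compiler knows which local holds the outer index, no host-function round trip is needed.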
2026-04-09 17:13:31 +02:00
parent 806d7b3094
commit 36a177a39a
3 changed files with 319 additions and 156 deletions
@@ -119,6 +119,9 @@ pub enum IrOp {
    FromR,
    /// Copy from return stack: ( -- x ) ( R: x -- x )
    RFetch,
+    /// Read outer DO/LOOP index (J): ( -- n )
+    /// Compiled to local.get when loop locals are available.
+    LoopJ,

    // -- Forth locals (from {: ... :} syntax) --
    /// Get Forth local variable N: ( -- x )