Optimize DO/LOOP: index/limit in WASM locals, J as IR primitive

Two-path DO/LOOP codegen based on static analysis of the loop body:

- Fast path (no calls, no >R/R> in body): index and limit live purely
  in WASM locals with zero return stack traffic per iteration. RFetch (I)
  and LoopJ (J) resolve to local.get instead of memory access.

- Slow path (body has calls or explicit RS ops): locals are still used
  for loop control, but are synced to the return stack so that
  LEAVE/UNLOOP keep working.
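
The path selection described above could be sketched roughly as follows. This is an illustration only, not the code from this commit: the `IrOp` here is a cut-down stand-in (the `Call`, `ToR`, `Add` and `Lit` variants are assumed for the example), and `LoopPath` and `classify_loop_body` are hypothetical names. The sketch also ignores nested control structures, which a real analysis would need to walk recursively.

```rust
/// Cut-down stand-in for the compiler's IR; only FromR, RFetch and LoopJ
/// are taken from the diff, the other variants are assumptions.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum IrOp {
    Call(u32), // assumed: call to another word
    ToR,       // assumed: >R
    FromR,     // R>
    RFetch,    // I inside a DO/LOOP
    LoopJ,     // J inside a nested DO/LOOP
    Add,
    Lit(i32),
}

/// Hypothetical result of the static analysis.
#[derive(Debug, PartialEq)]
enum LoopPath {
    Fast, // index/limit live only in WASM locals
    Slow, // locals are mirrored to the return stack
}

/// Picks the slow path as soon as the body contains a call or an
/// explicit return-stack transfer; otherwise the fast path is safe.
fn classify_loop_body(body: &[IrOp]) -> LoopPath {
    let touches_return_stack = body
        .iter()
        .any(|op| matches!(op, IrOp::Call(_) | IrOp::ToR | IrOp::FromR));
    if touches_return_stack {
        LoopPath::Slow
    } else {
        LoopPath::Fast
    }
}
```

Note that RFetch and LoopJ do not force the slow path in this sketch, since on the fast path they read WASM locals rather than the return stack.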

Also converts J from a host function (WASM→Rust roundtrip per call) to
an IR primitive (IrOp::LoopJ) that compiles to local.get of the outer
loop's index local.
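
As a rough sketch of how I and J might resolve to `local.get`, assume the code generator keeps a stack of per-loop local assignments, innermost loop last. `LoopLocals` and `index_local_get` are hypothetical names for illustration, and the output is WAT text rather than whatever encoding the real backend emits.

```rust
/// Hypothetical record of the WASM locals assigned to one DO/LOOP.
#[allow(dead_code)]
struct LoopLocals {
    index: u32, // local holding the loop index
    limit: u32, // local holding the loop limit
}

/// Returns the WAT instruction that reads a loop index.
/// `depth` 0 means the innermost loop (I), 1 the enclosing loop (J).
/// Returns None when no loop exists at that depth, in which case a
/// caller would have to fall back to a return-stack read.
fn index_local_get(loops: &[LoopLocals], depth: usize) -> Option<String> {
    // `loops` is ordered outermost first, so count back from the end.
    let pos = loops.len().checked_sub(depth + 1)?;
    Some(format!("local.get {}", loops[pos].index))
}
```

With two nested loops whose index locals are 2 (outer) and 4 (inner), depth 0 yields `local.get 4` and depth 1 yields `local.get 2`.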

Performance impact (runtime relative to gforth, lower is better; all
opts enabled):
- Factorial: 1.02x → 0.94x (now faster than gforth)
- NestedLoops: 717x → 543x (24% faster, still bottlenecked by data stack)
- Fibonacci, GCD, Collatz: unchanged (don't use DO/LOOP)
2026-04-09 17:13:31 +02:00
parent 806d7b3094
commit 36a177a39a
3 changed files with 319 additions and 156 deletions
@@ -119,6 +119,9 @@ pub enum IrOp {
FromR,
/// Copy from return stack: ( -- x ) ( R: x -- x )
RFetch,
/// Read outer DO/LOOP index (J): ( -- n )
/// Compiled to local.get when loop locals are available.
LoopJ,
// -- Forth locals (from {: ... :} syntax) --
/// Get Forth local variable N: ( -- x )