docs: rewrite architecture.txt + fix mem offsets
architecture.txt had drifted from the code: it was missing the HASH_SCRATCH region, the runtime-trait box, wordlists/search-order, the codegen locals layout, F: locals, quotations, and crypto. Rewritten from current source. The `// 0x...` annotations in memory.rs were the source of the drift: the printed values for the RETURN / FLOAT / HASH / DICT bases disagreed with the const arithmetic. Recomputed and corrected.
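The corrected annotations can be re-derived mechanically. The sketch below mirrors the constant chain as it appears in this diff and pins each base with a compile-time assertion, so a stale comment would show up as a build error instead of silent drift. The `*_BASE` names and `WORD_BUF_SIZE` come from the diff; `SYSVAR_SIZE`, `INPUT_BUFFER_SIZE`, `PAD_SIZE`, and `PICT_BUF_SIZE` are assumed names for the sizes listed in the layout table, and the assertions are an illustration, not something memory.rs is known to contain.

```rust
// Region sizes in bytes, taken from the architecture.txt layout table.
// SYSVAR_SIZE, INPUT_BUFFER_SIZE, PAD_SIZE and PICT_BUF_SIZE are assumed names.
pub const SYSVAR_SIZE: u32 = 64;
pub const INPUT_BUFFER_SIZE: u32 = 1024;
pub const PAD_SIZE: u32 = 256;
pub const PICT_BUF_SIZE: u32 = 128;
pub const WORD_BUF_SIZE: u32 = 64;
pub const DATA_STACK_SIZE: u32 = 4096;
pub const RETURN_STACK_SIZE: u32 = 4096;
pub const FLOAT_STACK_SIZE: u32 = 2048;
pub const HASH_SCRATCH_SIZE: u32 = 128;

// Each base is the previous base plus the previous size.
pub const SYSVAR_BASE: u32 = 0;
pub const INPUT_BUFFER_BASE: u32 = SYSVAR_BASE + SYSVAR_SIZE;
pub const PAD_BASE: u32 = INPUT_BUFFER_BASE + INPUT_BUFFER_SIZE;
pub const PICT_BUF_BASE: u32 = PAD_BASE + PAD_SIZE;
pub const WORD_BUF_BASE: u32 = PICT_BUF_BASE + PICT_BUF_SIZE;
pub const DATA_STACK_BASE: u32 = WORD_BUF_BASE + WORD_BUF_SIZE;
pub const RETURN_STACK_BASE: u32 = DATA_STACK_BASE + DATA_STACK_SIZE;
pub const FLOAT_STACK_BASE: u32 = RETURN_STACK_BASE + RETURN_STACK_SIZE;
pub const HASH_SCRATCH_BASE: u32 = FLOAT_STACK_BASE + FLOAT_STACK_SIZE;
pub const DICTIONARY_BASE: u32 = HASH_SCRATCH_BASE + HASH_SCRATCH_SIZE;

// Compile-time checks: the build fails if a documented value drifts.
const _: () = assert!(DATA_STACK_BASE == 0x0600);
const _: () = assert!(RETURN_STACK_BASE == 0x1600);
const _: () = assert!(FLOAT_STACK_BASE == 0x2600);
const _: () = assert!(HASH_SCRATCH_BASE == 0x2E00);
const _: () = assert!(DICTIONARY_BASE == 0x2E80);
```

The old annotations (0x1540, 0x2540, 0x2D40, 0x2DC0) are each 0xC0 below the computed value, which matches an offset error carried forward through the chain.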
@@ -50,23 +50,23 @@ pub const DATA_STACK_BASE: u32 = WORD_BUF_BASE + WORD_BUF_SIZE; // 0x0600
|
|||||||
pub const DATA_STACK_SIZE: u32 = 4096; // 1024 cells
|
pub const DATA_STACK_SIZE: u32 = 4096; // 1024 cells
|
||||||
|
|
||||||
/// Return stack region. Grows downward.
|
/// Return stack region. Grows downward.
|
||||||
pub const RETURN_STACK_BASE: u32 = DATA_STACK_BASE + DATA_STACK_SIZE; // 0x1540
|
pub const RETURN_STACK_BASE: u32 = DATA_STACK_BASE + DATA_STACK_SIZE; // 0x1600
|
||||||
/// Size of return stack region.
|
/// Size of return stack region.
|
||||||
pub const RETURN_STACK_SIZE: u32 = 4096;
|
pub const RETURN_STACK_SIZE: u32 = 4096;
|
||||||
|
|
||||||
/// Floating-point stack region (fallback). Grows downward.
|
/// Floating-point stack region (fallback). Grows downward.
|
||||||
pub const FLOAT_STACK_BASE: u32 = RETURN_STACK_BASE + RETURN_STACK_SIZE; // 0x2540
|
pub const FLOAT_STACK_BASE: u32 = RETURN_STACK_BASE + RETURN_STACK_SIZE; // 0x2600
|
||||||
/// Size of float stack region.
|
/// Size of float stack region.
|
||||||
pub const FLOAT_STACK_SIZE: u32 = 2048; // 256 doubles
|
pub const FLOAT_STACK_SIZE: u32 = 2048; // 256 doubles
|
||||||
|
|
||||||
/// Hash scratch region — output buffer for `SHA1`/`SHA256`/`SHA512` and
|
/// Hash scratch region — output buffer for `SHA1`/`SHA256`/`SHA512` and
|
||||||
/// other hash host words. Sized for the largest supported digest (SHA512 = 64 B).
|
/// other hash host words. Sized for the largest supported digest (SHA512 = 64 B).
|
||||||
pub const HASH_SCRATCH_BASE: u32 = FLOAT_STACK_BASE + FLOAT_STACK_SIZE; // 0x2D40
|
pub const HASH_SCRATCH_BASE: u32 = FLOAT_STACK_BASE + FLOAT_STACK_SIZE; // 0x2E00
|
||||||
/// Size of hash scratch region.
|
/// Size of hash scratch region.
|
||||||
pub const HASH_SCRATCH_SIZE: u32 = 128;
|
pub const HASH_SCRATCH_SIZE: u32 = 128;
|
||||||
|
|
||||||
/// Dictionary region start. Grows upward.
|
/// Dictionary region start. Grows upward.
|
||||||
pub const DICTIONARY_BASE: u32 = HASH_SCRATCH_BASE + HASH_SCRATCH_SIZE; // 0x2DC0
|
pub const DICTIONARY_BASE: u32 = HASH_SCRATCH_BASE + HASH_SCRATCH_SIZE; // 0x2E80
|
||||||
|
|
||||||
/// Initial top of data stack (grows down from here).
|
/// Initial top of data stack (grows down from here).
|
||||||
pub const DATA_STACK_TOP: u32 = DATA_STACK_BASE + DATA_STACK_SIZE;
|
pub const DATA_STACK_TOP: u32 = DATA_STACK_BASE + DATA_STACK_SIZE;
|
||||||
|
|||||||
+329
-157
@@ -1,6 +1,11 @@
|
|||||||
WAFER Architecture Reference (updated 2026-04-13)
|
WAFER Architecture Reference (updated 2026-04-16)
|
||||||
===================================================
|
===================================================
|
||||||
|
|
||||||
|
WAFER = WebAssembly Forth Engine in Rust. Optimizing Forth-2012 compiler that
|
||||||
|
emits WASM at run time. Each colon definition becomes its own WASM module that
|
||||||
|
shares memory, globals, and a function table with every other word.
|
||||||
|
|
||||||
|
|
||||||
1. COMPILATION PIPELINE
|
1. COMPILATION PIPELINE
|
||||||
-----------------------
|
-----------------------
|
||||||
|
|
||||||
@@ -11,96 +16,134 @@ WAFER Architecture Reference (updated 2026-04-13)
|
|||||||
+--------------------------------------------+
|
+--------------------------------------------+
|
||||||
| Tokenizer: whitespace-delimited words |
|
| Tokenizer: whitespace-delimited words |
|
||||||
| For each token: |
|
| For each token: |
|
||||||
| 1. Dictionary lookup (find) |
|
| 1. Dictionary lookup (HashMap + wordlist |
|
||||||
| 2. If found + interpret mode: EXECUTE |
|
| search order) |
|
||||||
| 3. If found + compile mode: |
|
| 2. Found + interpret mode: EXECUTE |
|
||||||
| - Immediate? Execute now |
|
| 3. Found + compile mode: |
|
||||||
|
| - IMMEDIATE? Execute now |
|
||||||
| - Normal? Append Call(WordId) to IR |
|
| - Normal? Append Call(WordId) to IR |
|
||||||
| 4. Not found: try parse as number |
|
| 4. Not found: try parse as number |
|
||||||
| - Interpret: push to data stack |
|
| - Interpret: push to data stack |
|
||||||
| - Compile: append PushI32(n) to IR |
|
| - Compile: append PushI32/64/F64 |
|
||||||
| 5. Neither: error "unknown word" |
|
| 5. Neither: error "unknown word" |
|
||||||
|
| Special cases handled here, not via IR: |
|
||||||
|
| defining words (CREATE, VARIABLE, :), |
|
||||||
|
| DOES> dispatch, S" / ." string parsing, |
|
||||||
|
| {: ... :} locals, [: ... ;] quotations. |
|
||||||
+--------------------------------------------+
|
+--------------------------------------------+
|
||||||
| On `;` (end of colon definition):
|
| On `;` (end of colon definition):
|
||||||
v
|
v
|
||||||
Optimizer (optimizer.rs)
|
Optimizer (optimizer.rs) — IR -> IR
|
||||||
+--------------------------------------------+
|
+--------------------------------------------+
|
||||||
| Phase 1: Simplify |
|
| Phase 1 simplify: |
|
||||||
| Peephole -> Constant Fold -> |
|
| peephole -> fold -> strength -> peephole |
|
||||||
| Strength Reduce -> Peephole |
|
| Phase 2 inline (max 8 ops) then re-simpl.: |
|
||||||
| Phase 2: Inline then re-simplify |
|
| inline -> peephole -> fold -> strength |
|
||||||
| Inline(max=8) -> Peephole -> |
|
| -> peephole |
|
||||||
| Constant Fold -> Strength Reduce -> |
|
| Phase 3 dead code: dce -> peephole |
|
||||||
| Peephole |
|
| Phase 4 tail calls (must be last) |
|
||||||
| Phase 3: Eliminate dead code |
|
| Total peephole passes: 5 |
|
||||||
| DCE -> Peephole |
|
|
||||||
| Phase 4: Tail calls (must be last) |
|
|
||||||
| Tail Call Detect |
|
|
||||||
+--------------------------------------------+
|
+--------------------------------------------+
|
||||||
|
|
|
|
||||||
v
|
v
|
||||||
Codegen (codegen.rs)
|
Codegen (codegen.rs) — IR -> WASM bytes
|
||||||
+--------------------------------------------+
|
+--------------------------------------------+
|
||||||
| IR -> WASM bytecode via wasm-encoder |
|
| wasm-encoder builds one module per word. |
|
||||||
| Each word = one WASM module with: |
|
| Function locals (laid out in order): |
|
||||||
| Imports: emit, memory, dsp, rsp, fsp, |
|
| 0 cached DSP (i32) |
|
||||||
| table |
|
| 1..s scratch i32 (or promoted |
|
||||||
| Types: void () -> (), i32 (i32) -> () |
|
| stack-to-local slots) |
|
||||||
| One defined function (the word body) |
|
| s..f Forth locals from {: ... :} |
|
||||||
| DSP cached in local 0, writeback before |
|
| (i32 then f64) |
|
||||||
| calls, reload after calls |
|
| f..l loop locals: 2 per nested |
|
||||||
| Scratch locals start at index 1 |
|
| DO/?DO (index, limit) |
|
||||||
|
| DSP write-back before every Call, |
|
||||||
|
| reload after — keeps host functions and |
|
||||||
|
| call_indirect targets coherent. |
|
||||||
|
| Stack-to-local promotion (codegen flag): |
|
||||||
|
| straight-line + simple control flow |
|
||||||
|
| words skip the linear-memory data stack |
|
||||||
|
| entirely; values stay in WASM locals. |
|
||||||
+--------------------------------------------+
|
+--------------------------------------------+
|
||||||
|
|
|
|
||||||
v
|
v
|
||||||
Runtime trait (runtime.rs)
|
Runtime trait (runtime.rs) — execution backend
|
||||||
+--------------------------------------------+
|
+--------------------------------------------+
|
||||||
| ForthVM<R: Runtime> — generic over backend |
|
| ForthVM<R: Runtime> generic over backend. |
|
||||||
| Runtime provides: |
|
| Runtime owns: |
|
||||||
| - Memory r/w (mem_read_i32, etc.) |
|
| - shared linear memory (16 pages init) |
|
||||||
| - Globals (get/set_dsp, rsp, fsp) |
|
| - shared funcref table (grows on demand) |
|
||||||
| - Table (ensure_table_size) |
|
| - 3 mutable i32 globals (dsp/rsp/fsp) |
|
||||||
| - instantiate_and_install(wasm_bytes) |
|
| - emit() import bound to output buffer |
|
||||||
| - call_func(fn_index) |
|
| Runtime methods: |
|
||||||
| - register_host_func(fn_index, HostFn) |
|
| mem_read/write_{i32,u8,slice} |
|
||||||
|
| get/set_{dsp,rsp,fsp} |
|
||||||
|
| ensure_table_size(n) |
|
||||||
|
| instantiate_and_install(wasm, fn_index) |
|
||||||
|
| call_func(fn_index) |
|
||||||
|
| register_host_func(fn_index, HostFn) |
|
||||||
| |
|
| |
|
||||||
| HostAccess trait — memory/global ops for |
|
| HostAccess trait — same memory/global ops |
|
||||||
| host function callbacks |
|
| exposed to host-fn callbacks; lets one |
|
||||||
| HostFn = Box<dyn Fn(&mut dyn HostAccess)> |
|
| HostFn closure run on either runtime. |
|
||||||
|
| HostFn = Box<dyn Fn(&mut dyn HostAccess) |
|
||||||
|
| -> Result<()> + Send + Sync> |
|
||||||
+--------------------------------------------+
|
+--------------------------------------------+
|
||||||
| |
|
| |
|
||||||
v v
|
v v
|
||||||
NativeRuntime WebRuntime
|
NativeRuntime WebRuntime
|
||||||
(runtime_native.rs) (crates/web/runtime_web.rs)
|
(runtime_native.rs, (crates/web/src/
|
||||||
|
feature = "native") runtime_web.rs)
|
||||||
+------------------+ +------------------+
|
+------------------+ +------------------+
|
||||||
| wasmtime Engine | | js_sys::WebAsm |
|
| wasmtime Engine, | | js_sys WebAsm |
|
||||||
| Store, Memory | | Memory, Table |
|
| Store, Memory, | | Memory, Table, |
|
||||||
| Table, Globals | | Global objects |
|
| Table, Globals, | | Global, JS |
|
||||||
| Func closures | | JS Closures |
|
| Func closures | | Closures |
|
||||||
+------------------+ +------------------+
|
+------------------+ +------------------+
|
||||||
|
|
||||||
|
|
||||||
2. MEMORY LAYOUT (Linear Memory)
|
2. MEMORY LAYOUT (linear memory, single shared instance)
|
||||||
--------------------------------
|
--------------------------------------------------------
|
||||||
|
|
||||||
Address Region Size Notes
|
Address Region Size Notes
|
||||||
-------- ------------------ ------- -------------------------
|
-------- ------------------ ------- --------------------------
|
||||||
0x0000 System Variables 64 B STATE, BASE, >IN, HERE,
|
0x0000 System Variables 64 B STATE, BASE, >IN, HERE,
|
||||||
LATEST, SOURCE-ID, #TIB,
|
LATEST, SOURCE-ID, #TIB,
|
||||||
HLD, LEAVE-FLAG
|
HLD, LEAVE-FLAG
|
||||||
0x0040 Input Buffer 1024 B Source parsing
|
0x0040 Input Buffer (TIB) 1024 B Source line being parsed
|
||||||
0x0440 PAD 256 B Scratch area
|
0x0440 PAD 256 B Scratch for string ops
|
||||||
0x0540 Pictured Output 128 B <# ... #> (grows down)
|
0x0540 Pictured Output 128 B <# ... #> (HLD grows down)
|
||||||
0x05C0 WORD Buffer 64 B Transient counted string
|
0x05C0 WORD Buffer 64 B Transient counted string
|
||||||
0x0600 Data Stack 4096 B 1024 cells, grows DOWN
|
0x0600 Data Stack 4096 B 1024 cells, grows DOWN
|
||||||
0x1600 (Data Stack Top) DSP starts here
|
^ DSP starts at top = 0x1600
|
||||||
0x1540 Return Stack 4096 B Grows DOWN
|
0x1600 Return Stack 4096 B Grows DOWN
|
||||||
0x2540 Float Stack 2048 B 256 doubles, grows DOWN
|
^ RSP starts at top = 0x2600
|
||||||
0x2D40 Dictionary grows UP Linked list of word entries
|
0x2600 Float Stack 2048 B 256 doubles, grows DOWN
|
||||||
|
^ FSP starts at top = 0x2E00
|
||||||
|
0x2E00 Hash Scratch 128 B SHA1/256/512 output
|
||||||
|
0x2E80 Dictionary grows UP Linked list of entries
|
||||||
|
|
||||||
Total initial memory: 16 pages = 1 MiB (max 256 pages = 16 MiB)
|
Constants from crates/core/src/memory.rs (authoritative):
|
||||||
Cell size: 4 bytes (i32)
|
SYSVAR_BASE 0x0000 size 64
|
||||||
Float size: 8 bytes (f64)
|
INPUT_BUFFER_BASE 0x0040 size 1024
|
||||||
|
PAD_BASE 0x0440 size 256
|
||||||
|
PICT_BUF_BASE 0x0540 size 128
|
||||||
|
WORD_BUF_BASE 0x05C0 size 64
|
||||||
|
DATA_STACK_BASE 0x0600 size 4096 (DATA_STACK_TOP = 0x1600)
|
||||||
|
RETURN_STACK_BASE 0x1600 size 4096 (RETURN_STACK_TOP = 0x2600)
|
||||||
|
FLOAT_STACK_BASE 0x2600 size 2048 (FLOAT_STACK_TOP = 0x2E00)
|
||||||
|
HASH_SCRATCH_BASE 0x2E00 size 128
|
||||||
|
DICTIONARY_BASE 0x2E80 grows up to memory.len()
|
||||||
|
(Some inline `// 0x...` comments in memory.rs are stale — the
|
||||||
|
computed values above are correct; the consts are derived.)
|
||||||
|
|
||||||
|
Total initial memory: 16 pages = 1 MiB (max 256 pages = 16 MiB).
|
||||||
|
Cell size: 4 bytes (i32). Float size: 8 bytes (f64).
|
||||||
|
|
||||||
|
Stack layout note: linear-memory data and float stacks are the
|
||||||
|
fallback used whenever the optimizer can't keep values in WASM
|
||||||
|
locals. After stack-to-local promotion, many words touch DSP
|
||||||
|
only on entry/exit.
|
||||||
|
|
||||||
|
|
||||||
3. SYSTEM VARIABLES (offsets from 0x0000)
|
3. SYSTEM VARIABLES (offsets from 0x0000)
|
||||||
@@ -113,60 +156,86 @@ WAFER Architecture Reference (updated 2026-04-13)
|
|||||||
8 >IN Parse offset into input buffer
|
8 >IN Parse offset into input buffer
|
||||||
12 HERE Next free dictionary address
|
12 HERE Next free dictionary address
|
||||||
16 LATEST Most recent dictionary entry addr
|
16 LATEST Most recent dictionary entry addr
|
||||||
20 SOURCE-ID 0=user input, -1=string
|
20 SOURCE-ID 0=user input, -1=string, fileid>0
|
||||||
24 #TIB Length of current input
|
24 #TIB Length of current input
|
||||||
28 HLD Pictured numeric output pointer
|
28 HLD Pictured numeric output pointer
|
||||||
32 LEAVE-FLAG Nonzero when LEAVE called in loop
|
32 LEAVE-FLAG Nonzero when LEAVE called in loop
|
||||||
|
|
||||||
|
|
||||||
4. DICTIONARY ENTRY FORMAT
|
4. DICTIONARY (dictionary.rs)
|
||||||
--------------------------
|
-----------------------------
|
||||||
|
|
||||||
+--------+-------+----------+---------+-----------+
|
Entry layout in linear memory:
|
||||||
| Link | Flags | Name | Padding | Code |
|
|
||||||
| 4 bytes| 1 byte| N bytes | 0-3 B | 4 bytes |
|
+--------+-------+----------+---------+-----------+----------+
|
||||||
+--------+-------+----------+---------+-----------+
|
| Link | Flags | Name | Padding | Code | Param |
|
||||||
|
| 4 B | 1 B | N B | 0-3 B | 4 B | optional |
|
||||||
|
+--------+-------+----------+---------+-----------+----------+
|
||||||
^ ^
|
^ ^
|
||||||
entry_addr code field (fn table index)
|
entry_addr code field (fn-table idx)
|
||||||
|
|
||||||
Flags byte:
|
Flags byte:
|
||||||
Bit 7 (0x80): IMMEDIATE
|
Bit 7 (0x80): IMMEDIATE
|
||||||
Bit 6 (0x40): HIDDEN (during compilation)
|
Bit 6 (0x40): HIDDEN (during compilation)
|
||||||
Bits 0-4 (0x1F): name length (max 31)
|
Bits 0-4 : name length (max 31)
|
||||||
|
|
||||||
Link points to previous entry (0 = end of list).
|
Link points to previous entry (0 = end of list).
|
||||||
Name stored uppercase, padded to 4-byte alignment.
|
Name stored uppercase, padded to 4-byte alignment.
|
||||||
Code field: index into WASM function table.
|
Code field: index into shared WASM function table.
|
||||||
Parameter field (if any) follows immediately after code field.
|
Parameter field follows the code field for CREATE'd /
|
||||||
|
DOES> / VARIABLE / CONSTANT bodies.
|
||||||
|
|
||||||
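The entry layout and flags byte described in section 4 can be decoded with a few masks. This is a minimal sketch, not the code in dictionary.rs: the constant and function names are hypothetical, and the padding rule assumes that `entry_addr` itself is 4-byte aligned so that the code field lands on the next 4-byte boundary after the name.

```rust
// Bit assignments from the "Flags byte" table in section 4.
const FLAG_IMMEDIATE: u8 = 0x80; // bit 7
const FLAG_HIDDEN: u8 = 0x40; // bit 6
const NAME_LEN_MASK: u8 = 0x1F; // bits 0-4, so names are at most 31 bytes

#[derive(Debug, PartialEq)]
struct Flags {
    immediate: bool,
    hidden: bool,
    name_len: u8,
}

/// Split a flags byte into its three documented fields.
fn decode_flags(byte: u8) -> Flags {
    Flags {
        immediate: byte & FLAG_IMMEDIATE != 0,
        hidden: byte & FLAG_HIDDEN != 0,
        name_len: byte & NAME_LEN_MASK,
    }
}

/// Offset of the code field from entry_addr.
/// Assumption: entry_addr is 4-byte aligned, so link (4 B) + flags (1 B) +
/// name (N B) is rounded up to the next multiple of 4 (0-3 bytes of padding).
fn code_field_offset(name_len: u8) -> u32 {
    let unpadded = 4 + 1 + u32::from(name_len);
    (unpadded + 3) & !3
}
```

For a 3-byte name such as DUP, link + flags + name is exactly 8 bytes, so no padding is needed and the code field sits at `entry_addr + 8`.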
|
Lookup is NOT linear: dictionary.rs maintains a HashMap
|
||||||
|
index from name -> Vec<(wid, addr, fn_index, immediate)>.
|
||||||
|
Each entry is tagged with its wordlist id; resolution
|
||||||
|
walks the current search order.
|
||||||
|
|
||||||
|
Wordlists / Search-Order:
|
||||||
|
wordlist ids are u32; the FORTH wordlist is id 1.
|
||||||
|
`current_wid` selects where new definitions land;
|
||||||
|
`search_order` is the lookup chain (top first).
|
||||||
|
Implements the Forth-2012 Search-Order word set.
|
||||||
|
|
||||||
|
|
||||||
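The lookup scheme described in section 4 (a name index tagged by wordlist id, resolved by walking the search order) can be sketched as follows. The `Dictionary`, `IndexEntry`, `define`, and `find` names are hypothetical and the real dictionary.rs may differ; the sketch also assumes that the most recent definition wins within a single wordlist, which is conventional Forth behaviour but is not stated in the text.

```rust
use std::collections::HashMap;

/// One index record: the (wid, addr, fn_index, immediate) tuple from the text.
#[derive(Debug, Clone, Copy, PartialEq)]
struct IndexEntry {
    wid: u32,
    addr: u32,
    fn_index: u32,
    immediate: bool,
}

struct Dictionary {
    index: HashMap<String, Vec<IndexEntry>>,
    search_order: Vec<u32>, // lookup chain, top first
    current_wid: u32,       // wordlist that receives new definitions
}

impl Dictionary {
    fn new() -> Self {
        // The FORTH wordlist is id 1 per the text.
        Dictionary { index: HashMap::new(), search_order: vec![1], current_wid: 1 }
    }

    /// Record a definition in the current wordlist (names stored uppercase).
    fn define(&mut self, name: &str, addr: u32, fn_index: u32, immediate: bool) {
        let entry = IndexEntry { wid: self.current_wid, addr, fn_index, immediate };
        self.index.entry(name.to_ascii_uppercase()).or_default().push(entry);
    }

    /// Resolve a name: the first wordlist in the search order that holds it wins.
    fn find(&self, name: &str) -> Option<IndexEntry> {
        let candidates = self.index.get(&name.to_ascii_uppercase())?;
        self.search_order.iter().find_map(|wid| {
            // Assumption: the newest definition shadows older ones in a wordlist.
            candidates.iter().rev().find(|e| e.wid == *wid).copied()
        })
    }
}
```

The hash lookup narrows the candidates to entries that share a name, so the search-order walk only touches those few records instead of the whole linked list.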
5. THREE TYPES OF WORDS
|
5. WORD CATEGORIES
|
||||||
-----------------------
|
------------------
|
||||||
|
|
||||||
a) IR Primitives (compiled to WASM)
|
a) IR Primitives — register_primitive("DUP", false, vec![IrOp::Dup])
|
||||||
register_primitive("DUP", false, vec![IrOp::Dup])
|
|
||||||
- Body stored as Vec<IrOp>
|
- Body stored as Vec<IrOp>
|
||||||
- Optimized, then compiled to WASM module
|
- Optimized, then compiled to WASM
|
||||||
- Inlineable by optimizer
|
- Inlineable by optimizer
|
||||||
- FAST: no function call overhead when inlined
|
- Batched at boot: ~110 primitive registrations compiled
|
||||||
|
into a single WASM module to amortize instantiation cost
|
||||||
|
|
||||||
b) Host Functions (HostFn closures)
|
b) Host Functions — register_host_primitive(".", false, func)
|
||||||
register_host_primitive(".", false, func)
|
- HostFn = Box<dyn Fn(&mut dyn HostAccess)
|
||||||
- HostFn = Box<dyn Fn(&mut dyn HostAccess) -> Result<()>>
|
-> Result<()> + Send + Sync>
|
||||||
- Access memory/globals via HostAccess trait (runtime-agnostic)
|
- Access memory/globals via HostAccess trait
|
||||||
- NOT inlineable
|
- NOT inlineable
|
||||||
- Used for: I/O, dictionary manipulation, complex logic
|
- Used for I/O, dictionary manipulation, complex stack ops
|
||||||
- Same closure works on NativeRuntime and WebRuntime
|
- Same closure runs on NativeRuntime and WebRuntime
|
||||||
|
|
||||||
c) Forth-defined words
|
c) Forth-defined words — `: SQUARE DUP * ;`
|
||||||
: SQUARE DUP * ;
|
- Compiled by the outer interpreter
|
||||||
- Compiled by outer interpreter
|
- Goes through the full optimize -> codegen pipeline
|
||||||
- Goes through full optimize -> codegen pipeline
|
- Stored in `ir_bodies` for future inlining
|
||||||
- Stored in ir_bodies for future inlining
|
|
||||||
|
d) Special interpreter tokens (immediate, with custom parsing)
|
||||||
|
- Defining words: CREATE, VARIABLE, CONSTANT, :, ;, DOES>
|
||||||
|
- String literals: S", ."
|
||||||
|
- Control structures: IF/ELSE/THEN, BEGIN/UNTIL/WHILE/REPEAT,
|
||||||
|
DO/?DO/LOOP/+LOOP, [: ... ;] quotations, {: ... :} locals
|
||||||
|
- CONSOLIDATE
|
||||||
|
Their body-collection / dictionary-side-effect logic lives
|
||||||
|
directly in compile_token / interpret_token_immediate.
|
||||||
|
They still emit IR ops (e.g. IrOp::If, IrOp::DoLoop,
|
||||||
|
IrOp::ForthLocalGet) — the difference is that they are NOT
|
||||||
|
registered via register_primitive; the outer interpreter
|
||||||
|
handles them as special syntax.
|
||||||
|
|
||||||
|
|
||||||
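To illustrate why a HostFn closure is runtime-agnostic, the sketch below defines a stand-in `HostAccess` trait, a mock backend, and one host word written only against the trait. Everything here is an assumption made for illustration: the method signatures, the `Result<(), String>` error type, `MockRuntime`, and the `push` / `pop` / `make_negate` helpers are not taken from the WAFER source. It also assumes that DSP points at the top-of-stack cell and that the stack grows down by one 4-byte cell per push.

```rust
/// Stand-in for the HostAccess trait; method signatures are assumptions.
trait HostAccess {
    fn mem_read_i32(&self, addr: u32) -> i32;
    fn mem_write_i32(&mut self, addr: u32, value: i32);
    fn get_dsp(&self) -> u32;
    fn set_dsp(&mut self, dsp: u32);
}

/// Same shape as the documented HostFn, with String standing in for the error type.
type HostFn = Box<dyn Fn(&mut dyn HostAccess) -> Result<(), String> + Send + Sync>;

const CELL: u32 = 4; // cell size in bytes

/// Hypothetical in-process backend: a byte vector plus a DSP value.
struct MockRuntime {
    memory: Vec<u8>,
    dsp: u32,
}

impl MockRuntime {
    fn new() -> Self {
        // Memory up to DATA_STACK_TOP (0x1600); DSP starts at the top.
        MockRuntime { memory: vec![0; 0x1600], dsp: 0x1600 }
    }
}

impl HostAccess for MockRuntime {
    fn mem_read_i32(&self, addr: u32) -> i32 {
        let a = addr as usize;
        let mut buf = [0u8; 4];
        buf.copy_from_slice(&self.memory[a..a + 4]);
        i32::from_le_bytes(buf) // WASM linear memory is little-endian
    }
    fn mem_write_i32(&mut self, addr: u32, value: i32) {
        let a = addr as usize;
        self.memory[a..a + 4].copy_from_slice(&value.to_le_bytes());
    }
    fn get_dsp(&self) -> u32 {
        self.dsp
    }
    fn set_dsp(&mut self, dsp: u32) {
        self.dsp = dsp;
    }
}

/// Push one cell: move DSP down, then store.
fn push(h: &mut dyn HostAccess, value: i32) {
    let dsp = h.get_dsp() - CELL;
    h.mem_write_i32(dsp, value);
    h.set_dsp(dsp);
}

/// Pop one cell: load, then move DSP up.
fn pop(h: &mut dyn HostAccess) -> i32 {
    let dsp = h.get_dsp();
    let value = h.mem_read_i32(dsp);
    h.set_dsp(dsp + CELL);
    value
}

/// A host word that negates the top of stack, written only against HostAccess.
fn make_negate() -> HostFn {
    Box::new(|h: &mut dyn HostAccess| {
        let v = pop(h);
        push(h, v.wrapping_neg());
        Ok(())
    })
}
```

Because the closure touches memory and DSP only through the trait object, the same boxed value could be handed to any backend that implements the trait.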
6. WASM MODULE STRUCTURE (per word)
|
6. WASM MODULE STRUCTURE (per JIT-compiled word)
|
||||||
-----------------------------------
|
------------------------------------------------
|
||||||
|
|
||||||
Imports (6) — provided by Runtime impl:
|
Imports (6) — provided by Runtime impl:
|
||||||
0. emit (func: i32 -> void) Character output callback
|
0. emit (func: i32 -> void) Character output callback
|
||||||
@@ -176,25 +245,59 @@ WAFER Architecture Reference (updated 2026-04-13)
|
|||||||
4. fsp (global: mut i32) Float stack pointer
|
4. fsp (global: mut i32) Float stack pointer
|
||||||
5. table (table: funcref) Shared function table
|
5. table (table: funcref) Shared function table
|
||||||
|
|
||||||
Types (2):
|
Types: () -> () for word bodies; (i32) -> () for emit.
|
||||||
0. void: () -> ()
|
|
||||||
1. i32: (i32) -> ()
|
|
||||||
|
|
||||||
Functions (1):
|
Functions (1):
|
||||||
The compiled word body
|
The compiled word body, typed () -> ().
|
||||||
|
|
||||||
Element section:
|
Element section:
|
||||||
table[base_fn_index] = function 1
|
table[base_fn_index] = function 1
|
||||||
|
|
||||||
Runtime::instantiate_and_install(wasm_bytes, fn_index):
|
Runtime::instantiate_and_install(wasm_bytes, fn_index):
|
||||||
- NativeRuntime: Module::new + Instance::new with 6 wasmtime imports
|
- NativeRuntime: wasmtime Module::new + Instance::new
|
||||||
- WebRuntime: WebAssembly.instantiate with JS import objects
|
with the 6 imports above
|
||||||
|
- WebRuntime: WebAssembly.instantiate with JS import
|
||||||
|
objects pulled from the shared WaferRepl state
|
||||||
|
|
||||||
|
|
||||||
7. OPTIMIZATION PASSES (detail)
|
7. IR OPS (ir.rs — IrOp enum)
|
||||||
|
-----------------------------
|
||||||
|
|
||||||
|
Stack: Drop, Dup, Swap, Over, Rot, Nip, Tuck,
|
||||||
|
TwoDup, TwoDrop
|
||||||
|
Literals: PushI32, PushI64, PushF64
|
||||||
|
Arithmetic: Add, Sub, Mul, DivMod, Negate, Abs
|
||||||
|
Compare: Eq, NotEq, Lt, Gt, LtUnsigned,
|
||||||
|
ZeroEq, ZeroLt
|
||||||
|
Logic: And, Or, Xor, Invert,
|
||||||
|
Lshift, Rshift, ArithRshift
|
||||||
|
Memory: Fetch, Store, CFetch, CStore, PlusStore
|
||||||
|
Control: Call, TailCall, Exit,
|
||||||
|
If{then, else?},
|
||||||
|
DoLoop{body, is_plus_loop},
|
||||||
|
BeginUntil, BeginAgain,
|
||||||
|
BeginWhileRepeat,
|
||||||
|
BeginDoubleWhileRepeat,
|
||||||
|
LoopRestartIfFalse,
|
||||||
|
Block(label), BranchIfFalse(label),
|
||||||
|
EndBlock(label) -- for CS-ROLL'd patterns
|
||||||
|
Return stack: ToR, FromR, RFetch, LoopJ
|
||||||
|
Forth locals: ForthLocalGet/Set,
|
||||||
|
ForthFLocalGet/Set
|
||||||
|
I/O: Emit, Dot, Cr, Type
|
||||||
|
System: Execute, SpFetch
|
||||||
|
Float stack: FDup, FDrop, FSwap, FOver
|
||||||
|
Float math: FAdd, FSub, FMul, FDiv, FNegate, FAbs,
|
||||||
|
FSqrt, FMin, FMax, FFloor, FRound
|
||||||
|
Float compare:FZeroEq, FZeroLt, FEq, FLt
|
||||||
|
Float memory: FetchFloat, StoreFloat
|
||||||
|
Conversion: StoF, FtoS
|
||||||
|
|
||||||
|
|
||||||
|
8. OPTIMIZATION PASSES (detail)
|
||||||
-------------------------------
|
-------------------------------
|
||||||
|
|
||||||
PEEPHOLE (runs 5x across full pipeline):
|
PEEPHOLE (5x across pipeline):
|
||||||
PushI32(n), Drop -> (removed) Unused literal
|
PushI32(n), Drop -> (removed) Unused literal
|
||||||
Dup, Drop -> (removed) Redundant copy
|
Dup, Drop -> (removed) Redundant copy
|
||||||
Swap, Swap -> (removed) Self-inverse
|
Swap, Swap -> (removed) Self-inverse
|
||||||
@@ -205,16 +308,17 @@ WAFER Architecture Reference (updated 2026-04-13)
|
|||||||
PushI32(1), Mul -> (removed) Identity
|
PushI32(1), Mul -> (removed) Identity
|
||||||
Over, Over -> TwoDup Combine
|
Over, Over -> TwoDup Combine
|
||||||
Drop, Drop -> TwoDrop Combine
|
Drop, Drop -> TwoDrop Combine
|
||||||
(+ float variants: PushF64/FDrop, FDup/FDrop, FSwap/FSwap, FNegate/FNegate)
|
Float variants:
|
||||||
|
PushF64(_), FDrop / FDup, FDrop /
|
||||||
|
FSwap, FSwap / FNegate, FNegate
|
||||||
|
|
||||||
CONSTANT FOLD:
|
CONSTANT FOLD:
|
||||||
Binary: PushI32(a), PushI32(b), <op> -> PushI32(result)
|
Binary i32: PushI32(a), PushI32(b), <op> -> PushI32(r)
|
||||||
Supports: Add, Sub, Mul, And, Or, Xor, Lshift, Rshift, ArithRshift,
|
Add, Sub, Mul, And, Or, Xor,
|
||||||
|
Lshift, Rshift, ArithRshift,
|
||||||
Eq, NotEq, Lt, Gt, LtUnsigned
|
Eq, NotEq, Lt, Gt, LtUnsigned
|
||||||
Unary: PushI32(n), <op> -> PushI32(result)
|
Unary i32: Negate, Abs, Invert, ZeroEq, ZeroLt
|
||||||
Supports: Negate, Abs, Invert, ZeroEq, ZeroLt
|
Float binary/unary equivalents on PushF64.
|
||||||
Float binary: PushF64(a), PushF64(b), <op> -> PushF64(result)
|
|
||||||
Float unary: PushF64(n), <op> -> PushF64(result)
|
|
||||||
|
|
||||||
STRENGTH REDUCE:
|
STRENGTH REDUCE:
|
||||||
PushI32(2^n), Mul -> PushI32(n), Lshift
|
PushI32(2^n), Mul -> PushI32(n), Lshift
|
||||||
@@ -226,81 +330,149 @@ WAFER Architecture Reference (updated 2026-04-13)
|
|||||||
PushI32(0), If{then,else} -> else_body only
|
PushI32(0), If{then,else} -> else_body only
|
||||||
Everything after Exit -> removed
|
Everything after Exit -> removed
|
||||||
|
|
||||||
INLINE (max_size=8, single pass):
|
INLINE (max 8 ops, single pass):
|
||||||
Call(id) -> inline body if:
|
Call(id) -> body if all of:
|
||||||
- Body length <= 8 ops
|
- body length <= 8 ops
|
||||||
- No self-recursion
|
- no self-recursion
|
||||||
- No Exit (would return from caller)
|
- no Exit (would return from caller)
|
||||||
- No ForthLocalGet/Set (would collide with caller's locals)
|
- no ForthLocalGet/Set (would collide with caller locals)
|
||||||
TailCall -> Call when inlined (no longer tail position)
|
TailCall -> Call when inlined (no longer tail position)
|
||||||
|
|
||||||
TAIL CALL (last pass):
|
TAIL CALL (last pass, must be last):
|
||||||
Last Call(id) -> TailCall(id) if:
|
trailing Call(id) -> TailCall(id) if return stack balanced
|
||||||
- Return stack balanced (equal ToR and FromR)
|
(equal ToR / FromR pairs).
|
||||||
Recurses into If branches for conditional tail calls
|
Recurses into If branches for conditional tail calls.
|
||||||
|
|
||||||
|
STACK-TO-LOCAL PROMOTION (codegen pass, not optimizer):
|
||||||
|
Words whose effects on the data stack can be statically
|
||||||
|
tracked are compiled to use WASM locals 1..s instead of
|
||||||
|
DSP loads/stores. Triggered by `is_promotable(body)`.
|
||||||
|
DSP is still written back before any Call so callees and
|
||||||
|
host functions see a consistent stack.
|
||||||
|
|
||||||
|
|
||||||
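A few of the peephole and strength-reduction rules listed in this section can be sketched as small IR -> IR rewrites. The `IrOp` enum below is a cut-down, hypothetical subset of the real one, and the functions are illustrative only; optimizer.rs may structure its passes differently. The peephole pass re-checks the tail of its output after every rewrite so that cascading patterns collapse in a single sweep.

```rust
/// Cut-down, hypothetical subset of the IrOp enum.
#[derive(Debug, Clone, PartialEq)]
enum IrOp {
    PushI32(i32),
    Drop,
    Dup,
    Swap,
    Over,
    Mul,
    Lshift,
    TwoDup,
    TwoDrop,
}

/// Apply the two-op peephole rules listed above.
fn peephole(ops: &[IrOp]) -> Vec<IrOp> {
    let mut out: Vec<IrOp> = Vec::with_capacity(ops.len());
    for op in ops {
        out.push(op.clone());
        // Keep rewriting the last two ops until no rule matches.
        loop {
            let n = out.len();
            if n < 2 {
                break;
            }
            let replacement: Option<Vec<IrOp>> = match (&out[n - 2], &out[n - 1]) {
                (IrOp::PushI32(_), IrOp::Drop)
                | (IrOp::Dup, IrOp::Drop)
                | (IrOp::Swap, IrOp::Swap)
                | (IrOp::PushI32(1), IrOp::Mul) => Some(vec![]),
                (IrOp::Over, IrOp::Over) => Some(vec![IrOp::TwoDup]),
                (IrOp::Drop, IrOp::Drop) => Some(vec![IrOp::TwoDrop]),
                _ => None,
            };
            match replacement {
                Some(new_tail) => {
                    out.truncate(n - 2);
                    out.extend(new_tail);
                }
                None => break,
            }
        }
    }
    out
}

/// PushI32(2^n), Mul -> PushI32(n), Lshift (for n >= 1).
fn strength_reduce(ops: &[IrOp]) -> Vec<IrOp> {
    let mut out = Vec::with_capacity(ops.len());
    let mut i = 0;
    while i < ops.len() {
        match (&ops[i], ops.get(i + 1)) {
            (IrOp::PushI32(k), Some(IrOp::Mul)) if *k > 1 && k.count_ones() == 1 => {
                out.push(IrOp::PushI32(k.trailing_zeros() as i32));
                out.push(IrOp::Lshift);
                i += 2;
            }
            (op, _) => {
                out.push(op.clone());
                i += 1;
            }
        }
    }
    out
}
```

For example, `Dup, PushI32(5), Drop, Drop` first loses the unused literal, which exposes `Dup, Drop` and removes that pair too. This kind of cascade is one reason to run peephole again after the other passes.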
8. CONSOLIDATION
|
9. CONSOLIDATION (consolidate.rs + codegen.rs)
|
||||||
----------------
|
----------------------------------------------
|
||||||
|
|
||||||
CONSOLIDATE word recompiles all JIT-compiled words into a
|
CONSOLIDATE recompiles every JIT-compiled word into ONE WASM
|
||||||
single WASM module:
|
module:
|
||||||
- All call_indirect -> direct call (for words in module)
|
- All call_indirect to consolidated words become direct
|
||||||
- External calls (host functions) remain call_indirect
|
`call` (single-module direct calls)
|
||||||
- Maximum performance for final program
|
- External calls (host functions) stay call_indirect
|
||||||
|
- Removes per-word instantiation overhead and lets the
|
||||||
|
WASM engine inline / specialize across word boundaries
|
||||||
|
|
||||||
Two-part implementation:
|
Two parts:
|
||||||
codegen::compile_consolidated_module() - builds multi-function module
|
codegen::compile_consolidated_module()
|
||||||
outer::ForthVM::consolidate() - orchestrates collection + table update
|
Builds the multi-function module.
|
||||||
|
outer::ForthVM::consolidate()
|
||||||
|
Collects ir_bodies, computes table layout, compiles,
|
||||||
|
instantiates, and patches the shared function table.
|
||||||
|
|
||||||
|
|
||||||
9. EXPORT PIPELINE (wafer build)
|
10. EXPORT PIPELINE (`wafer build`)
|
||||||
--------------------------------
|
----------------------------------
|
||||||
|
|
||||||
1. Evaluate source file with recording_toplevel=true
|
export.rs::export_module() steps:
|
||||||
2. Collect all IR words + top-level IR
|
1. Evaluate the source file with recording_toplevel = true
|
||||||
3. Determine entry: --entry flag > MAIN word > top-level execution
|
2. Collect every IR word + recorded top-level IR
|
||||||
4. Build consolidated module with data section (memory snapshot)
|
3. Resolve entry point (priority):
|
||||||
5. Embed metadata in "wafer" custom section (JSON)
|
--entry <name> > MAIN > synthetic _start from the
|
||||||
6. Optional: --js generates JS loader + HTML page
|
recorded top-level
|
||||||
7. Optional: --native AOT-compiles and appends to wafer binary
|
4. Snapshot WASM linear memory (system vars + dictionary +
|
||||||
Format: [wafer binary][precompiled WASM][metadata][trailer]
|
any user data)
|
||||||
Trailer: payload_len(8) + metadata_len(8) + "WAFEREXE"(8)
|
5. Walk the IR, find every Call/TailCall to a host word
|
||||||
|
not in the consolidated set: those become required
|
||||||
|
imports of the exported module
|
||||||
|
6. Build metadata (JSON, custom "wafer" section):
|
||||||
|
version, entry_table_index, host_functions,
|
||||||
|
memory_size, dsp/rsp/fsp_init
|
||||||
|
7. compile_exportable_module() emits the final WASM with
|
||||||
|
a passive data section seeded from the memory snapshot
|
||||||
|
8. Optional --js: also emit a JS loader + minimal HTML
|
||||||
|
9. Optional --native: AOT-compile and append to the wafer
|
||||||
|
binary itself, in this layout:
|
||||||
|
[wafer ELF/Mach-O][precompiled WASM][metadata]
|
||||||
|
[trailer: payload_len(8) | metadata_len(8) | "WAFEREXE"]
|
||||||
|
The CLI detects the trailer at startup and runs the
|
||||||
|
embedded payload directly (single-file distribution).
|
||||||
|
|
||||||
|
|
||||||
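Trailer detection for the `--native` layout in section 10 could look like the sketch below. The field order and the 8-byte widths follow the layout given in the text; the little-endian encoding of the two lengths is an assumption, and `Trailer`, `parse_trailer`, and `read_u64_le` are hypothetical names, not the CLI's actual code.

```rust
const MAGIC: &[u8; 8] = b"WAFEREXE";
const TRAILER_LEN: usize = 8 + 8 + 8; // payload_len | metadata_len | magic

#[derive(Debug, PartialEq)]
struct Trailer {
    payload_len: u64,
    metadata_len: u64,
}

/// Read a u64 from the first 8 bytes of `bytes`.
/// Assumption: the lengths are stored little-endian.
fn read_u64_le(bytes: &[u8]) -> u64 {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(&bytes[..8]);
    u64::from_le_bytes(buf)
}

/// Return the trailer if the file ends with one, else None.
fn parse_trailer(file: &[u8]) -> Option<Trailer> {
    if file.len() < TRAILER_LEN {
        return None;
    }
    let t = &file[file.len() - TRAILER_LEN..];
    if &t[16..24] != &MAGIC[..] {
        return None; // ordinary binary, no embedded payload
    }
    Some(Trailer {
        payload_len: read_u64_le(&t[0..8]),
        metadata_len: read_u64_le(&t[8..16]),
    })
}
```

Putting the magic at the very end lets a reader test the last 8 bytes of its own executable before reading anything else, and the two lengths then locate the payload and metadata by counting backwards from the trailer.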
10. CRATE STRUCTURE
|
11. CRATE STRUCTURE
|
||||||
-------------------
|
-------------------
|
||||||
|
|
||||||
crates/
|
crates/
|
||||||
core/ wafer-core: compiler, optimizer, codegen, dictionary, Runtime trait
|
core/ wafer-core: compiler, optimizer, codegen,
|
||||||
Feature flags: default=["native"], "native" enables wasmtime
|
dictionary, runtime trait, outer interpreter.
|
||||||
Without features: pure Rust (dictionary, IR, optimizer, codegen, outer)
|
Largest file: codegen.rs (~4.3k LOC).
|
||||||
cli/ wafer: CLI REPL (rustyline), wafer build/run commands
|
Feature flags:
|
||||||
web/ wafer-web: browser REPL (wasm-bindgen + WebRuntime + HTML/CSS/JS)
|
default = ["native"]
|
||||||
|
"native" pulls in wasmtime + NativeRuntime +
|
||||||
|
runner.rs (CLI executor) + export.rs
|
||||||
|
"crypto" enables SHA1/256/512 host words
|
||||||
|
No features: pure-Rust core for wafer-web
|
||||||
|
(dictionary, IR, optimizer, codegen,
|
||||||
|
outer interpreter only)
|
||||||
|
cli/ wafer: rustyline REPL + `wafer build` / `wafer run`
|
||||||
|
web/ wafer-web: browser REPL.
|
||||||
|
|
||||||
Key web files:
|
Key web files:
|
||||||
crates/web/src/lib.rs WaferRepl wasm-bindgen entry point
|
crates/web/src/lib.rs WaferRepl wasm-bindgen entry
|
||||||
crates/web/src/runtime_web.rs WebRuntime: js_sys WebAssembly API
|
crates/web/src/runtime_web.rs WebRuntime: js_sys WebAssembly
|
||||||
crates/web/www/app.js Frontend JS (terminal emulation)
|
crates/web/www/app.js Frontend (terminal emulation)
|
||||||
crates/web/www/index.html HTML shell
|
crates/web/www/index.html HTML shell
|
||||||
crates/web/www/style.css Styling
|
crates/web/www/style.css Styling
|
||||||
|
crates/web/www/pkg/ wasm-pack output (gitignored)
|
||||||
|
|
||||||
|
|
||||||
11. BOOT SEQUENCE
|
12. BOOT SEQUENCE
|
||||||
-----------------
|
-----------------
|
||||||
|
|
||||||
ForthVM::<R>::new() ->
|
ForthVM::<R>::new() ->
|
||||||
1. R::new() — create runtime (wasmtime or browser WASM)
|
1. R::new() — create runtime (wasmtime or browser WASM)
|
||||||
2. register_primitives() in batch_mode:
|
2. register_primitives() in batch_mode = true:
|
||||||
- ~40 IR primitives (DUP, +, @, etc.)
|
- ~110 IR primitive registrations (DUP, +, @, ...)
|
||||||
- ~60 host functions (., .S, M*, ACCEPT, etc.)
|
- ~87 host primitive registrations (., .S, M*, ACCEPT, ...)
|
||||||
- ~30 special words (IF, DO, :, VARIABLE, etc.)
|
- special interpreter tokens (IF, DO, :, VARIABLE, S",
|
||||||
3. compile_batch() - single WASM module for all IR primitives
|
{: :}, [: ;], CONSOLIDATE, ...) handled directly in
|
||||||
4. Load boot.fth - Forth replaces Rust host functions:
|
interpret_token_immediate / compile_token, not registered via register_primitive
|
||||||
Phase 1: Stack/memory (DEPTH, PICK, 2OVER, FILL, MOVE)
|
3. Word-set registrations:
|
||||||
Phase 2: Double-cell arithmetic (D+, DNEGATE, D<)
|
core, double, exception, facility, file (subset),
|
||||||
Phase 3: Mixed arithmetic (SM/REM, FM/MOD, */, */MOD)
|
floating-point, locals, memory, search-order,
|
||||||
Phase 4: HERE, ALLOT, comma, ALIGN
|
programming-tools, string, optional crypto
|
||||||
Phase 5: I/O, pictured numeric output (., U., TYPE, <# # #>)
|
4. batch_compile_deferred() — single WASM module for all
|
||||||
Phase 6: DEFER support
|
deferred IR primitives
|
||||||
Phase 7: String operations (COMPARE, SOURCE, FALIGNED)
|
5. Load boot.fth (include_str!), evaluated line by line so
|
||||||
|
`\` comments terminate at end-of-line:
|
||||||
|
Phase 1: stack/memory (DEPTH, PICK, 2OVER, FILL, MOVE,
|
||||||
|
CMOVE, /STRING, -TRAILING)
|
||||||
|
Phase 2: double-cell arithmetic (D+, DNEGATE, D<, D=)
|
||||||
|
Phase 3: mixed arithmetic (SM/REM, FM/MOD, */, */MOD)
|
||||||
|
Phase 4: HERE, ALLOT, comma, ALIGN, ALIGNED
|
||||||
|
Phase 5: I/O + pictured output (., U., TYPE, <# # #>,
|
||||||
|
SIGN, HOLD)
|
||||||
|
Phase 6: DEFER support (DEFER, IS, ACTION-OF)
|
||||||
|
Phase 7: more replacements (COMPARE, SOURCE, FALIGNED,
|
||||||
|
DFALIGN, structures, S" hint, ...)
|
||||||
|
|
||||||
|
|
||||||
|
13. RUNTIME-VS-EXPORT NOTE
|
||||||
|
--------------------------
|
||||||
|
|
||||||
|
Two separate codegen entry points produce multi-function
|
||||||
|
WASM modules from the same IR:
|
||||||
|
|
||||||
|
compile_consolidated_module() used by CONSOLIDATE
|
||||||
|
- Targets the live runtime
|
||||||
|
- Re-uses the shared globals/table/memory imports
|
||||||
|
- External calls remain call_indirect
|
||||||
|
|
||||||
|
compile_exportable_module() used by `wafer build`
|
||||||
|
- Targets a standalone module
|
||||||
|
- Carries its own memory (passive data section seeded
|
||||||
|
from the snapshot) and embeds metadata
|
||||||
|
- Required host functions become imports the runner
|
||||||
|
(or AOT loader) must satisfy
|
||||||
|
|
||||||
|
Both share the same per-IrOp lowering helpers; the
|
||||||
|
difference is in module-level wiring.
|
||||||
|
|||||||