Implement complete Floating-Point word set, 70+ float words
Separate float stack with fsp global, IEEE 754 double precision.

- Stack ops: FDROP FDUP FSWAP FOVER FROT FDEPTH
- Arithmetic: F+ F- F* F/ FNEGATE FABS FMAX FMIN FSQRT FLOOR FROUND F**
- Comparisons: F0= F0< F= F< F~
- Memory: F@ F! SF@ SF! DF@ DF! FLOAT+ FLOATS FALIGNED FALIGN
- Conversions: D>F F>D S>F F>S
- Trig: FSIN FCOS FTAN FASIN FACOS FATAN FATAN2 FSINCOS
- Exp/log: FEXP FEXPM1 FLN FLNP1 FLOG FALOG
- Hyperbolic: FSINH FCOSH FTANH FASINH FACOSH FATANH
- I/O: F. FE. FS. REPRESENT >FLOAT PRECISION SET-PRECISION
- Defining: FVARIABLE FCONSTANT FVALUE FLITERAL
- Float literal parsing (1E, 1.5E2, -3.14E0 format)

299 unit tests + 11 compliance tests, 0 errors on the float test suite.
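A taste of the new word set, as a hypothetical REPL session (output in comments):

```forth
\ Float literals use the 1E / 1.5E2 / -3.14E0 format added here
2E0 10E0 F** F.         \ 1024. (2 raised to 10)
-3.14E0 FABS FSQRT F.   \ ~1.772 (sqrt of 3.14)
1.5E2 F>S .             \ 150
FDEPTH .                \ 0  (float stack empty again)
```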
@@ -29,6 +29,10 @@ const DSP: u32 = 0;
 
 /// Index of the `$rsp` global (return stack pointer).
 const RSP: u32 = 1;
 
+/// Index of the `$fsp` global (float stack pointer).
+#[allow(dead_code)]
+const FSP: u32 = 2;
+
 /// Index of the imported function table.
 const TABLE: u32 = 0;
@@ -795,6 +799,15 @@ pub fn compile_word(
             shared: false,
         }),
     );
+    imports.import(
+        "env",
+        "fsp",
+        EntityType::Global(GlobalType {
+            val_type: ValType::I32,
+            mutable: true,
+            shared: false,
+        }),
+    );
     imports.import(
         "env",
         "table",
@@ -871,7 +884,7 @@ mod tests {
     use super::*;
     use crate::dictionary::WordId;
     use crate::ir::IrOp;
-    use crate::memory::{DATA_STACK_TOP, RETURN_STACK_TOP};
+    use crate::memory::{DATA_STACK_TOP, FLOAT_STACK_TOP, RETURN_STACK_TOP};
 
     fn default_config() -> CodegenConfig {
         CodegenConfig {
@@ -1133,6 +1146,13 @@ mod tests {
         )
         .unwrap();
 
+        let fsp = Global::new(
+            &mut store,
+            wasmtime::GlobalType::new(ValType::I32, Mutability::Var),
+            Val::I32(FLOAT_STACK_TOP as i32),
+        )
+        .unwrap();
+
         let table = Table::new(
             &mut store,
             wasmtime::TableType::new(RefType::FUNCREF, 16, None),
@@ -1152,6 +1172,7 @@
                 memory.into(),
                 dsp.into(),
                 rsp.into(),
+                fsp.into(),
                 table.into(),
             ],
         )
@@ -111,6 +111,12 @@ impl Dictionary {
         Ok(WordId(fn_index))
     }
 
+    /// Reserve a function index without creating a dictionary entry.
+    /// Used for anonymous host functions (e.g., float literals during compilation).
+    pub fn reserve_fn_index(&mut self) {
+        self.next_fn_index += 1;
+    }
+
     /// Reveal the most recent word (remove HIDDEN flag).
     /// Called after `: ... ;` completes compilation.
     pub fn reveal(&mut self) {
@@ -0,0 +1,890 @@
# The Unreasonable Effectiveness of Stack Machines

_How Forth — and WAFER — can serve as infrastructure for data analytics,
databases, AI inference, AI code generation, and AI agent control._

---

Forth is 55 years old. It has no type system, no garbage collector, no package
manager, no syntax to speak of. By most conventional measures, it shouldn't
still be relevant.

But it keeps showing up at the edges — in firmware, in space probes, in
real-time systems, in places where correctness and determinism matter more than
developer ergonomics. That's worth paying attention to.

The properties that make Forth unusual — concatenative composition, zero-cost
abstraction through word definition, a stack-based execution model that maps
directly to hardware — happen to line up surprisingly well with what five of
the most active areas in modern computing are independently reaching for:

1. **Data analytics** wants composable, streaming pipelines.
2. **Database engines** want stack-based virtual machines for query execution.
3. **AI inference** wants tiny, deterministic, embeddable runtimes.
4. **AI code generation** wants the smallest possible target language.
5. **AI agent systems** want plans that are also executable programs.

Forth won't single-handedly solve any of these. But it offers a useful lens
for understanding what each of them actually needs — and WAFER, a Forth that
compiles to WebAssembly, is in a good position to explore that space.

WAFER (WebAssembly Forth Engine in Rust) JIT-compiles each Forth word to its
own WASM module, linked through shared linear memory, globals, and a function
table. It runs anywhere WASM runs: browsers, edge devices, servers, embedded
systems. It has 160+ words, 100% Forth 2012 compliance on 10 word sets, and
fits in ~50 KB. It has exception handling (`CATCH`/`THROW`), metaprogramming
(`DOES>`), dynamic compilation (`EVALUATE`), and an optimization pipeline
designed for stack-to-local promotion that can achieve 7x speedups.
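In practice that looks like an ordinary REPL session; each definition below
would be compiled to its own WASM module at the closing `;` (illustrative):

```forth
\ Each definition is JIT-compiled the moment the closing ; is reached
: SQUARE ( n -- n^2 ) DUP * ;
: SUM-OF-SQUARES ( a b -- n ) SQUARE SWAP SQUARE + ;
3 4 SUM-OF-SQUARES .   \ prints 25; both calls run as compiled WASM
```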
This document explores what becomes possible when you take these properties
seriously.

---
## 1. Data Analytics: Pipelines Without Plumbing

### The Problem with Pipelines

Every data analytics framework reinvents the same idea: take data, push it
through a sequence of transformations, collect the result. Pandas chains
methods. Spark builds DAGs. dplyr pipes with `%>%`. Unix pipes bytes through
`|`. They all converge on the same shape: **linear composition of operations
on an implicit data flow**.

This is exactly what Forth does. It has done it since 1970. The data stack
_is_ the pipeline. Each word _is_ a transformation. Composition is
juxtaposition — you don't pipe, you don't chain, you don't bind. You just
write the words next to each other.

```forth
\ Pandas: df['amount'].where(df['amount'] > 0).mean()
\ Forth:
: POSITIVE? ( n -- n flag ) DUP 0> ;
: FILTER-POSITIVE ( addr n -- sum count )
  0 >R 0 >R                \ count and sum accumulators on return stack
  0 DO
    DUP I CELLS + @
    POSITIVE? IF R> + R> 1+ >R >R ELSE DROP THEN
  LOOP DROP
  R> R>                    \ ( sum count )
;
: MEAN ( sum count -- avg ) / ;

data 100 FILTER-POSITIVE MEAN .
```

This goes a bit deeper than syntactic sugar. The absence of intermediate
variables is a structural property. In a Pandas chain, every `.method()`
returns a new DataFrame object that must be allocated, tracked, and eventually
collected. In Forth, the data flows through the stack with zero allocation.
The pipeline _is_ the execution.
### Streaming and Incremental Computation

The stack model is inherently streaming. A word consumes its inputs and
produces its outputs in the same motion. There is no "collect all data first,
then process" step unless you explicitly build one. This makes Forth natural
for:

- **Event stream processing**: each event lands on the stack, a word
  processes it, the result is consumed by the next word.
- **Incremental aggregation**: running sums, counts, and statistics
  maintained in variables across invocations.
- **Windowed computation**: a circular buffer in linear memory with
  stack-based access patterns (see the sketch after the example).

```forth
\ Running average over a stream of values
VARIABLE running-sum
VARIABLE running-count

: UPDATE-AVG ( new-value -- running-avg )
  running-sum @ + DUP running-sum !
  running-count @ 1+ DUP running-count !
  /
;

\ Each incoming value:
42 UPDATE-AVG . \ prints running average after adding 42
17 UPDATE-AVG . \ prints updated average after adding 17
```
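The windowed case can be sketched the same way. This is a minimal
illustration, not a WAFER built-in; `RING`, `RING!`, and `WINDOW-AVG` are
hypothetical names:

```forth
\ Moving average over the last 8 values, kept in a circular buffer
8 CONSTANT WINDOW
CREATE RING WINDOW CELLS ALLOT  RING WINDOW CELLS ERASE
VARIABLE RING-POS  0 RING-POS !

: RING! ( n -- )                       \ overwrite the oldest slot
  RING-POS @ CELLS RING + !
  RING-POS @ 1+ WINDOW MOD RING-POS ! ;

: WINDOW-AVG ( -- avg )
  0 WINDOW 0 DO RING I CELLS + @ + LOOP WINDOW / ;
```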
### Client-Side Analytics via WASM

WAFER compiles to WebAssembly. This means analytics can run _in the browser_
with no server round-trips. A user uploads a CSV, WAFER parses and processes
it entirely client-side, and the results render immediately. No data leaves
the machine. No API calls. No latency.

This isn't just a nice demo. For privacy-sensitive analytics (healthcare,
finance, GDPR-regulated data), client-side processing can be a compliance
requirement. WAFER's deterministic execution (no GC pauses, no background
threads, fixed memory layout) makes it predictable enough for real-time
dashboards.
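A minimal sketch of the idea, assuming the host has copied the uploaded CSV
into linear memory at `addr`/`len`:

```forth
\ Count the rows of a CSV sitting in linear memory (sketch)
: COUNT-ROWS ( addr len -- n )
  0 -ROT                  \ accumulator under addr
  0 DO
    DUP I + C@ 10 = IF    \ 10 = ASCII linefeed
      SWAP 1+ SWAP
    THEN
  LOOP DROP ;
```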
### Domain-Specific Languages

Forth's defining feature is that you build the language up to your problem.
An analytics team doesn't write Forth — they write _their DSL_, which
happens to be implemented in Forth:

```forth
\ Define a mini analytics vocabulary
: COLUMN ( col# -- addr n ) table-base SWAP col-offset + col-length ;
: SUM    ( addr n -- total ) 0 SWAP 0 DO OVER I CELLS + @ + LOOP NIP ;
: COUNT  ( addr n -- n ) NIP ;
: AVG    ( addr n -- avg ) 2DUP SUM -ROT COUNT / ;
: WHERE> ( addr n thresh -- addr' n' ) filter-gt ;

\ The analyst writes:
3 COLUMN 1000 WHERE> AVG .
\ "Average of column 3 where values exceed 1000"
```

The DSL compiles to WASM through WAFER's IR pipeline. There is no
interpreter overhead at query time. The analyst's vocabulary _is_ the
optimized code.

### A Different Way to Look at It

Most languages treat the absence of named variables as a limitation. But in
data pipelines, it can actually be a **feature**. Named intermediates create
coupling points — places where code can refer to stale state, where
refactoring requires renaming, where parallelization requires dependency
analysis. Point-free composition through a stack sidesteps this whole class
of problems. The data is always _here_, on top of the stack, ready for the
next transformation.

---
## 2. Database Engine: The Query VM You Already Have

### Databases Already Think in Stacks

SQLite — the most deployed database engine in the world — executes queries
through the VDBE (Virtual DataBase Engine), a bytecode virtual machine
(stack-based in its original design, register-based today). When you write
`SELECT * FROM users WHERE age > 30`, SQLite's query planner compiles it into
a sequence of VM operations: open cursor, seek, compare, jump, emit row.

PostgreSQL's executor runs a tree of plan nodes, each of which pushes tuples
upward. MySQL's handler interface is a stack of operations. CockroachDB
compiles SQL to a vectorized execution engine that operates on batches — but
the control flow is still a stack of operators.

There's a pattern here: **query execution engines tend to converge on
stack machines**. Forth just happens to already be one, with no extra
abstraction layers in between.

### Query Plans as Forth Programs

A SQL query plan is a tree. Flattened into execution order, it becomes a
sequence of operations — which is exactly a Forth program:

```sql
SELECT name, salary FROM employees WHERE dept = 'ENG' AND salary > 100000;
```

The query plan, expressed as Forth:

```forth
\ Primitives provided by the storage engine
\ SCAN     ( table -- cursor )
\ NEXT-ROW ( cursor -- cursor flag )  flag=true if row available
\ COL@     ( cursor col# -- value )
\ EMIT-ROW ( v1 v2 -- )  send to result set
\ CLOSE    ( cursor -- )

: MATCH-DEPT? ( cursor -- cursor flag ) DUP 2 COL@ S" ENG" COMPARE 0= ;
: MATCH-SAL?  ( cursor -- cursor flag ) DUP 3 COL@ 100000 > ;
: PROJECT     ( cursor -- cursor ) DUP 0 COL@ OVER 3 COL@ EMIT-ROW ;

: QUERY ( -- )
  employees SCAN
  BEGIN
    NEXT-ROW
  WHILE
    MATCH-DEPT? IF
      MATCH-SAL? IF
        PROJECT
      THEN
    THEN
  REPEAT
  CLOSE
;
```

This isn't just pseudocode, either. Every word here could be a real WAFER
word backed by storage primitives implemented as host functions. The query
compiles through WAFER's IR pipeline to native WASM, with the same
optimization opportunities as any other Forth word: inlining, constant
folding, dead code elimination.

### EVALUATE as Dynamic Query Compilation

SQL databases accept queries as strings and compile them at runtime. Forth
has `EVALUATE`, which does exactly the same thing — takes a string and
compiles/executes it:

```forth
\ Build a query string dynamically
S" employees SCAN BEGIN NEXT-ROW WHILE MATCH-DEPT? IF PROJECT THEN REPEAT CLOSE"
EVALUATE
```

The difference from SQL: the "query language" and the "implementation
language" are the same. There is no impedance mismatch between the language
the user writes queries in and the language the engine executes them in. A
user-defined function is just another word. An index lookup is just another
word. A join strategy is just another word. They all compose the same way.
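And since `EVALUATE` can compile definitions as well as execute code, a
prepared statement falls out for free (sketch building on the `QUERY` words
above):

```forth
\ A "prepared statement": define the query once at runtime, reuse it
S" : ENG-QUERY employees SCAN BEGIN NEXT-ROW WHILE MATCH-DEPT? IF PROJECT THEN REPEAT CLOSE ;"
EVALUATE
ENG-QUERY   \ runs the compiled plan; no re-parsing on later calls
```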
### Linear Memory as Storage Pages

WAFER's linear memory model maps directly to how databases manage storage.
A database page is a fixed-size block of bytes at a known offset — exactly
what Forth's `@` and `!` operate on. B-tree nodes are structures in linear
memory traversed by pointer arithmetic:

```forth
\ B-tree node layout:
\ +0:             key count (cell)
\ +4:             is-leaf flag (cell)
\ +8:             keys array (key-count cells)
\ +8+4*key-count: child pointers (key-count+1 cells)

: NODE-KEYS   ( node -- addr ) 8 + ;
: NODE-KEY@   ( node i -- key ) CELLS SWAP NODE-KEYS + @ ;
: NODE-CHILD@ ( node i -- child )
  OVER NODE-KEYS        \ ( node i keys-addr )
  ROT @ CELLS +         \ skip past keys array: children base
  SWAP CELLS +          \ index into children
  @
;

\ LEAF-SEARCH and INTERNAL-SEARCH are supplied by the engine
: BTREE-SEARCH ( node target-key -- addr|0 )
  OVER @ 0= IF 2DROP 0 EXIT THEN   \ empty node
  OVER 4 + @ IF                    \ leaf node
    LEAF-SEARCH
  ELSE
    INTERNAL-SEARCH                \ recurse into child
  THEN
;
```
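`LEAF-SEARCH` and `INTERNAL-SEARCH` are left undefined above; the leaf case
might look like this minimal sketch (a linear scan, assuming the node layout
shown):

```forth
\ Linear scan of the keys array; matching key's address, or 0
: LEAF-SEARCH ( node target-key -- addr|0 )
  OVER @ 0 DO                     \ loop over key count
    2DUP SWAP I NODE-KEY@ = IF    \ keys[I] = target?
      DROP NODE-KEYS I CELLS +    \ address of the matching key
      UNLOOP EXIT
    THEN
  LOOP
  2DROP 0 ;
```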
### WASM Sandboxing for User-Defined Functions

Safely executing user-defined functions (UDFs) is one of the trickier
problems in database engines. PostgreSQL UDFs in C can crash the server.
JavaScript UDFs require embedding V8. Python UDFs tend to be slow.

WAFER UDFs compile to WASM and execute in a sandbox with bounded memory,
bounded execution time, and no access to anything outside the linear memory
they're given. A malicious UDF can't read other users' data, can't make
network calls, can't crash the host. WAFER gets this for free — it's
inherent to WASM's security model.

```forth
\ User defines a custom scoring function
: SCORE ( age salary -- score )
  1000 /          \ salary contribution (salary/1000)
  SWAP 50 - ABS   \ age penalty (distance from 50)
  -               \ final score
;

\ Engine uses it in a query
: RANKED-QUERY ( -- )
  employees SCAN
  BEGIN NEXT-ROW WHILE
    DUP 1 COL@ OVER 3 COL@ SCORE
    50 > IF PROJECT THEN
  REPEAT CLOSE
;
```

The `SCORE` function compiles to a WASM module through WAFER's JIT. It runs
at near-native speed, sandboxed, with no FFI overhead.

### A Different Way to Look at It

Database engineers put a lot of effort into building query VMs — designing
bytecode formats, writing interpreters, adding JIT compilation. In a sense,
they're often reinventing something Forth-shaped each time. It's worth asking:
what if you just started with Forth and built the storage layer underneath it?

---
## 3. AI Inference: Neural Networks as Word Composition

### Layers Are Words, Forward Pass Is Composition

A neural network's forward pass is a pipeline: input tensor enters, passes
through a sequence of layers (linear transform, activation, normalization),
and a prediction exits. Each layer takes a tensor and produces a tensor.

In Forth terms: each layer is a word. The tensor sits on the stack. The
forward pass is the composition of those words:

```forth
\ Assuming tensor operations as primitives (host functions):
\ T-MATMUL  ( tensor weights -- tensor )
\ T-ADD     ( tensor bias -- tensor )
\ T-RELU    ( tensor -- tensor )
\ T-SOFTMAX ( tensor -- tensor )

: LINEAR1 ( tensor -- tensor ) w1 T-MATMUL b1 T-ADD ;
: LINEAR2 ( tensor -- tensor ) w2 T-MATMUL b2 T-ADD ;
: LINEAR3 ( tensor -- tensor ) w3 T-MATMUL b3 T-ADD ;

: CLASSIFIER ( tensor -- tensor )
  LINEAR1 T-RELU
  LINEAR2 T-RELU
  LINEAR3 T-SOFTMAX
;

input-data CLASSIFIER \ forward pass
```

This maps more directly than you might expect. The compositional structure of
neural networks lines up nicely with the compositional structure of Forth
programs. The stack carries the data flow. The words are the layers. The
dictionary holds the model architecture.
### Quantized Inference on the Integer Stack

Most production inference runs quantized — INT8 or INT4 weights, integer
arithmetic, no floating point. Forth's native data type is the integer cell.
WAFER's `i32` stack operations map directly to quantized tensor operations:

```forth
\ INT8 quantized dot product of two byte vectors
: QDOT ( addr1 addr2 n -- result )
  0 >R                 \ accumulator on return stack
  0 DO
    OVER I + C@ 127 -  \ load and de-bias first element
    OVER I + C@ 127 -  \ load and de-bias second element
    * R> + >R          \ multiply-accumulate
  LOOP
  2DROP R>
;

\ Quantized linear layer: one QDOT per output row.
\ Assumes OUTPUT-BUF names a preallocated result buffer and each
\ weight row is cols bytes long.
: QLINEAR ( input-addr weight-addr rows cols -- output-addr )
  SWAP 0 DO                 \ for each output row I: ( input weights cols )
    2 PICK 2 PICK           \ copy input and weights
    2 PICK I * +            \ row I starts at weights + I*cols
    2 PICK QDOT             \ ( input weights cols dot )
    OUTPUT-BUF I CELLS + !  \ store neuron I
  LOOP
  DROP 2DROP OUTPUT-BUF
;
```

No framework dependency, no Python interpreter, no CUDA runtime — just
integer arithmetic on a stack, compiled to WASM, running on any device.
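Wiring it up might look like this (hypothetical: `input-vec` and `weights`
name buffers prepared by the host):

```forth
\ A 4-neuron layer over a 16-byte quantized input vector
input-vec weights 4 16 QLINEAR   \ -- OUTPUT-BUF
DUP @ .                          \ inspect neuron 0's accumulator
```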
### Edge AI: The 50 KB Runtime

ML inference frameworks tend to be big. PyTorch is ~500 MB. TensorFlow Lite
is ~1 MB for the runtime alone. ONNX Runtime is ~10 MB.

WAFER is ~50 KB for the full Forth system. The model weights dominate the
binary size, not the runtime. For edge devices — IoT sensors, wearables,
microcontrollers, browser tabs — that size difference can be the difference
between "fits" and "doesn't fit."

WASM's portability means the same inference code runs on an ARM
microcontroller, in a browser, on a server, without recompilation. Write the
model once in Forth, deploy everywhere WASM reaches.
### DOES> for Architecture Generation

Forth's `DOES>` is a metaprogramming facility: it creates words that create
other words, each with custom runtime behavior. This is exactly what neural
architecture construction needs:

```forth
\ LAYER is a defining word that creates layer words
: LAYER ( weights bias rows cols -- )
  CREATE , , , ,        \ store cols, rows, bias, weights (top of stack first)
  DOES> ( tensor -- tensor )
    >R                  \ save parameter field address
    R@ 12 + @ T-MATMUL  \ weights (stored last, at offset 12)
    R> 8 + @ T-ADD      \ bias (offset 8; dims sit at offsets 0 and 4)
;

\ Define the network architecture
w1 b1 768 512 LAYER EMBED
w2 b2 512 256 LAYER HIDDEN1
w3 b3 256  10 LAYER OUTPUT

\ The architecture is now executable
: MODEL ( tensor -- tensor ) EMBED T-RELU HIDDEN1 T-RELU OUTPUT T-SOFTMAX ;
```

Each `LAYER` invocation creates a new word with its own weights and
dimensions baked in. The `MODEL` word composes them. This is the same
pattern as `nn.Sequential` in PyTorch — but it compiles to WASM, has zero
framework overhead, and the "architecture definition" and the "executable
model" are the same thing.
### Automatic Differentiation via Dual Numbers

Backpropagation is reverse-mode automatic differentiation. Its forward-mode
cousin has an elegant formulation using dual numbers (a value paired with
its derivative) that maps naturally onto pairs of stack cells:

```forth
\ A dual number is a pair ( value derivative ) occupying two stack cells;
\ WAFER's two-cell words (2DUP, 2SWAP, 2DROP, ...) move them around natively

\ Dual addition: (a, a') + (b, b') = (a+b, a'+b')
: D+DUAL ( a a' b b' -- a+b a'+b' )
  ROT +      \ a' + b'
  >R + R>    \ a + b, then restore derivative
;

\ Dual multiplication: (a, a') * (b, b') = (a*b, a*b' + a'*b)
: D*DUAL ( a a' b b' -- a*b a*b'+a'*b )
  3 PICK *   \ a * b'
  >R         \ stash it
  TUCK *     \ a' * b
  R> +       \ derivative: a*b' + a'*b
  >R * R>    \ value: a * b, then restore derivative
;
```

The chain rule emerges naturally: composing dual-number operations through a
sequence of words automatically computes the derivative of the whole
pipeline. This is the same principle behind JAX's `jvp` — but expressed as
stack operations.
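A quick sanity check of the idea, as a hypothetical session: differentiate
f(x) = x*x at x = 3 by seeding the dual (3, 1):

```forth
\ (3,1) * (3,1) should give value 9 and derivative 6
3 1 2DUP D*DUAL
. .   \ prints 6 9  (derivative on top, then value)
```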
### A Different Way to Look at It

Most of the ML ecosystem's complexity lives in _training_. Inference, by
comparison, is fairly straightforward: load weights, multiply matrices, apply
activations, read output. That's a pipeline of arithmetic operations — which
is pretty much what Forth was designed for. The industry tends to wrap
inference in 500 MB frameworks because training needed those frameworks, and
the two haven't been fully separated. A 50 KB Forth runtime doing quantized
integer operations might be closer to what inference actually needs than we
usually assume.

---
## 4. AI Generating Code: The Smallest Target Language

### The Token Economy

When an LLM generates code, every token costs money and adds latency. A
Python solution to "compute the average of a list" looks like:

```python
def average(numbers):
    if not numbers:
        return 0
    return sum(numbers) / len(numbers)
```

That is roughly 25 tokens. The Forth equivalent:

```forth
: AVERAGE ( addr n -- avg ) 2DUP SUM -ROT NIP / ;
```

That is roughly 12 tokens. For the same semantic content, Forth uses about
half the tokens. At scale — millions of API calls, each generating hundreds
of lines — this is a meaningful cost reduction. But the token savings are the
least interesting advantage.

### Minimal Syntax, Maximal Verifiability

Forth has essentially no syntax. There are words separated by spaces. There
are numbers. There are a few special constructs (`:` for definitions,
`IF`/`THEN` for conditionals, `DO`/`LOOP` for iteration). That's about it.

An LLM generating Python must get indentation right, match parentheses and
brackets, handle keyword arguments, manage import statements, respect method
resolution order, and navigate a standard library of thousands of functions.
An LLM generating Forth mostly just needs to get the stack effect right.
That's the main failure mode worth worrying about.

And stack effects are **mechanically verifiable**:

```forth
\ Stack effect: ( n1 n2 -- n3 )
\ Verification: start with 2 items on stack, end with 1
: ADD-AND-DOUBLE ( n1 n2 -- n3 ) + 2* ;

\ Test:
3 4 ADD-AND-DOUBLE \ stack should contain: 14
```

You don't need a type checker or static analysis. Just run the word with
known inputs and check the stack. If the stack depth and values match the
declared effect, the word is correct. It's hard to think of another practical
language where verification is this straightforward.
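The check can even be automated with the `T{ ... -> ... }T` words from the
standard Forth test harness:

```forth
\ Mechanical verification with the T{ ... -> ... }T harness
T{  3  4 ADD-AND-DOUBLE -> 14 }T
T{  0  0 ADD-AND-DOUBLE ->  0 }T
T{ -5  5 ADD-AND-DOUBLE ->  0 }T
```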
### Self-Extending Vocabulary

LLMs struggle with large codebases because context windows are finite. A
Python project with 50 files and 10,000 lines requires the LLM to hold (or
retrieve) vast amounts of context to generate correct code.

Forth's defining characteristic is that you build the language up to your
problem. The LLM doesn't need to generate a 100-line solution. It generates
5-line words, each building on the previous ones:

```forth
\ Step 1: LLM generates basic operations
: CLAMP    ( n lo hi -- n' ) ROT MIN MAX ;
: BETWEEN? ( n lo hi -- flag ) OVER - >R - R> U< ;

\ Step 2: LLM generates higher-level operations using step 1
: NORMALIZE ( n -- n' ) 0 255 CLAMP ;
: IN-RANGE? ( n -- flag ) 0 100 BETWEEN? ;

\ Step 3: LLM generates application logic using steps 1-2
: PROCESS-SENSOR ( raw -- calibrated )
  offset @ -       \ remove sensor offset
  NORMALIZE        \ clamp to valid range
  scale @ * 1000 / \ apply calibration scale
;
```

Each step requires only the _names_ of previously defined words, not their
implementations. The dictionary serves as a compressed representation of the
entire program. An LLM can generate correct code by knowing only the word
names and their stack effects — a few dozen tokens of context instead of
thousands of lines.
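To see how little context that is: extending the program needs only the
stack-effect lines below, not the definitions behind them (sketch;
`PROCESS-BATCH` is new here):

```forth
\ The entire context an LLM needs to extend the program:
\ CLAMP          ( n lo hi -- n' )
\ NORMALIZE      ( n -- n' )
\ PROCESS-SENSOR ( raw -- calibrated )
: PROCESS-BATCH ( addr n -- )   \ calibrate an array in place
  0 DO
    DUP I CELLS + @ PROCESS-SENSOR
    OVER I CELLS + !
  LOOP DROP ;
```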
### WASM Sandbox: Safe Execution of Untrusted Code

AI-generated code generally needs to be executed to be verified. Running
arbitrary Python is tricky from a security perspective — file system access,
network calls, `import os`, `eval()`. Sandboxing Python typically requires
containerization, seccomp filters, or virtual machines.

WAFER compiles to WASM, which executes in a sandbox by construction. A
WAFER program:

- Cannot access the file system
- Cannot make network calls
- Cannot read memory outside its linear memory
- Cannot execute longer than the host allows (fuel metering)
- Cannot consume more memory than the host allocates

You can run AI-generated Forth with roughly the same confidence as a pure
mathematical function. The sandbox isn't a bolt-on — it's just how WASM
works.

```forth
\ AI generates this code. Is it safe to run? Yes, always.
: FIBONACCI ( n -- fib )
  DUP 2 < IF EXIT THEN
  DUP 1- RECURSE
  SWAP 2 - RECURSE
  +
;
```

There's nothing this word can do except compute. No side effects, no
escape hatches. The WASM sandbox guarantees that structurally.

### A Different Way to Look at It

The conventional wisdom is that LLMs need expressive, high-level languages
to generate useful code. But there's a good case for the opposite: what LLMs
really benefit from are **verifiable** languages — ones where correctness can
be checked cheaply and deterministically. Expressiveness can actually work
against you here: more syntax means more ways to be wrong, more edge cases
to handle, more context to maintain. Forth's extreme minimalism starts to
look less like a limitation and more like an advantage: generate a few small
words, verify each one by running it, compose them into larger programs with
confidence. The language that's hardest for humans to read might just be the
easiest for machines to write correctly.

---
## 5. AI Agent Control: Plans That Execute Themselves

### The Plan-Program Gap

When an AI agent "plans," it produces a sequence of steps in natural
language:

> 1. Search for files matching "*.config"
> 2. Read each file and extract the "timeout" field
> 3. If timeout > 30, update it to 30
> 4. Write the modified files back

This plan is then "executed" by the agent interpreting each step, calling
tools, handling errors, and managing state — all mediated by the LLM at
every step, consuming tokens and latency for what is fundamentally a
sequential program.

The gap between "plan" and "program" might be more artificial than it looks.
A plan _is_ a program — we just don't usually give agents a good executable
representation for it.

Forth could be that representation.

### Tools as Words

Every agent tool — file read, web search, code execution, API call — maps
to a Forth word. The agent's toolkit becomes a Forth dictionary:

```forth
\ Agent tool vocabulary (host functions)
\ SEARCH-FILES ( pattern-addr pattern-len -- results-addr count )
\ READ-FILE    ( path-addr path-len -- content-addr content-len )
\ WRITE-FILE   ( content-addr content-len path-addr path-len -- )
\ JSON-GET     ( json-addr key-addr key-len -- value-addr value-len )
\ SHELL        ( cmd-addr cmd-len -- output-addr output-len )
\ ASK-USER     ( question-addr question-len -- answer-addr answer-len )
```

Now the plan from above becomes an executable program:

```forth
\ S>NUMBER and SET-TIMEOUT are further host/helper words, assumed here
: UPDATE-TIMEOUTS ( -- )
  S" *.config" SEARCH-FILES \ get matching files
  0 DO                      \ for each file
    DUP I CELLS + @ COUNT   \ get filename
    2DUP READ-FILE          \ read contents
    S" timeout" JSON-GET    \ extract timeout field
    S>NUMBER DROP           \ convert to number
    30 > IF                 \ if timeout > 30
      30 SET-TIMEOUT        \ update to 30
      WRITE-FILE            \ write back
    ELSE
      2DROP                 \ discard unchanged
    THEN
  LOOP
  DROP
;

UPDATE-TIMEOUTS
```

This isn't a description of what to do — it _is_ what to do. The agent
generates it, WAFER compiles it to WASM, and it runs — no LLM in the loop
during execution, no token cost per step, no latency per tool call.
### Error Handling with CATCH/THROW

Of course, agent plans fail. Files don't exist. APIs return errors.
Permissions get denied. Production agent systems need robust error handling,
which typically means calling the LLM at every step to decide what to do
when something goes wrong.

WAFER has `CATCH` and `THROW` — structured exception handling that lets
the plan itself define error recovery:

```forth
: SAFE-READ ( path-addr path-len -- content-addr content-len | 0 0 )
  ['] READ-FILE CATCH IF
    2DROP 0 0   \ file not found: return empty
  THEN
;

: SAFE-UPDATE ( filename-addr filename-len -- )
  2DUP SAFE-READ                  \ try to read
  DUP 0= IF 2DROP 2DROP EXIT THEN \ skip if file missing
  S" timeout" JSON-GET
  S>NUMBER DROP
  30 > IF
    30 SET-TIMEOUT
    WRITE-FILE
  ELSE
    2DROP 2DROP
  THEN
;

: ROBUST-UPDATE-TIMEOUTS ( -- )
  S" *.config" SEARCH-FILES
  0 DO
    DUP I CELLS + @ COUNT SAFE-UPDATE
  LOOP
  DROP
;
```

The error handling is part of the plan. The agent generates it once, and it
runs to completion without further LLM intervention. Errors are handled at
the speed of WASM, not the speed of an API call to an LLM.
### The Dictionary as Growing Capability

A human Forth programmer builds up vocabulary: small words compose into
larger words, which compose into still larger words. The dictionary grows
with the programmer's understanding of the problem.

An AI agent does the same thing. Each successfully executed plan leaves
behind defined words that can be reused:

```forth
\ First task: agent learns to read configs
: READ-CONFIG ( path-addr path-len -- json-addr json-len )
  SAFE-READ DUP 0= IF EXIT THEN JSON-PARSE ;

\ Second task: agent learns to update configs
: UPDATE-CONFIG ( key-addr key-len value path-addr path-len -- )
  2DUP READ-CONFIG JSON-SET WRITE-FILE ;

\ Third task: agent composes previous capabilities
: MIGRATE-CONFIGS ( -- )
  S" *.config" SEARCH-FILES
  0 DO
    DUP I CELLS + @ COUNT
    S" timeout" 30 ROT ROT UPDATE-CONFIG
  LOOP DROP
;

\ The agent's vocabulary grows with experience.
\ MIGRATE-CONFIGS didn't exist before. Now it does.
\ Next time, the agent can use it as a building block.
```

You could call this _learned tool use_ — not in the machine learning sense,
but in the software engineering sense. The agent defines new capabilities in
terms of old ones, and the dictionary persists across invocations. Over time,
the agent's vocabulary naturally converges on the abstractions that matter
for its operational domain.
### REPL as Test-Before-Commit

Agents that act irreversibly on the first try are risky. WAFER's REPL model
gives agents a natural test-before-commit workflow:

1. **Define**: Generate and compile the plan as Forth words.
2. **Test**: Run the words against sample data on the stack.
3. **Verify**: Check the stack for expected results.
4. **Execute**: Run the plan for real only after verification passes.

```forth
\ Step 1: Define
: CALCULATE-DISCOUNT ( price tier -- discounted )
  CASE
    1 OF 10 ENDOF \ tier 1: 10% off
    2 OF 20 ENDOF \ tier 2: 20% off
    3 OF 35 ENDOF \ tier 3: 35% off
    0 SWAP
  ENDCASE
  100 SWAP - * 100 /
;

\ Step 2: Test (no side effects, just stack operations)
1000 1 CALCULATE-DISCOUNT . \ expect 900
1000 2 CALCULATE-DISCOUNT . \ expect 800
1000 3 CALCULATE-DISCOUNT . \ expect 650

\ Step 3: Verify output matches expectations
\ Step 4: Apply to real data only after tests pass
```

The agent can generate, test, and iterate without ever touching production
data. The REPL isn't just a debugging convenience here — it's a safety
mechanism for autonomous agents.
### Multi-Agent Coordination

Multiple agents can share a WAFER dictionary through shared linear memory.
One agent defines words. Another agent uses them. A coordinator agent
composes them into higher-level plans:

```forth
\ Agent A defines data retrieval
: FETCH-METRICS ( -- addr n ) metrics-api QUERY PARSE-JSON ;

\ Agent B defines analysis
: DETECT-ANOMALIES ( addr n -- anomalies-addr n )
  THRESHOLD @ FILTER-ABOVE ;

\ Agent C defines actions
: ALERT ( anomalies-addr n -- )
  0 DO DUP I CELLS + @ SEND-ALERT LOOP DROP ;

\ Coordinator composes them
: MONITOR ( -- )
  BEGIN
    FETCH-METRICS DETECT-ANOMALIES
    DUP 0> IF ALERT ELSE 2DROP THEN
    60000 DELAY
  AGAIN
;
```

Each agent contributes words to a shared vocabulary. The coordinator doesn't
need to understand the implementation of `FETCH-METRICS` or
`DETECT-ANOMALIES` — it only needs to know their stack effects. This is
composability without coupling, coordination without shared state beyond
the dictionary.
### A Different Way to Look at It

The AI agent community is building increasingly sophisticated "plan
representations" — DAGs, state machines, behavior trees, ReAct loops — all
trying to bridge the gap between the LLM's natural language output and
actual tool execution. But Forth is already a plan representation that
doubles as an execution engine. It has structured control flow (`IF`/`THEN`,
`DO`/`LOOP`, `BEGIN`/`UNTIL`), error handling (`CATCH`/`THROW`),
composability (word definitions), and a test harness (the REPL and stack).
Maybe the gap between "plan" and "program" doesn't need to be bridged so
much as it needs to be _erased_.

---
## Convergence: Five Problems, One Shape

These five domains look different on the surface:

| Domain          | Traditional Tool               | Core Operation       |
| --------------- | ------------------------------ | -------------------- |
| Data analytics  | Pandas, Spark                  | Transform pipeline   |
| Database engine | SQLite VDBE, Postgres executor | Query plan execution |
| AI inference    | PyTorch, TensorFlow            | Layer composition    |
| AI codegen      | Python, JavaScript             | Program synthesis    |
| AI agents       | LangChain, CrewAI              | Plan execution       |

But they share a deep structure: **sequential composition of simple
operations on a data flow**. A data pipeline, a query plan, a forward
pass, a synthesized program, and an agent plan are all the same thing:
a sequence of words applied to a stack.

Forth noticed this in 1970. Charles Moore designed a language around the
observation that most computation is a pipeline of transformations, and
the simplest way to express pipelines is sequential composition on a
stack. The language has no syntax because pipelines don't need syntax.
It has no type system because the data flow _is_ the type. It has no
package manager because each program builds its own vocabulary from
primitives.

WAFER brings these ideas to the modern world by targeting WebAssembly — the
universal runtime that runs in browsers, on servers, on edge devices, in
sandboxes. That combination opens up some interesting possibilities:

- **Analytics in the browser** with no server, no framework, deterministic
  execution.
- **Database VMs** that compile queries to native WASM through an existing
  Forth JIT.
- **Inference engines** that fit in 50 KB and run on any device WASM
  reaches.
- **AI-generated code** in the language with the smallest syntax, cheapest
  verification, and safest sandbox.
- **Agent plans** that are executable programs, testable in a REPL,
  composable through a growing dictionary.

None of this requires Forth to change. Forth has been this shape for 55
years. It's kind of fun that the world's problems seem to be circling back
to it.

---

_WAFER is open source. Start at the [repository root](../README.md)._
_Architecture details: [WAFER.md](WAFER.md). Language introduction:
[FORTH.md](FORTH.md)._