# The Unreasonable Effectiveness of Stack Machines

_How Forth — and WAFER — can serve as infrastructure for data analytics, databases, AI inference, AI code generation, and AI agent control._

---

Forth is 55 years old. It has no type system, no garbage collector, no package manager, no syntax to speak of. By most conventional measures, it shouldn't still be relevant. But it keeps showing up at the edges — in firmware, in space probes, in real-time systems, in places where correctness and determinism matter more than developer ergonomics.

That's worth paying attention to. The properties that make Forth unusual — concatenative composition, zero-cost abstraction through word definition, a stack-based execution model that maps directly to hardware — happen to line up surprisingly well with what five of the most active areas in modern computing are independently reaching for:

1. **Data analytics** wants composable, streaming pipelines.
2. **Database engines** want stack-based virtual machines for query execution.
3. **AI inference** wants tiny, deterministic, embeddable runtimes.
4. **AI code generation** wants the smallest possible target language.
5. **AI agent systems** want plans that are also executable programs.

Forth won't single-handedly solve any of these. But it offers a useful lens for understanding what each of them actually needs — and WAFER, a Forth that compiles to WebAssembly, is in a good position to explore that space.

WAFER (WebAssembly Forth Engine in Rust) JIT-compiles each Forth word to its own WASM module, linked through shared linear memory, globals, and a function table. It runs anywhere WASM runs: browsers, edge devices, servers, embedded systems. It has 160+ words, 100% Forth 2012 compliance on 10 word sets, and fits in ~50 KB. It has exception handling (`CATCH`/`THROW`), metaprogramming (`DOES>`), dynamic compilation (`EVALUATE`), and an optimization pipeline designed for stack-to-local promotion that can achieve 7x speedups.
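Before diving in, the execution model everything below relies on fits in a few lines. Here is a toy Python evaluator (an illustration only, nothing like WAFER's actual JIT): words are functions that transform a stack, and composition is simply writing them in sequence.

```python
def run(program, words, stack=None):
    """Execute a whitespace-separated concatenative program against a stack."""
    stack = [] if stack is None else stack
    for token in program.split():
        if token in words:
            words[token](stack)          # a word transforms the stack in place
        else:
            stack.append(int(token))     # anything else is a number literal
    return stack

# A minimal dictionary of primitive words.
words = {
    "DUP":  lambda s: s.append(s[-1]),
    "+":    lambda s: s.append(s.pop() + s.pop()),
    "*":    lambda s: s.append(s.pop() * s.pop()),
    "SWAP": lambda s: s.__setitem__(slice(-2, None), [s[-1], s[-2]]),
}

# Defining a new word is just naming a composition of existing ones.
words["SQUARE"] = lambda s: run("DUP *", words, s)

print(run("3 4 + SQUARE", words))   # [49]
```

Defining `SQUARE` as `DUP *` is the entire abstraction mechanism: a new word is nothing more than a named composition of old ones, which is the move every section below exploits.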
This document explores what becomes possible when you take these properties seriously.

---

## 1. Data Analytics: Pipelines Without Plumbing

### The Problem with Pipelines

Every data analytics framework reinvents the same idea: take data, push it through a sequence of transformations, collect the result. Pandas chains methods. Spark builds DAGs. dplyr pipes with `%>%`. Unix pipes bytes through `|`. They all converge on the same shape: **linear composition of operations on an implicit data flow**.

This is exactly what Forth does. It has done it since 1970. The data stack _is_ the pipeline. Each word _is_ a transformation. Composition is juxtaposition — you don't pipe, you don't chain, you don't bind. You just write the words next to each other.

```forth
\ Pandas: df['amount'].where(df['amount'] > 0).mean()
\ Forth:

: POSITIVE? ( n -- n flag )  DUP 0> ;

VARIABLE acc   VARIABLE #pos    \ sum and count accumulators

: FILTER-POSITIVE ( addr n -- sum count )
   0 acc !  0 #pos !
   0 DO
      DUP I CELLS + @ POSITIVE?
      IF  acc +!  1 #pos +!  ELSE  DROP  THEN
   LOOP
   DROP  acc @ #pos @ ;

: MEAN ( sum count -- avg )  / ;

data 100 FILTER-POSITIVE MEAN .
```

This goes a bit deeper than syntactic sugar. The absence of intermediate variables is a structural property. In a Pandas chain, every `.method()` returns a new DataFrame object that must be allocated, tracked, and eventually collected. In Forth, the data flows through the stack with zero allocation. The pipeline _is_ the execution.

### Streaming and Incremental Computation

The stack model is inherently streaming. A word consumes its inputs and produces its outputs in the same motion. There is no "collect all data first, then process" step unless you explicitly build one. This makes Forth natural for:

- **Event stream processing**: each event lands on the stack, a word processes it, the result is consumed by the next word.
- **Incremental aggregation**: running sums, counts, and statistics maintained in variables across invocations.
- **Windowed computation**: a circular buffer in linear memory with stack-based access patterns.

```forth
\ Running average over a stream of values
VARIABLE running-sum
VARIABLE running-count

: UPDATE-AVG ( new-value -- running-avg )
   running-sum @ +  DUP running-sum !
   running-count @ 1+  DUP running-count !
   / ;

\ Each incoming value:
42 UPDATE-AVG .   \ prints running average after adding 42
17 UPDATE-AVG .   \ prints updated average after adding 17
```

### Client-Side Analytics via WASM

WAFER compiles to WebAssembly. This means analytics can run _in the browser_ with no server round-trips. A user uploads a CSV, WAFER parses and processes it entirely client-side, and the results render immediately. No data leaves the machine. No API calls. No latency.

This isn't just a nice demo. For privacy-sensitive analytics (healthcare, finance, GDPR-regulated data), client-side processing can be a compliance requirement, not just a nice-to-have. WAFER's deterministic execution (no GC pauses, no background threads, fixed memory layout) makes it predictable enough for real-time dashboards.

### Domain-Specific Languages

Forth's defining feature is that you build the language up to your problem. An analytics team doesn't write Forth — they write _their DSL_, which happens to be implemented in Forth:

```forth
\ Define a mini analytics vocabulary
: COLUMN ( col# -- addr n )      table-base SWAP col-offset + col-length ;
: SUM    ( addr n -- total )     0 SWAP 0 DO OVER I CELLS + @ + LOOP NIP ;
: COUNT  ( addr n -- n )         NIP ;
: AVG    ( addr n -- avg )       2DUP SUM -ROT COUNT / ;
: WHERE> ( addr n thresh -- addr' n' )  filter-gt ;

\ The analyst writes:
3 COLUMN 1000 WHERE> AVG .
\ "Average of column 3 where values exceed 1000"
```

The DSL compiles to WASM through WAFER's IR pipeline. There is no interpreter overhead at query time. The analyst's vocabulary _is_ the optimized code.

### A Different Way to Look at It

Most languages treat the absence of named variables as a limitation.
But in data pipelines, it can actually be a **feature**. Named intermediates create coupling points — places where code can refer to stale state, where refactoring requires renaming, where parallelization requires dependency analysis. Point-free composition through a stack sidesteps this whole class of problems. The data is always _here_, on top of the stack, ready for the next transformation.

---

## 2. Database Engine: The Query VM You Already Have

### Databases Already Think in Stacks

SQLite — the most deployed database engine in the world — executes queries through the VDBE (Virtual Database Engine), a stack-based bytecode virtual machine. When you write `SELECT * FROM users WHERE age > 30`, SQLite's query planner compiles it into a sequence of stack operations: open cursor, seek, compare, jump, emit row.

PostgreSQL's executor runs a tree of plan nodes, each of which pushes tuples upward. MySQL's handler interface is a stack of operations. CockroachDB compiles SQL to a vectorized execution engine that operates on batches — but the control flow is still a stack of operators.

There's a pattern here: **query execution engines tend to converge on stack machines**. Forth just happens to already be one, with no extra abstraction layers in between.

### Query Plans as Forth Programs

A SQL query plan is a tree. Flattened into execution order, it becomes a sequence of operations — which is exactly a Forth program:

```sql
SELECT name, salary
FROM employees
WHERE dept = 'ENG' AND salary > 100000;
```

The query plan, expressed as Forth:

```forth
\ Primitives provided by the storage engine
\ SCAN     ( table -- cursor )
\ NEXT-ROW ( cursor -- cursor flag )   flag=true if row available
\ COL@     ( cursor col# -- value )
\ EMIT-ROW ( v1 v2 -- )                send to result set
\ CLOSE    ( cursor -- )

: MATCH-DEPT? ( cursor -- cursor flag )
   DUP 2 COL@ S" ENG" COMPARE 0= ;

: MATCH-SAL?
   ( cursor -- cursor flag )
   DUP 3 COL@ 100000 > ;

: PROJECT ( cursor -- cursor )
   DUP 0 COL@  OVER 3 COL@  EMIT-ROW ;

: QUERY ( -- )
   employees SCAN
   BEGIN NEXT-ROW WHILE
      MATCH-DEPT? IF  MATCH-SAL? IF  PROJECT  THEN THEN
   REPEAT
   CLOSE ;
```

This isn't just pseudocode, either. Every word here could be a real WAFER word backed by storage primitives implemented as host functions. The query compiles through WAFER's IR pipeline to native WASM, with the same optimization opportunities as any other Forth word: inlining, constant folding, dead code elimination.

### EVALUATE as Dynamic Query Compilation

SQL databases accept queries as strings and compile them at runtime. Forth has `EVALUATE`, which does exactly the same thing — takes a string and compiles/executes it:

```forth
\ Build a query string dynamically
S" employees SCAN BEGIN NEXT-ROW WHILE MATCH-DEPT? IF PROJECT THEN REPEAT CLOSE"
EVALUATE
```

The difference from SQL: the "query language" and the "implementation language" are the same. There is no impedance mismatch between the language the user writes queries in and the language the engine executes them in. A user-defined function is just another word. An index lookup is just another word. A join strategy is just another word. They all compose the same way.

### Linear Memory as Storage Pages

WAFER's linear memory model maps directly to how databases manage storage. A database page is a fixed-size block of bytes at a known offset — exactly what Forth's `@` and `!` operate on.
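The page model is small enough to sketch directly. Here is a Python illustration (a hypothetical layout, not WAFER's storage engine): linear memory is one flat byte array, a page is a fixed-size region at a computed offset, and `@` / `!` are reads and writes of 32-bit cells.

```python
import struct

PAGE_SIZE = 4096
memory = bytearray(PAGE_SIZE * 8)        # 8 pages of "linear memory"

def fetch(addr):
    """Forth's @  ( addr -- value ): read a 32-bit cell."""
    return struct.unpack_from("<i", memory, addr)[0]

def store(value, addr):
    """Forth's !  ( value addr -- ): write a 32-bit cell."""
    struct.pack_into("<i", memory, addr, value)

def cell_addr(page, slot):
    """Byte address of a cell within a page."""
    return page * PAGE_SIZE + slot * 4

store(42, cell_addr(3, 10))              # write cell 10 on page 3
print(fetch(cell_addr(3, 10)))           # 42
```

Everything a storage engine does on top of this, page headers, slot directories, free lists, is just more arithmetic over the same flat address space.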
B-tree nodes are structures in linear memory traversed by pointer arithmetic:

```forth
\ B-tree node layout (32-bit cells):
\ +0:              key count (cell)
\ +4:              is-leaf flag (cell)
\ +8:              keys array (key-count cells)
\ +8+4*key-count:  child pointers (key-count+1 cells)

: NODE-KEYS   ( node -- addr )   8 + ;
: NODE-KEY@   ( node i -- key )  CELLS SWAP NODE-KEYS + @ ;

: NODE-CHILD@ ( node i -- child )
   OVER @ CELLS        \ byte length of the keys array
   ROT NODE-KEYS +     \ base of the child-pointer array
   SWAP CELLS + @ ;    \ index into children

: BTREE-SEARCH ( node target-key -- addr|0 )
   OVER @ 0= IF 2DROP 0 EXIT THEN   \ empty node
   OVER 4 + @ IF                    \ leaf node
      LEAF-SEARCH
   ELSE
      INTERNAL-SEARCH               \ recurse into child
   THEN ;
```

### WASM Sandboxing for User-Defined Functions

Safely executing user-defined functions (UDFs) is one of the trickier problems in database engines. PostgreSQL UDFs in C can crash the server. JavaScript UDFs require embedding V8. Python UDFs tend to be slow.

WAFER UDFs compile to WASM and execute in a sandbox with bounded memory, bounded execution time, and no access to anything outside the linear memory they're given. A malicious UDF can't read other users' data, can't make network calls, can't crash the host. WAFER gets this for free — it's inherent to WASM's security model.

```forth
\ User defines a custom scoring function
: SCORE ( age salary -- score )
   1000 /          \ salary contribution (salary/1000)
   SWAP 50 - ABS   \ age penalty (distance from 50)
   -               \ final score
;

\ Engine uses it in a query
: RANKED-QUERY ( -- )
   employees SCAN
   BEGIN NEXT-ROW WHILE
      DUP 1 COL@  OVER 3 COL@  SCORE 50 >
      IF PROJECT THEN
   REPEAT
   CLOSE ;
```

The `SCORE` function compiles to a WASM module through WAFER's JIT. It runs at near-native speed, sandboxed, with no FFI overhead.

### A Different Way to Look at It

Database engineers put a lot of effort into building query VMs — designing bytecode formats, writing interpreters, adding JIT compilation. In a sense, they're often reinventing something Forth-shaped each time.
It's worth asking: what if you just started with Forth and built the storage layer underneath it?

---

## 3. AI Inference: Neural Networks as Word Composition

### Layers Are Words, Forward Pass Is Composition

A neural network's forward pass is a pipeline: input tensor enters, passes through a sequence of layers (linear transform, activation, normalization), and a prediction exits. Each layer takes a tensor and produces a tensor.

In Forth terms: each layer is a word. The tensor sits on the stack. The forward pass is the composition of those words:

```forth
\ Assuming tensor operations as primitives (host functions):
\ T-MATMUL  ( tensor weights -- tensor )
\ T-ADD     ( tensor bias -- tensor )
\ T-RELU    ( tensor -- tensor )
\ T-SOFTMAX ( tensor -- tensor )

: LINEAR1 ( tensor -- tensor )  w1 T-MATMUL b1 T-ADD ;
: LINEAR2 ( tensor -- tensor )  w2 T-MATMUL b2 T-ADD ;
: LINEAR3 ( tensor -- tensor )  w3 T-MATMUL b3 T-ADD ;

: CLASSIFIER ( tensor -- tensor )
   LINEAR1 T-RELU
   LINEAR2 T-RELU
   LINEAR3 T-SOFTMAX ;

input-data CLASSIFIER   \ forward pass
```

This maps more directly than you might expect. The compositional structure of neural networks lines up nicely with the compositional structure of Forth programs. The stack carries the data flow. The words are the layers. The dictionary holds the model architecture.

### Quantized Inference on the Integer Stack

Most production inference runs quantized — INT8 or INT4 weights, integer arithmetic, no floating point. Forth's native data type is the integer cell.
WAFER's `i32` stack operations map directly to quantized tensor operations:

```forth
\ INT8 quantized dot product of two vectors
VARIABLE q-acc

: QDOT ( addr1 addr2 n -- result )
   0 q-acc !
   0 DO
      OVER I + C@ 127 -   \ load and de-bias first element
      OVER I + C@ 127 -   \ load and de-bias second element
      * q-acc +!          \ multiply-accumulate
   LOOP
   2DROP q-acc @ ;

\ Quantized linear layer
: QLINEAR ( input-addr weight-addr rows cols -- output-addr )
   \ For each output neuron, compute QDOT of the input with one weight row
   SWAP 0 DO                  \ loop over rows; cols stays on the stack
      2 PICK                  \ input-addr
      2 PICK I 3 PICK * +     \ weight row i = weight-addr + i*cols bytes
      2 PICK QDOT             \ dot product of length cols
      output-buf I CELLS + !  \ store neuron i
   LOOP
   DROP 2DROP output-buf ;
```

No framework dependency, no Python interpreter, no CUDA runtime — just integer arithmetic on a stack, compiled to WASM, running on any device.

### Edge AI: The 50 KB Runtime

ML inference frameworks tend to be big. PyTorch is ~500 MB. TensorFlow Lite is ~1 MB for the runtime alone. ONNX Runtime is ~10 MB. WAFER is ~50 KB for the full Forth system. The model weights dominate the binary size, not the runtime.

For edge devices — IoT sensors, wearables, microcontrollers, browser tabs — that size difference can be the difference between "fits" and "doesn't fit." WASM's portability means the same inference code runs on an ARM microcontroller, in a browser, on a server, without recompilation. Write the model once in Forth, deploy everywhere WASM reaches.

### DOES> for Architecture Generation

Forth's `DOES>` is a metaprogramming facility: it creates words that create other words, each with custom runtime behavior.
This is exactly what neural architecture construction needs:

```forth
\ LAYER is a defining word that creates layer words
: LAYER ( weights bias rows cols -- )
   CREATE , , , ,           \ stores cols, rows, bias, weights (top first)
   DOES> ( tensor -- tensor )
      >R                    \ save parameter field address
      R@ 12 + @ T-MATMUL    \ weights pointer at +12
      R> 8 + @ T-ADD ;      \ bias pointer at +8

\ Define the network architecture
w1 b1 768 512 LAYER EMBED
w2 b2 512 256 LAYER HIDDEN1
w3 b3 256  10 LAYER OUTPUT

\ The architecture is now executable
: MODEL ( tensor -- tensor )
   EMBED T-RELU
   HIDDEN1 T-RELU
   OUTPUT T-SOFTMAX ;
```

Each `LAYER` invocation creates a new word with its own weights and dimensions baked in. The `MODEL` word composes them. This is the same pattern as `nn.Sequential` in PyTorch — but it compiles to WASM, has zero framework overhead, and the "architecture definition" and the "executable model" are the same thing.

### Automatic Differentiation via Dual Numbers

Backpropagation is reverse-mode automatic differentiation. Its forward-mode sibling has an elegant formulation using dual numbers (a value paired with its derivative) that maps naturally onto pairs of cells on Forth's stack:

```forth
\ A dual number is a pair ( value derivative ) kept as two cells on the stack,
\ shuffled with WAFER's two-cell words ( 2DUP, 2SWAP, 2OVER, etc. )

\ Dual addition: (a, a') + (b, b') = (a+b, a'+b')
: D+DUAL ( a a' b b' -- a+b a'+b' )
   ROT +     \ a' + b'
   >R + R>   \ a + b, then restore derivative
;

\ Dual multiplication: (a, a') * (b, b') = (a*b, a*b' + a'*b)
: D*DUAL ( a a' b b' -- a*b a*b'+a'*b )
   3 PICK *      \ a * b'
   >R            \ stash it
   SWAP OVER *   \ a' * b
   R> +          \ a*b' + a'*b = derivative
   >R * R>       \ a * b = value, then restore derivative
;
```

The chain rule emerges naturally: composing dual-number operations through a sequence of words automatically computes the derivative of the whole pipeline. This is the same principle behind JAX's `jvp` — but expressed as stack operations.

### A Different Way to Look at It

Most of the ML ecosystem's complexity lives in _training_.
Inference, by comparison, is fairly straightforward: load weights, multiply matrices, apply activations, read output. That's a pipeline of arithmetic operations — which is pretty much what Forth was designed for.

The industry tends to wrap inference in 500 MB frameworks because training needed those frameworks, and the two haven't been fully separated. A 50 KB Forth runtime doing quantized integer operations might be closer to what inference actually needs than we usually assume.

---

## 4. AI Generating Code: The Smallest Target Language

### The Token Economy

When an LLM generates code, every token costs money and adds latency. A Python solution to "compute the average of a list" looks like:

```python
def average(numbers):
    if not numbers:
        return 0
    return sum(numbers) / len(numbers)
```

That is 25 tokens. The Forth equivalent:

```forth
: AVERAGE ( addr n -- avg )  2DUP SUM -ROT NIP / ;
```

That is 12 tokens. For the same semantic content, Forth uses roughly half the tokens. At scale — millions of API calls, each generating hundreds of lines — this is a meaningful cost reduction.

But the token savings are the least interesting advantage.

### Minimal Syntax, Maximal Verifiability

Forth has essentially no syntax. There are words separated by spaces. There are numbers. There are a few special constructs (`:` for definitions, `IF`/`THEN` for conditionals, `DO`/`LOOP` for iteration). That's about it.

An LLM generating Python must get indentation right, match parentheses and brackets, handle keyword arguments, manage import statements, respect method resolution order, and navigate a standard library of thousands of functions. An LLM generating Forth mostly just needs to get the stack effect right. That's the main failure mode worth worrying about.
And stack effects are **mechanically verifiable**:

```forth
\ Stack effect: ( n1 n2 -- n3 )
\ Verification: start with 2 items on stack, end with 1

: ADD-AND-DOUBLE ( n1 n2 -- n3 )  + 2* ;

\ Test:
3 4 ADD-AND-DOUBLE
\ stack should contain: 14
```

You don't need a type checker or static analysis. Just run the word with known inputs and check the stack. If the stack depth and values match the declared effect, the word is correct. It's hard to think of another practical language where verification is this straightforward.

### Self-Extending Vocabulary

LLMs struggle with large codebases because context windows are finite. A Python project with 50 files and 10,000 lines requires the LLM to hold (or retrieve) vast amounts of context to generate correct code.

Forth's defining characteristic is that you build the language up to your problem. The LLM doesn't need to generate a 100-line solution. It generates 5-line words, each building on the previous ones:

```forth
\ Step 1: LLM generates basic operations
: CLAMP    ( n lo hi -- n' )    ROT MIN MAX ;
: BETWEEN? ( n lo hi -- flag )  OVER - >R - R> U< ;

\ Step 2: LLM generates higher-level operations using step 1
: NORMALIZE ( n -- n' )    0 255 CLAMP ;
: IN-RANGE? ( n -- flag )  0 100 BETWEEN? ;

\ Step 3: LLM generates application logic using steps 1-2
: PROCESS-SENSOR ( raw -- calibrated )
   offset @ -         \ remove sensor offset
   NORMALIZE          \ clamp to valid range
   scale @ * 1000 /   \ apply calibration scale
;
```

Each step requires only the _names_ of previously defined words, not their implementations. The dictionary serves as a compressed representation of the entire program. An LLM can generate correct code by knowing only the word names and their stack effects — a few dozen tokens of context instead of thousands of lines.

### WASM Sandbox: Safe Execution of Untrusted Code

AI-generated code generally needs to be executed to be verified.
Running arbitrary Python is tricky from a security perspective — file system access, network calls, `import os`, `eval()`. Sandboxing Python typically requires containerization, seccomp filters, or virtual machines.

WAFER compiles to WASM, which executes in a sandbox by construction. A WAFER program:

- Cannot access the file system
- Cannot make network calls
- Cannot read memory outside its linear memory
- Cannot execute longer than the host allows (fuel metering)
- Cannot consume more memory than the host allocates

You can run AI-generated Forth with roughly the same confidence as a pure mathematical function. The sandbox isn't a bolt-on — it's just how WASM works.

```forth
\ AI generates this code. Is it safe to run? Yes, always.
: FIBONACCI ( n -- fib )
   DUP 2 < IF EXIT THEN
   DUP 1- RECURSE
   SWAP 2 - RECURSE + ;
```

There's nothing this word can do except compute. No side effects, no escape hatches. The WASM sandbox guarantees that structurally.

### A Different Way to Look at It

The conventional wisdom is that LLMs need expressive, high-level languages to generate useful code. But there's a good case for the opposite: what LLMs really benefit from are **verifiable** languages — ones where correctness can be checked cheaply and deterministically. Expressiveness can actually work against you here: more syntax means more ways to be wrong, more edge cases to handle, more context to maintain.

Forth's extreme minimalism starts to look less like a limitation and more like an advantage: generate a few small words, verify each one by running it, compose them into larger programs with confidence. The language that's hardest for humans to read might just be the easiest for machines to write correctly.

---

## 5. AI Agent Control: Plans That Execute Themselves

### The Plan-Program Gap

When an AI agent "plans," it produces a sequence of steps in natural language:

> 1. Search for files matching "*.config"
> 2. Read each file and extract the "timeout" field
> 3.
If timeout > 30, update it to 30
> 4. Write the modified files back

This plan is then "executed" by the agent interpreting each step, calling tools, handling errors, and managing state — all mediated by the LLM at every step, consuming tokens and latency for what is fundamentally a sequential program.

The gap between "plan" and "program" might be more artificial than it looks. A plan _is_ a program — we just don't usually give agents a good executable representation for it. Forth could be that representation.

### Tools as Words

Every agent tool — file read, web search, code execution, API call — maps to a Forth word. The agent's toolkit becomes a Forth dictionary:

```forth
\ Agent tool vocabulary (host functions)
\ SEARCH-FILES ( pattern-addr pattern-len -- results-addr count )
\ READ-FILE    ( path-addr path-len -- content-addr content-len )
\ WRITE-FILE   ( content-addr content-len path-addr path-len -- )
\ JSON-GET     ( json-addr key-addr key-len -- value-addr value-len )
\ SHELL        ( cmd-addr cmd-len -- output-addr output-len )
\ ASK-USER     ( question-addr question-len -- answer-addr answer-len )
```

Now the plan from above becomes an executable program:

```forth
: UPDATE-TIMEOUTS ( -- )
   S" *.config" SEARCH-FILES   \ get matching files
   0 DO                        \ for each file
      DUP I CELLS + @ COUNT    \ get filename
      2DUP READ-FILE           \ read contents
      S" timeout" JSON-GET     \ extract timeout field
      S>NUMBER DROP            \ convert to number
      30 > IF                  \ if timeout > 30
         30 SET-TIMEOUT        \ update to 30
         WRITE-FILE            \ write back
      ELSE
         2DROP                 \ discard unchanged
      THEN
   LOOP
   DROP ;

UPDATE-TIMEOUTS
```

This isn't a description of what to do — it _is_ what to do. The agent generates it, WAFER compiles it to WASM, and it runs — no LLM in the loop during execution, no token cost per step, no latency per tool call.

### Error Handling with CATCH/THROW

Of course, agent plans fail. Files don't exist. APIs return errors. Permissions get denied.
Production agent systems need robust error handling, which typically means calling the LLM at every step to decide what to do when something goes wrong.

WAFER has `CATCH` and `THROW` — structured exception handling that lets the plan itself define error recovery:

```forth
: SAFE-READ ( path-addr path-len -- content-addr content-len | 0 0 )
   ['] READ-FILE CATCH IF
      2DROP 0 0   \ file not found: return empty
   THEN ;

: SAFE-UPDATE ( filename-addr filename-len -- )
   2DUP SAFE-READ                    \ try to read
   DUP 0= IF 2DROP 2DROP EXIT THEN   \ skip if file missing
   S" timeout" JSON-GET S>NUMBER DROP
   30 > IF
      30 SET-TIMEOUT WRITE-FILE
   ELSE
      2DROP 2DROP
   THEN ;

: ROBUST-UPDATE-TIMEOUTS ( -- )
   S" *.config" SEARCH-FILES
   0 DO
      DUP I CELLS + @ COUNT SAFE-UPDATE
   LOOP
   DROP ;
```

The error handling is part of the plan. The agent generates it once, and it runs to completion without further LLM intervention. Errors are handled at the speed of WASM, not the speed of an API call to an LLM.

### The Dictionary as Growing Capability

A human Forth programmer builds up vocabulary: small words compose into larger words, which compose into still larger words. The dictionary grows with the programmer's understanding of the problem.

An AI agent does the same thing. Each successfully executed plan leaves behind defined words that can be reused:

```forth
\ First task: agent learns to read configs
: READ-CONFIG ( path-addr path-len -- json-addr json-len )
   SAFE-READ DUP 0= IF EXIT THEN JSON-PARSE ;

\ Second task: agent learns to update configs
: UPDATE-CONFIG ( key-addr key-len value path-addr path-len -- )
   2DUP READ-CONFIG JSON-SET WRITE-FILE ;

\ Third task: agent composes previous capabilities
: MIGRATE-CONFIGS ( -- )
   S" *.config" SEARCH-FILES
   0 DO
      DUP I CELLS + @ COUNT
      S" timeout" 30 ROT ROT UPDATE-CONFIG
   LOOP
   DROP ;

\ The agent's vocabulary grows with experience.
\ MIGRATE-CONFIGS didn't exist before. Now it does.
\ Next time, the agent can use it as a building block.
```

You could call this _learned tool use_ — not in the machine learning sense, but in the software engineering sense. The agent defines new capabilities in terms of old ones, and the dictionary persists across invocations. Over time, the agent's vocabulary naturally converges on the abstractions that matter for its operational domain.

### REPL as Test-Before-Commit

Agents that act irreversibly on the first try are risky. WAFER's REPL model gives agents a natural test-before-commit workflow:

1. **Define**: Generate and compile the plan as Forth words.
2. **Test**: Run the words against sample data on the stack.
3. **Verify**: Check the stack for expected results.
4. **Execute**: Run the plan for real only after verification passes.

```forth
\ Step 1: Define
: CALCULATE-DISCOUNT ( price tier -- discounted )
   CASE
      1 OF 10 ENDOF   \ tier 1: 10% off
      2 OF 20 ENDOF   \ tier 2: 20% off
      3 OF 35 ENDOF   \ tier 3: 35% off
      0 SWAP
   ENDCASE
   100 SWAP - * 100 / ;

\ Step 2: Test (no side effects, just stack operations)
1000 1 CALCULATE-DISCOUNT .   \ expect 900
1000 2 CALCULATE-DISCOUNT .   \ expect 800
1000 3 CALCULATE-DISCOUNT .   \ expect 650

\ Step 3: Verify output matches expectations
\ Step 4: Apply to real data only after tests pass
```

The agent can generate, test, and iterate without ever touching production data. The REPL isn't just a debugging convenience here — it's a safety mechanism for autonomous agents.

### Multi-Agent Coordination

Multiple agents can share a WAFER dictionary through shared linear memory. One agent defines words. Another agent uses them.
A coordinator agent composes them into higher-level plans:

```forth
\ Agent A defines data retrieval
: FETCH-METRICS ( -- addr n )
   metrics-api QUERY PARSE-JSON ;

\ Agent B defines analysis
: DETECT-ANOMALIES ( addr n -- anomalies-addr n )
   THRESHOLD @ FILTER-ABOVE ;

\ Agent C defines actions
: ALERT ( anomalies-addr n -- )
   0 DO DUP I CELLS + @ SEND-ALERT LOOP DROP ;

\ Coordinator composes them
: MONITOR ( -- )
   BEGIN
      FETCH-METRICS DETECT-ANOMALIES
      DUP 0> IF ALERT ELSE 2DROP THEN
      60000 DELAY
   AGAIN ;
```

Each agent contributes words to a shared vocabulary. The coordinator doesn't need to understand the implementation of `FETCH-METRICS` or `DETECT-ANOMALIES` — it only needs to know their stack effects. This is composability without coupling, coordination without shared state beyond the dictionary.

### A Different Way to Look at It

The AI agent community is building increasingly sophisticated "plan representations" — DAGs, state machines, behavior trees, ReAct loops — all trying to bridge the gap between the LLM's natural language output and actual tool execution.

But Forth is already a plan representation that doubles as an execution engine. It has structured control flow (`IF`/`THEN`, `DO`/`LOOP`, `BEGIN`/`UNTIL`), error handling (`CATCH`/`THROW`), composability (word definitions), and a test harness (the REPL and stack). Maybe the gap between "plan" and "program" doesn't need to be bridged so much as it needs to be _erased_.
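To see the plan-as-program and test-before-commit ideas outside Forth, here is a small Python sketch (the tool names are hypothetical, not a real agent API): a plan is just data, a list of word names, so the identical plan can be dry-run against a stubbed toolkit before it is allowed to run with side effects.

```python
def execute(plan, words, stack):
    """Run a plan (a sequence of word names) against a stack, Forth-style."""
    for name in plan:
        words[name](stack)
    return stack

def toolkit(commit, log):
    """Word dictionary; commit=False swaps the real write for a dry-run record."""
    return {
        "CLAMP-TIMEOUT": lambda s: s.append(min(s.pop(), 30)),
        "WRITE-CONFIG":  lambda s: log.append(
            ("write" if commit else "dry-run", s.pop())),
    }

plan = ["CLAMP-TIMEOUT", "WRITE-CONFIG"]

# 1. Test: dry run against sample data, inspect the log before committing.
log = []
execute(plan, toolkit(False, log), [45])
assert log == [("dry-run", 30)]

# 2. Execute: only after verification does the same plan run with side effects.
log = []
execute(plan, toolkit(True, log), [45])
print(log)   # [('write', 30)]
```

Because the plan never changes between the dry run and the real run, what was tested is exactly what executes, which is the property the REPL workflow above is after.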
---

## Convergence: Five Problems, One Shape

These five domains look different on the surface:

| Domain          | Traditional Tool               | Core Operation       |
| --------------- | ------------------------------ | -------------------- |
| Data analytics  | Pandas, Spark                  | Transform pipeline   |
| Database engine | SQLite VDBE, Postgres executor | Query plan execution |
| AI inference    | PyTorch, TensorFlow            | Layer composition    |
| AI codegen      | Python, JavaScript             | Program synthesis    |
| AI agents       | LangChain, CrewAI              | Plan execution       |

But they share a deep structure: **sequential composition of simple operations on a data flow**. A data pipeline, a query plan, a forward pass, a synthesized program, and an agent plan are all the same thing: a sequence of words applied to a stack.

Forth noticed this in 1970. Charles Moore designed a language around the observation that most computation is a pipeline of transformations, and the simplest way to express pipelines is sequential composition on a stack. The language has no syntax because pipelines don't need syntax. It has no type system because the data flow _is_ the type. It has no package manager because each program builds its own vocabulary from primitives.

WAFER brings these ideas to the modern world by targeting WebAssembly — the universal runtime that runs in browsers, on servers, on edge devices, in sandboxes. That combination opens up some interesting possibilities:

- **Analytics in the browser** with no server, no framework, deterministic execution.
- **Database VMs** that compile queries to native WASM through an existing Forth JIT.
- **Inference engines** that fit in 50 KB and run on any device WASM reaches.
- **AI-generated code** in the language with the smallest syntax, cheapest verification, and safest sandbox.
- **Agent plans** that are executable programs, testable in a REPL, composable through a growing dictionary.

None of this requires Forth to change. Forth has been this shape for 55 years.
It's kind of fun that the world's problems seem to be circling back to it.

---

_WAFER is open source. Start at the [repository root](../README.md)._

_Architecture details: [WAFER.md](WAFER.md). Language introduction: [FORTH.md](FORTH.md)._