The Unreasonable Effectiveness of Stack Machines

How Forth — and WAFER — can serve as infrastructure for data analytics, databases, AI inference, AI code generation, and AI agent control.


Forth is 55 years old. It has no type system, no garbage collector, no package manager, no syntax to speak of. By most conventional measures, it shouldn't still be relevant.

But it keeps showing up at the edges — in firmware, in space probes, in real-time systems, in places where correctness and determinism matter more than developer ergonomics. That's worth paying attention to.

The properties that make Forth unusual — concatenative composition, zero-cost abstraction through word definition, a stack-based execution model that maps directly to hardware — happen to line up surprisingly well with what five of the most active areas in modern computing are independently reaching for:

  1. Data analytics wants composable, streaming pipelines.
  2. Database engines want stack-based virtual machines for query execution.
  3. AI inference wants tiny, deterministic, embeddable runtimes.
  4. AI code generation wants the smallest possible target language.
  5. AI agent systems want plans that are also executable programs.

Forth won't single-handedly solve any of these. But it offers a useful lens for understanding what each of them actually needs — and WAFER, a Forth that compiles to WebAssembly, is in a good position to explore that space.

WAFER (WebAssembly Forth Engine in Rust) JIT-compiles each Forth word to its own WASM module, linked through shared linear memory, globals, and a function table. It runs anywhere WASM runs: browsers, edge devices, servers, embedded systems. It has 160+ words, 100% Forth 2012 compliance on 10 word sets, and fits in ~50 KB. It has exception handling (CATCH/THROW), metaprogramming (DOES>), dynamic compilation (EVALUATE), and an optimization pipeline designed for stack-to-local promotion that can achieve 7x speedups.

This document explores what becomes possible when you take these properties seriously.


1. Data Analytics: Pipelines Without Plumbing

The Problem with Pipelines

Every data analytics framework reinvents the same idea: take data, push it through a sequence of transformations, collect the result. Pandas chains methods. Spark builds DAGs. dplyr pipes with %>%. Unix pipes bytes through |. They all converge on the same shape: linear composition of operations on an implicit data flow.

This is exactly what Forth does. It has done it since 1970. The data stack is the pipeline. Each word is a transformation. Composition is juxtaposition — you don't pipe, you don't chain, you don't bind. You just write the words next to each other.

\ Pandas: df['amount'].where(df['amount'] > 0).mean()
\ Forth:
VARIABLE pos-sum   VARIABLE pos-count

: POSITIVE?  ( n -- n flag )  DUP 0> ;
: FILTER-POSITIVE  ( addr n -- sum count )
    0 pos-sum !  0 pos-count !
    0 DO
        DUP I CELLS + @                           \ fetch the I-th value
        POSITIVE? IF  pos-sum +!  1 pos-count +!  ELSE  DROP  THEN
    LOOP DROP
    pos-sum @ pos-count @
;
: MEAN  ( sum count -- avg )  / ;

data 100 FILTER-POSITIVE MEAN .

This goes a bit deeper than syntactic sugar. The absence of intermediate variables is a structural property. In a Pandas chain, every .method() returns a new DataFrame object that must be allocated, tracked, and eventually collected. In Forth, the data flows through the stack with zero allocation. The pipeline is the execution.

Streaming and Incremental Computation

The stack model is inherently streaming. A word consumes its inputs and produces its outputs in the same motion. There is no "collect all data first, then process" step unless you explicitly build one. This makes Forth natural for:

  • Event stream processing: each event lands on the stack, a word processes it, the result is consumed by the next word.
  • Incremental aggregation: running sums, counts, and statistics maintained on the return stack across invocations.
  • Windowed computation: a circular buffer in linear memory with stack-based access patterns (see the sketch after the running-average example below).
\ Running average over a stream of values
VARIABLE running-sum
VARIABLE running-count

: UPDATE-AVG  ( new-value -- running-avg )
    running-sum @ +  DUP running-sum !
    running-count @ 1+  DUP running-count !
    /
;

\ Each incoming value:
42 UPDATE-AVG .    \ prints running average after adding 42
17 UPDATE-AVG .    \ prints updated average after adding 17
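
The windowed-computation bullet works the same way. A minimal sketch, assuming a 16-sample window kept as a circular buffer in linear memory (WINDOW-SIZE and the word names here are illustrative, not existing WAFER words):

16 CONSTANT WINDOW-SIZE
CREATE window  WINDOW-SIZE CELLS ALLOT
window WINDOW-SIZE CELLS ERASE        \ start with an all-zero window
VARIABLE window-pos   0 window-pos !  \ slot the next sample overwrites
VARIABLE window-sum   0 window-sum !  \ sum of the samples currently in the window

: WINDOW-PUSH  ( new-value -- windowed-sum )
    window-pos @ CELLS window + >R    \ address of the slot being replaced
    R@ @ NEGATE window-sum +!         \ retire the outgoing sample
    DUP R> !                          \ store the new sample
    window-sum +!                     \ and add it to the running sum
    window-pos @ 1+ WINDOW-SIZE MOD window-pos !
    window-sum @
;

Each incoming sample costs a handful of memory operations; the buffer is never re-scanned.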

Client-Side Analytics via WASM

WAFER compiles to WebAssembly. This means analytics can run in the browser with no server round-trips. A user uploads a CSV, WAFER parses and processes it entirely client-side, and the results render immediately. No data leaves the machine. No API calls. No latency.

This isn't just a nice demo. For privacy-sensitive analytics (healthcare, finance, GDPR-regulated data), client-side processing can be a compliance requirement, not just a nice-to-have. WAFER's deterministic execution (no GC pauses, no background threads, fixed memory layout) makes it predictable enough for real-time dashboards.
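
To make that concrete, here is a minimal sketch of the parsing step, assuming the host has already copied one line of comma-separated unsigned integers into linear memory (row-buf and the word names are illustrative, and the input is assumed well formed):

CREATE row-buf 64 CELLS ALLOT         \ parsed values for one row

: PARSE-FIELD  ( addr len -- addr' len' n )
    0 0 2SWAP >NUMBER                 \ ( n-lo n-hi addr' len' )
    2SWAP DROP                        \ keep the low cell of the result
;

: SKIP-COMMA  ( addr len -- addr' len' )
    DUP IF  1- SWAP 1+ SWAP  THEN ;   \ step past the separator if anything remains

: PARSE-ROW  ( addr len -- count )
    0 -ROT                            \ ( count addr len )
    BEGIN  DUP  WHILE
        PARSE-FIELD                   \ ( count addr' len' n )
        3 PICK CELLS row-buf + !      \ store the field into row-buf[count]
        ROT 1+ -ROT                   \ bump the field counter
        SKIP-COMMA
    REPEAT
    2DROP
;

From there, words like SUM from the vocabulary below run over row-buf directly; nothing leaves the browser.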

Domain-Specific Languages

Forth's defining feature is that you build the language up to your problem. An analytics team doesn't write Forth — they write their DSL, which happens to be implemented in Forth:

\ Define a mini analytics vocabulary
: COLUMN  ( col# -- addr n )  table-base SWAP col-offset + col-length ;
: SUM     ( addr n -- total )  0 SWAP 0 DO  OVER I CELLS + @ +  LOOP NIP ;
: COUNT   ( addr n -- n )      NIP ;
: AVG     ( addr n -- avg )    2DUP SUM -ROT COUNT / ;
: WHERE>  ( addr n thresh -- addr' n' )  filter-gt ;

\ The analyst writes:
3 COLUMN  1000 WHERE>  AVG .
\ "Average of column 3 where values exceed 1000"

The DSL compiles to WASM through WAFER's IR pipeline. There is no interpreter overhead at query time. The analyst's vocabulary is the optimized code.

A Different Way to Look at It

Most languages treat the absence of named variables as a limitation. But in data pipelines, it can actually be a feature. Named intermediates create coupling points — places where code can refer to stale state, where refactoring requires renaming, where parallelization requires dependency analysis. Point-free composition through a stack sidesteps this whole class of problems. The data is always here, on top of the stack, ready for the next transformation.


2. Database Engine: The Query VM You Already Have

Databases Already Think in Stacks

SQLite — the most deployed database engine in the world — executes queries through the VDBE (Virtual DataBase Engine), a bytecode virtual machine that began life as a stack machine (newer versions use registers, but the execution model is still a flat sequence of simple opcodes). When you write SELECT * FROM users WHERE age > 30, SQLite's query planner compiles it into a linear sequence of operations: open cursor, seek, compare, jump, emit row.

PostgreSQL's executor runs a tree of plan nodes, each of which pushes tuples upward. MySQL's handler interface is a stack of operations. CockroachDB compiles SQL to a vectorized execution engine that operates on batches — but the control flow is still a stack of operators.

There's a pattern here: query execution engines tend to converge on stack machines. Forth just happens to already be one, with no extra abstraction layers in between.

Query Plans as Forth Programs

A SQL query plan is a tree. Flattened into execution order, it becomes a sequence of operations — which is exactly a Forth program:

SELECT name, salary FROM employees WHERE dept = 'ENG' AND salary > 100000;

The query plan, expressed as Forth:

\ Primitives provided by the storage engine
\ SCAN        ( table -- cursor )
\ NEXT-ROW    ( cursor -- cursor flag )  flag=true if row available
\ COL@        ( cursor col# -- value )   string columns return addr len
\ EMIT-ROW    ( v1 v2 -- )              send to result set
\ CLOSE       ( cursor -- )

: MATCH-DEPT?   ( cursor -- cursor flag )  DUP 2 COL@ S" ENG" COMPARE 0= ;
: MATCH-SAL?    ( cursor -- cursor flag )  DUP 3 COL@ 100000 > ;
: PROJECT       ( cursor -- cursor )      DUP 0 COL@  OVER 3 COL@  EMIT-ROW ;

: QUERY  ( -- )
    employees SCAN
    BEGIN
        NEXT-ROW
    WHILE
        MATCH-DEPT? IF
            MATCH-SAL? IF
                PROJECT
            THEN
        THEN
    REPEAT
    CLOSE
;

This isn't just pseudocode, either. Every word here could be a real WAFER word backed by storage primitives implemented as host functions. The query compiles through WAFER's IR pipeline to native WASM, with the same optimization opportunities as any other Forth word: inlining, constant folding, dead code elimination.

EVALUATE as Dynamic Query Compilation

SQL databases accept queries as strings and compile them at runtime. Forth has EVALUATE, which does exactly the same thing — takes a string and compiles/executes it:

\ Build a query string dynamically
S" employees SCAN BEGIN NEXT-ROW WHILE MATCH-DEPT? IF PROJECT THEN REPEAT CLOSE"
EVALUATE

The difference from SQL: the "query language" and the "implementation language" are the same. There is no impedance mismatch between the language the user writes queries in and the language the engine executes them in. A user-defined function is just another word. An index lookup is just another word. A join strategy is just another word. They all compose the same way.
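
For instance, a user-defined predicate drops into a dynamically built query exactly like a built-in. A sketch reusing the storage primitives above (column 4 and SENIOR? are invented for the example):

: SENIOR?  ( cursor -- cursor flag )  DUP 4 COL@ 10 > ;

S" : SENIOR-QUERY employees SCAN BEGIN NEXT-ROW WHILE SENIOR? IF MATCH-SAL? IF PROJECT THEN THEN REPEAT CLOSE ; SENIOR-QUERY"
EVALUATE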

Linear Memory as Storage Pages

WAFER's linear memory model maps directly to how databases manage storage. A database page is a fixed-size block of bytes at a known offset — exactly what Forth's @ and ! operate on. B-tree nodes are structures in linear memory traversed by pointer arithmetic:

\ B-tree node layout:
\   +0: key count (cell)
\   +4: is-leaf flag (cell)
\   +8: keys array (key-count cells)
\   +8+4*key-count: child pointers (key-count+1 cells)

: NODE-KEYS    ( node -- addr )  8 + ;
: NODE-KEY@    ( node i -- key )  CELLS SWAP NODE-KEYS + @ ;
: NODE-CHILD@  ( node i -- child )
    OVER NODE-KEYS    \ ( node i keys-addr )
    ROT @ CELLS +     \ skip past key-count keys: children base
    SWAP CELLS +      \ index into the children array
    @
;

: BTREE-SEARCH  ( node target-key -- addr|0 )
    OVER @ 0= IF  2DROP 0  EXIT  THEN  \ empty node
    OVER 4 + @ IF                        \ leaf node
        LEAF-SEARCH
    ELSE
        INTERNAL-SEARCH                  \ recurse into child
    THEN
;
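
The page level underneath the tree is even simpler. A minimal sketch, assuming 4 KiB pages packed back-to-back from the start of linear memory (the page size and word names are illustrative):

4096 CONSTANT PAGE-SIZE

: PAGE-ADDR  ( page# -- addr )      PAGE-SIZE * ;
: PAGE@      ( page# offset -- x )  SWAP PAGE-ADDR + @ ;
: PAGE!      ( x page# offset -- )  SWAP PAGE-ADDR + ! ;

NODE-KEYS and NODE-CHILD@ above are the same idea one level up: a fixed layout at a known offset, traversed with a few cells of arithmetic.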

WASM Sandboxing for User-Defined Functions

Safely executing user-defined functions (UDFs) is one of the trickier problems in database engines. PostgreSQL UDFs in C can crash the server. JavaScript UDFs require embedding V8. Python UDFs tend to be slow.

WAFER UDFs compile to WASM and execute in a sandbox with bounded memory, bounded execution time, and no access to anything outside the linear memory they're given. A malicious UDF can't read other users' data, can't make network calls, can't crash the host. WAFER gets this for free — it's inherent to WASM's security model.

\ User defines a custom scoring function
: SCORE  ( age salary -- score )
    1000 /         \ salary contribution (salary/1000)
    SWAP 50 - ABS  \ age penalty (distance from 50)
    -              \ final score
;

\ Engine uses it in a query
: RANKED-QUERY  ( -- )
    employees SCAN
    BEGIN NEXT-ROW WHILE
        DUP 1 COL@  OVER 3 COL@  SCORE
        50 > IF  PROJECT  THEN
    REPEAT CLOSE
;

The SCORE function compiles to a WASM module through WAFER's JIT. It runs at near-native speed, sandboxed, with no FFI overhead.

A Different Way to Look at It

Database engineers put a lot of effort into building query VMs — designing bytecode formats, writing interpreters, adding JIT compilation. In a sense, they're often reinventing something Forth-shaped each time. It's worth asking: what if you just started with Forth and built the storage layer underneath it?


3. AI Inference: Neural Networks as Word Composition

Layers Are Words, Forward Pass Is Composition

A neural network's forward pass is a pipeline: input tensor enters, passes through a sequence of layers (linear transform, activation, normalization), and a prediction exits. Each layer takes a tensor and produces a tensor.

In Forth terms: each layer is a word. The tensor sits on the stack. The forward pass is the composition of those words:

\ Assuming tensor operations as primitives (host functions):
\ T-MATMUL  ( tensor weights -- tensor )
\ T-ADD     ( tensor bias -- tensor )
\ T-RELU    ( tensor -- tensor )
\ T-SOFTMAX ( tensor -- tensor )

: LINEAR1    ( tensor -- tensor )  w1 T-MATMUL b1 T-ADD ;
: LINEAR2    ( tensor -- tensor )  w2 T-MATMUL b2 T-ADD ;
: LINEAR3    ( tensor -- tensor )  w3 T-MATMUL b3 T-ADD ;

: CLASSIFIER ( tensor -- tensor )
    LINEAR1 T-RELU
    LINEAR2 T-RELU
    LINEAR3 T-SOFTMAX
;

input-data CLASSIFIER  \ forward pass

This maps more directly than you might expect. The compositional structure of neural networks lines up nicely with the compositional structure of Forth programs. The stack carries the data flow. The words are the layers. The dictionary holds the model architecture.

Quantized Inference on the Integer Stack

Most production inference runs quantized — INT8 or INT4 weights, integer arithmetic, no floating point. Forth's native data type is the integer cell. WAFER's i32 stack operations map directly to quantized tensor operations:

\ INT8 quantized dot product of two vectors
: QDOT  ( addr1 addr2 n -- result )
    0 SWAP                       \ accumulator under the loop limit
    0 DO                         \ ( addr1 addr2 acc )
        2 PICK I + C@  127 -     \ load and de-bias element from vector 1
        2 PICK I + C@  127 -     \ load and de-bias element from vector 2
        *  +                     \ multiply-accumulate
    LOOP
    NIP NIP
;

\ Quantized linear layer
\ output-buf is a pre-allocated result buffer (defined elsewhere)
: QLINEAR  ( input-addr weight-addr rows cols -- output-addr )
    SWAP 0 DO                    \ ( input weights cols )  one pass per output row
        2 PICK                   \ input address
        2 PICK I 3 PICK * +      \ weight row address: weights + I*cols
        2 PICK QDOT              \ dot product over cols elements
        output-buf I CELLS + !   \ store this output neuron's result
    LOOP
    2DROP DROP  output-buf
;

No framework dependency, no Python interpreter, no CUDA runtime — just integer arithmetic on a stack, compiled to WASM, running on any device.

Edge AI: The 50 KB Runtime

ML inference frameworks tend to be big. PyTorch is ~500 MB. TensorFlow Lite is ~1 MB for the runtime alone. ONNX Runtime is ~10 MB.

WAFER is ~50 KB for the full Forth system. The model weights dominate the binary size, not the runtime. For edge devices — IoT sensors, wearables, microcontrollers, browser tabs — that size difference can be the difference between "fits" and "doesn't fit."

WASM's portability means the same inference code runs on an ARM microcontroller, in a browser, on a server, without recompilation. Write the model once in Forth, deploy everywhere WASM reaches.

DOES> for Architecture Generation

Forth's DOES> is a metaprogramming facility: it creates words that create other words, each with custom runtime behavior. This is exactly what neural architecture construction needs:

\ LAYER is a defining word that creates layer words
: LAYER  ( weights bias rows cols -- )
    CREATE  , , , ,           \ stored in order: cols, rows, bias, weights
    DOES>   ( tensor -- tensor )
        >R                    \ parameter field address
        R@ 12 + @  T-MATMUL   \ multiply by the stored weights (offset 12)
        R> 8 + @   T-ADD      \ add the stored bias (offset 8)

\ Define the network architecture
w1 b1 768 512  LAYER EMBED
w2 b2 512 256  LAYER HIDDEN1
w3 b3 256 10   LAYER OUTPUT

\ The architecture is now executable
: MODEL  ( tensor -- tensor )  EMBED T-RELU HIDDEN1 T-RELU OUTPUT T-SOFTMAX ;

Each LAYER invocation creates a new word with its own weights and dimensions baked in. The MODEL word composes them. This is the same pattern as nn.Sequential in PyTorch — but it compiles to WASM, has zero framework overhead, and the "architecture definition" and the "executable model" are the same thing.

Automatic Differentiation via Dual Numbers

Backpropagation is reverse-mode automatic differentiation. Its forward-mode counterpart has an elegant formulation using dual numbers (a value paired with its derivative) that maps directly onto Forth's two-cell stack operations:

\ A dual number is a pair ( value derivative ) kept as two cells on the stack
\ WAFER's two-cell stack words (2DUP, 2SWAP, 2OVER, etc.) shuffle these pairs natively

\ Dual addition: (a, a') + (b, b') = (a+b, a'+b')
: D+DUAL  ( a a' b b' -- a+b a'+b' )
    ROT +          \ a' + b'
    >R + R>        \ a + b, then restore derivative
;

\ Dual multiplication: (a, a') * (b, b') = (a*b, a*b' + a'*b)
: D*DUAL  ( a a' b b' -- a*b a*b'+a'*b )
    3 PICK *       \ a * b'
    >R             \ stash it                    ( a a' b )
    SWAP OVER *    \ a' * b                      ( a b a'*b )
    R> +           \ derivative: a*b' + a'*b     ( a b a*b'+a'*b )
    >R             \ stash the derivative
    *              \ value: a * b
    R>             \ restore the derivative
;

The chain rule emerges naturally: composing dual-number operations through a sequence of words automatically computes the derivative of the whole pipeline. This is the same principle behind JAX's jvp — but expressed as stack operations.
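
A worked example makes this visible. Differentiating f(x) = x * (x + 3) at x = 2: the input dual seeds its derivative slot with 1, constants carry 0, and the derivative 2x + 3 = 7 falls out of the composition:

2 1                \ x as a dual: value 2, derivative 1
2DUP               \ a second copy of x for the product
3 0 D+DUAL         \ x + 3          ( 2 1 5 1 )
D*DUAL             \ x * (x + 3)    ( 10 7 )
.S                 \ value 10, derivative 7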

A Different Way to Look at It

Most of the ML ecosystem's complexity lives in training. Inference, by comparison, is fairly straightforward: load weights, multiply matrices, apply activations, read output. That's a pipeline of arithmetic operations — which is pretty much what Forth was designed for. The industry tends to wrap inference in 500 MB frameworks because training needed those frameworks, and the two haven't been fully separated. A 50 KB Forth runtime doing quantized integer operations might be closer to what inference actually needs than we usually assume.


4. AI Generating Code: The Smallest Target Language

The Token Economy

When an LLM generates code, every token costs money and adds latency. A Python solution to "compute the average of a list" looks like:

def average(numbers):
    if not numbers:
        return 0
    return sum(numbers) / len(numbers)

That is 25 tokens. The Forth equivalent:

: AVERAGE  ( addr n -- avg )  2DUP SUM -ROT NIP / ;

That is 12 tokens. For the same semantic content, Forth uses roughly half the tokens. At scale — millions of API calls, each generating hundreds of lines — this is a meaningful cost reduction. But the token savings are the least interesting advantage.

Minimal Syntax, Maximal Verifiability

Forth has essentially no syntax. There are words separated by spaces. There are numbers. There are a few special constructs (: for definitions, IF/THEN for conditionals, DO/LOOP for iteration). That's about it.

An LLM generating Python must get indentation right, match parentheses and brackets, handle keyword arguments, manage import statements, respect method resolution order, and navigate a standard library of thousands of functions. An LLM generating Forth mostly just needs to get the stack effect right. That's the main failure mode worth worrying about.

And stack effects are mechanically verifiable:

\ Stack effect: ( n1 n2 -- n3 )
\ Verification: start with 2 items on stack, end with 1
: ADD-AND-DOUBLE  ( n1 n2 -- n3 )  + 2* ;

\ Test:
3 4 ADD-AND-DOUBLE   \ stack should contain: 14

You don't need a type checker or static analysis. Just run the word with known inputs and check the stack. If the stack depth and values match the declared effect, the word is correct. It's hard to think of another practical language where verification is this straightforward.
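
A sketch of what that check looks like in practice (EXPECT is an illustrative helper, not an existing WAFER word):

: EXPECT  ( actual expected -- )  = IF ." ok "  ELSE ." FAIL "  THEN ;

3 4 ADD-AND-DOUBLE  14 EXPECT    \ result matches the declared effect
DEPTH 0 EXPECT                   \ nothing left over: the stack depth balances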

Self-Extending Vocabulary

LLMs struggle with large codebases because context windows are finite. A Python project with 50 files and 10,000 lines requires the LLM to hold (or retrieve) vast amounts of context to generate correct code.

Forth's defining characteristic is that you build the language up to your problem. The LLM doesn't need to generate a 100-line solution. It generates 5-line words, each building on the previous ones:

\ Step 1: LLM generates basic operations
: CLAMP  ( n lo hi -- n' )  ROT MIN MAX ;
: BETWEEN?  ( n lo hi -- flag )  OVER - >R - R> U< ;

\ Step 2: LLM generates higher-level operations using step 1
: NORMALIZE  ( n -- n' )  0 255 CLAMP ;
: IN-RANGE?  ( n -- flag )  0 100 BETWEEN? ;

\ Step 3: LLM generates application logic using steps 1-2
: PROCESS-SENSOR  ( raw -- calibrated )
    offset @ -          \ remove sensor offset
    NORMALIZE            \ clamp to valid range
    scale @ *  1000 /    \ apply calibration scale
;

Each step requires only the names of previously defined words, not their implementations. The dictionary serves as a compressed representation of the entire program. An LLM can generate correct code by knowing only the word names and their stack effects — a few dozen tokens of context instead of thousands of lines.

WASM Sandbox: Safe Execution of Untrusted Code

AI-generated code generally needs to be executed to be verified. Running arbitrary Python is tricky from a security perspective — file system access, network calls, import os, eval(). Sandboxing Python typically requires containerization, seccomp filters, or virtual machines.

WAFER compiles to WASM, which executes in a sandbox by construction. A WAFER program:

  • Cannot access the file system
  • Cannot make network calls
  • Cannot read memory outside its linear memory
  • Cannot execute longer than the host allows (fuel metering)
  • Cannot consume more memory than the host allocates

You can run AI-generated Forth with roughly the same confidence as a pure mathematical function. The sandbox isn't a bolt-on — it's just how WASM works.

\ AI generates this code. Is it safe to run? Yes, always.
: FIBONACCI  ( n -- fib )
    DUP 2 < IF EXIT THEN
    DUP 1- RECURSE
    SWAP 2 - RECURSE
    +
;

There's nothing this word can do except compute. No side effects, no escape hatches. The WASM sandbox guarantees that structurally.

A Different Way to Look at It

The conventional wisdom is that LLMs need expressive, high-level languages to generate useful code. But there's a good case for the opposite: what LLMs really benefit from are verifiable languages — ones where correctness can be checked cheaply and deterministically. Expressiveness can actually work against you here: more syntax means more ways to be wrong, more edge cases to handle, more context to maintain. Forth's extreme minimalism starts to look less like a limitation and more like an advantage: generate a few small words, verify each one by running it, compose them into larger programs with confidence. The language that's hardest for humans to read might just be the easiest for machines to write correctly.


5. AI Agent Control: Plans That Execute Themselves

The Plan-Program Gap

When an AI agent "plans," it produces a sequence of steps in natural language:

  1. Search for files matching "*.config"
  2. Read each file and extract the "timeout" field
  3. If timeout > 30, update it to 30
  4. Write the modified files back

This plan is then "executed" by the agent interpreting each step, calling tools, handling errors, and managing state — all mediated by the LLM at every step, consuming tokens and latency for what is fundamentally a sequential program.

The gap between "plan" and "program" might be more artificial than it looks. A plan is a program — we just don't usually give agents a good executable representation for it.

Forth could be that representation.

Tools as Words

Every agent tool — file read, web search, code execution, API call — maps to a Forth word. The agent's toolkit becomes a Forth dictionary:

\ Agent tool vocabulary (host functions)
\ SEARCH-FILES  ( pattern-addr pattern-len -- results-addr count )
\ READ-FILE     ( path-addr path-len -- content-addr content-len )
\ WRITE-FILE    ( content-addr content-len path-addr path-len -- )
\ JSON-GET      ( json-addr key-addr key-len -- value-addr value-len )
\ SHELL         ( cmd-addr cmd-len -- output-addr output-len )
\ ASK-USER      ( question-addr question-len -- answer-addr answer-len )

Now the plan from above becomes an executable program:

: UPDATE-TIMEOUTS  ( -- )
    S" *.config" SEARCH-FILES       \ get matching files
    0 DO                             \ for each file
        DUP I CELLS + @ COUNT        \ get filename
        2DUP READ-FILE               \ read contents
        S" timeout" JSON-GET         \ extract timeout field
        S>NUMBER DROP                \ convert to number
        30 > IF                      \ if timeout > 30
            30 SET-TIMEOUT           \ update to 30
            WRITE-FILE               \ write back
        ELSE
            2DROP                    \ discard unchanged
        THEN
    LOOP
    DROP
;

UPDATE-TIMEOUTS

This isn't a description of what to do — it is what to do. The agent generates it, WAFER compiles it to WASM, and it runs — no LLM in the loop during execution, no token cost per step, no latency per tool call.

Error Handling with CATCH/THROW

Of course, agent plans fail. Files don't exist. APIs return errors. Permissions get denied. Production agent systems need robust error handling, which typically means calling the LLM at every step to decide what to do when something goes wrong.

WAFER has CATCH and THROW — structured exception handling that lets the plan itself define error recovery:

: SAFE-READ  ( path-addr path-len -- content-addr content-len | 0 0 )
    ['] READ-FILE CATCH IF
        2DROP  0 0                   \ file not found: return empty
    THEN
;

: SAFE-UPDATE  ( filename-addr filename-len -- )
    2DUP SAFE-READ                   \ try to read
    DUP 0= IF  2DROP 2DROP EXIT THEN \ skip if file missing
    S" timeout" JSON-GET
    S>NUMBER DROP
    30 > IF
        30 SET-TIMEOUT
        WRITE-FILE
    ELSE
        2DROP 2DROP
    THEN
;

: ROBUST-UPDATE-TIMEOUTS  ( -- )
    S" *.config" SEARCH-FILES
    0 DO
        DUP I CELLS + @ COUNT SAFE-UPDATE
    LOOP
    DROP
;

The error handling is part of the plan. The agent generates it once, and it runs to completion without further LLM intervention. Errors are handled at the speed of WASM, not the speed of an API call to an LLM.

The Dictionary as Growing Capability

A human Forth programmer builds up vocabulary: small words compose into larger words, which compose into still larger words. The dictionary grows with the programmer's understanding of the problem.

An AI agent does the same thing. Each successfully executed plan leaves behind defined words that can be reused:

\ First task: agent learns to read configs
: READ-CONFIG  ( path-addr path-len -- json-addr json-len )
    SAFE-READ DUP 0= IF EXIT THEN JSON-PARSE ;

\ Second task: agent learns to update configs
: UPDATE-CONFIG  ( key-addr key-len value path-addr path-len -- )
    2DUP READ-CONFIG JSON-SET WRITE-FILE ;

\ Third task: agent composes previous capabilities
: MIGRATE-CONFIGS  ( -- )
    S" *.config" SEARCH-FILES
    0 DO
        DUP I CELLS + @ COUNT
        S" timeout" 30 ROT ROT UPDATE-CONFIG
    LOOP DROP
;

\ The agent's vocabulary grows with experience.
\ MIGRATE-CONFIGS didn't exist before. Now it does.
\ Next time, the agent can use it as a building block.

You could call this learned tool use — not in the machine learning sense, but in the software engineering sense. The agent defines new capabilities in terms of old ones, and the dictionary persists across invocations. Over time, the agent's vocabulary naturally converges on the abstractions that matter for its operational domain.

REPL as Test-Before-Commit

Agents that act irreversibly on the first try are risky. WAFER's REPL model gives agents a natural test-before-commit workflow:

  1. Define: Generate and compile the plan as Forth words.
  2. Test: Run the words against sample data on the stack.
  3. Verify: Check the stack for expected results.
  4. Execute: Run the plan for real only after verification passes.
\ Step 1: Define
: CALCULATE-DISCOUNT  ( price tier -- discounted )
    CASE
        1 OF  10 ENDOF   \ tier 1: 10% off
        2 OF  20 ENDOF   \ tier 2: 20% off
        3 OF  35 ENDOF   \ tier 3: 35% off
        0 SWAP
    ENDCASE
    100 SWAP - * 100 /
;

\ Step 2: Test (no side effects, just stack operations)
1000 1 CALCULATE-DISCOUNT .  \ expect 900
1000 2 CALCULATE-DISCOUNT .  \ expect 800
1000 3 CALCULATE-DISCOUNT .  \ expect 650

\ Step 3: Verify output matches expectations
\ Step 4: Apply to real data only after tests pass

The agent can generate, test, and iterate without ever touching production data. The REPL isn't just a debugging convenience here — it's a safety mechanism for autonomous agents.

Multi-Agent Coordination

Multiple agents can share a WAFER dictionary through shared linear memory. One agent defines words. Another agent uses them. A coordinator agent composes them into higher-level plans:

\ Agent A defines data retrieval
: FETCH-METRICS  ( -- addr n )  metrics-api QUERY PARSE-JSON ;

\ Agent B defines analysis
: DETECT-ANOMALIES  ( addr n -- anomalies-addr n )
    THRESHOLD @  FILTER-ABOVE ;

\ Agent C defines actions
: ALERT  ( anomalies-addr n -- )
    0 DO  DUP I CELLS + @  SEND-ALERT  LOOP DROP ;

\ Coordinator composes them
: MONITOR  ( -- )
    BEGIN
        FETCH-METRICS DETECT-ANOMALIES
        DUP 0> IF  ALERT  ELSE  2DROP  THEN
        60000 DELAY
    AGAIN
;

Each agent contributes words to a shared vocabulary. The coordinator doesn't need to understand the implementation of FETCH-METRICS or DETECT-ANOMALIES — it only needs to know their stack effects. This is composability without coupling, coordination without shared state beyond the dictionary.

A Different Way to Look at It

The AI agent community is building increasingly sophisticated "plan representations" — DAGs, state machines, behavior trees, ReAct loops — all trying to bridge the gap between the LLM's natural language output and actual tool execution. But Forth is already a plan representation that doubles as an execution engine. It has structured control flow (IF/THEN, DO/LOOP, BEGIN/UNTIL), error handling (CATCH/THROW), composability (word definitions), and a test harness (the REPL and stack). Maybe the gap between "plan" and "program" doesn't need to be bridged so much as it needs to be erased.


Convergence: Five Problems, One Shape

These five domains look different on the surface:

Domain            Traditional Tool                  Core Operation
Data analytics    Pandas, Spark                     Transform pipeline
Database engine   SQLite VDBE, Postgres executor    Query plan execution
AI inference      PyTorch, TensorFlow               Layer composition
AI codegen        Python, JavaScript                Program synthesis
AI agents         LangChain, CrewAI                 Plan execution

But they share a deep structure: sequential composition of simple operations on a data flow. A data pipeline, a query plan, a forward pass, a synthesized program, and an agent plan are all the same thing: a sequence of words applied to a stack.

Forth noticed this in 1970. Charles Moore designed a language around the observation that most computation is a pipeline of transformations, and the simplest way to express pipelines is sequential composition on a stack. The language has no syntax because pipelines don't need syntax. It has no type system because the data flow is the type. It has no package manager because each program builds its own vocabulary from primitives.

WAFER brings these ideas to the modern world by targeting WebAssembly — the universal runtime that runs in browsers, on servers, on edge devices, in sandboxes. That combination opens up some interesting possibilities:

  • Analytics in the browser with no server, no framework, deterministic execution.
  • Database VMs that compile queries to native WASM through an existing Forth JIT.
  • Inference engines that fit in 50 KB and run on any device WASM reaches.
  • AI-generated code in the language with the smallest syntax, cheapest verification, and safest sandbox.
  • Agent plans that are executable programs, testable in a REPL, composable through a growing dictionary.

None of this requires Forth to change. Forth has been this shape for 55 years. It's kind of fun that the world's problems seem to be circling back to it.


WAFER is open source. Start at the repository root. Architecture details: WAFER.md. Language introduction: FORTH.md.