# The Unreasonable Effectiveness of Stack Machines

_How Forth — and WAFER — can serve as infrastructure for data analytics,
databases, AI inference, AI code generation, and AI agent control._

---

Forth is 55 years old. It has no type system, no garbage collector, no package
manager, no syntax to speak of. By most conventional measures, it shouldn't
still be relevant.

But it keeps showing up at the edges — in firmware, in space probes, in
real-time systems, in places where correctness and determinism matter more than
developer ergonomics. That's worth paying attention to.

The properties that make Forth unusual — concatenative composition, zero-cost
abstraction through word definition, a stack-based execution model that maps
directly to hardware — happen to line up surprisingly well with what five of
the most active areas in modern computing are independently reaching for:

1. **Data analytics** wants composable, streaming pipelines.
2. **Database engines** want stack-based virtual machines for query execution.
3. **AI inference** wants tiny, deterministic, embeddable runtimes.
4. **AI code generation** wants the smallest possible target language.
5. **AI agent systems** want plans that are also executable programs.

Forth won't single-handedly solve any of these. But it offers a useful lens
for understanding what each of them actually needs — and WAFER, a Forth that
compiles to WebAssembly, is in a good position to explore that space.

WAFER (WebAssembly Forth Engine in Rust) JIT-compiles each Forth word to its
own WASM module, linked through shared linear memory, globals, and a function
table. It runs anywhere WASM runs: browsers, edge devices, servers, embedded
systems. It has 160+ words, 100% Forth 2012 compliance on 10 word sets, and
fits in ~50 KB. It has exception handling (`CATCH`/`THROW`), metaprogramming
(`DOES>`), dynamic compilation (`EVALUATE`), and an optimization pipeline
designed for stack-to-local promotion that can achieve 7x speedups.

This document explores what becomes possible when you take these properties
seriously.

---

## 1. Data Analytics: Pipelines Without Plumbing

### The Problem with Pipelines

Every data analytics framework reinvents the same idea: take data, push it
through a sequence of transformations, collect the result. Pandas chains
methods. Spark builds DAGs. dplyr pipes with `%>%`. Unix pipes bytes through
`|`. They all converge on the same shape: **linear composition of operations
on an implicit data flow**.

This is exactly what Forth does. It has done it since 1970. The data stack
_is_ the pipeline. Each word _is_ a transformation. Composition is
juxtaposition — you don't pipe, you don't chain, you don't bind. You just
write the words next to each other.

```forth
\ Pandas: df['amount'].where(df['amount'] > 0).mean()
\ Forth:
VARIABLE acc-sum   VARIABLE acc-count

: POSITIVE? ( n -- n flag ) DUP 0> ;
: FILTER-POSITIVE ( addr n -- sum count )
  0 acc-sum !  0 acc-count !
  0 DO
    DUP I CELLS + @
    POSITIVE? IF acc-sum +! 1 acc-count +! ELSE DROP THEN
  LOOP DROP
  acc-sum @ acc-count @ \ ( sum count )
;
: MEAN ( sum count -- avg ) / ;

data 100 FILTER-POSITIVE MEAN .
```

This goes a bit deeper than syntactic sugar. The absence of intermediate
variables is a structural property. In a Pandas chain, every `.method()`
returns a new DataFrame object that must be allocated, tracked, and eventually
collected. In Forth, the data flows through the stack with zero allocation.
The pipeline _is_ the execution.

### Streaming and Incremental Computation

The stack model is inherently streaming. A word consumes its inputs and
produces its outputs in the same motion. There is no "collect all data first,
then process" step unless you explicitly build one. This makes Forth natural
for:

- **Event stream processing**: each event lands on the stack, a word
  processes it, the result is consumed by the next word.
- **Incremental aggregation**: running sums, counts, and statistics
  maintained in variables across invocations.
- **Windowed computation**: a circular buffer in linear memory with
  stack-based access patterns.

```forth
\ Running average over a stream of values
VARIABLE running-sum
VARIABLE running-count

: UPDATE-AVG ( new-value -- running-avg )
  running-sum @ + DUP running-sum !
  running-count @ 1+ DUP running-count !
  /
;

\ Each incoming value:
42 UPDATE-AVG . \ prints running average after adding 42
17 UPDATE-AVG . \ prints updated average after adding 17
```

### Client-Side Analytics via WASM

WAFER compiles to WebAssembly. This means analytics can run _in the browser_
with no server round-trips. A user uploads a CSV, WAFER parses and processes
it entirely client-side, and the results render immediately. No data leaves
the machine. No API calls. No latency.

This isn't just a nice demo. For privacy-sensitive analytics (healthcare,
finance, GDPR-regulated data), client-side processing can be a compliance
requirement, not just a nice-to-have. WAFER's deterministic execution (no GC
pauses, no background threads, fixed memory layout) makes it predictable
enough for real-time dashboards.

### Domain-Specific Languages

Forth's defining feature is that you build the language up to your problem.
An analytics team doesn't write Forth — they write _their DSL_, which
happens to be implemented in Forth:

```forth
\ Define a mini analytics vocabulary
: COLUMN ( col# -- addr n ) table-base SWAP col-offset + col-length ;
: SUM    ( addr n -- total ) 0 SWAP 0 DO OVER I CELLS + @ + LOOP NIP ;
: COUNT  ( addr n -- n ) NIP ;
: AVG    ( addr n -- avg ) 2DUP SUM -ROT COUNT / ;
: WHERE> ( addr n thresh -- addr' n' ) filter-gt ;

\ The analyst writes:
3 COLUMN 1000 WHERE> AVG .
\ "Average of column 3 where values exceed 1000"
```

The DSL compiles to WASM through WAFER's IR pipeline. There is no
interpreter overhead at query time. The analyst's vocabulary _is_ the
optimized code.

### A Different Way to Look at It

Most languages treat the absence of named variables as a limitation. But in
data pipelines, it can actually be a **feature**. Named intermediates create
coupling points — places where code can refer to stale state, where
refactoring requires renaming, where parallelization requires dependency
analysis. Point-free composition through a stack sidesteps this whole class
of problems. The data is always _here_, on top of the stack, ready for the
next transformation.

---

## 2. Database Engine: The Query VM You Already Have

### Databases Already Think in Stacks

SQLite — the most deployed database engine in the world — executes queries
through the VDBE (Virtual DataBase Engine), a bytecode virtual machine that
began life as a stack machine (its modern incarnation is register-based).
When you write `SELECT * FROM users WHERE age > 30`, SQLite's query planner
compiles it into a sequence of bytecode operations: open cursor, seek,
compare, jump, emit row.

PostgreSQL's executor runs a tree of plan nodes, each of which pushes tuples
upward. MySQL's handler interface is a stack of operations. CockroachDB
compiles SQL to a vectorized execution engine that operates on batches — but
the control flow is still a stack of operators.

There's a pattern here: **query execution engines tend to converge on
stack machines**. Forth just happens to already be one, with no extra
abstraction layers in between.

### Query Plans as Forth Programs

A SQL query plan is a tree. Flattened into execution order, it becomes a
sequence of operations — which is exactly a Forth program:

```sql
SELECT name, salary FROM employees WHERE dept = 'ENG' AND salary > 100000;
```

The query plan, expressed as Forth:

```forth
\ Primitives provided by the storage engine
\ SCAN     ( table -- cursor )
\ NEXT-ROW ( cursor -- cursor flag )  flag=true if row available
\ COL@     ( cursor col# -- value )   string columns yield addr len
\ EMIT-ROW ( v1 v2 -- )               send to result set
\ CLOSE    ( cursor -- )

: MATCH-DEPT? ( cursor -- cursor flag ) DUP 2 COL@ S" ENG" COMPARE 0= ;
: MATCH-SAL?  ( cursor -- cursor flag ) DUP 3 COL@ 100000 > ;
: PROJECT     ( cursor -- cursor )      DUP 0 COL@ OVER 3 COL@ EMIT-ROW ;

: QUERY ( -- )
  employees SCAN
  BEGIN
    NEXT-ROW
  WHILE
    MATCH-DEPT? IF
      MATCH-SAL? IF
        PROJECT
      THEN
    THEN
  REPEAT
  CLOSE
;
```
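
For comparison, the same flattened plan can be sketched in Python; the
`employees` rows here are illustrative sample data standing in for the
storage engine:

```python
# Python model of the flattened plan: scan, filter, project.
employees = [
    ("alice", "ENG", 120000),
    ("bob",   "OPS", 150000),
    ("carol", "ENG",  90000),
]

def query(rows):
    out = []
    for name, dept, salary in rows:         # SCAN + NEXT-ROW loop
        if dept == "ENG":                   # MATCH-DEPT?
            if salary > 100000:             # MATCH-SAL?
                out.append((name, salary))  # PROJECT + EMIT-ROW
    return out

print(query(employees))  # [('alice', 120000)]
```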

This isn't just pseudocode, either. Every word here could be a real WAFER
word backed by storage primitives implemented as host functions. The query
compiles through WAFER's IR pipeline to native WASM, with the same
optimization opportunities as any other Forth word: inlining, constant
folding, dead code elimination.

### EVALUATE as Dynamic Query Compilation

SQL databases accept queries as strings and compile them at runtime. Forth
has `EVALUATE`, which does exactly the same thing — takes a string and
compiles/executes it:

```forth
\ Build a query string dynamically
S" employees SCAN BEGIN NEXT-ROW WHILE MATCH-DEPT? IF PROJECT THEN REPEAT CLOSE"
EVALUATE
```

The difference from SQL: the "query language" and the "implementation
language" are the same. There is no impedance mismatch between the language
the user writes queries in and the language the engine executes them in. A
user-defined function is just another word. An index lookup is just another
word. A join strategy is just another word. They all compose the same way.

### Linear Memory as Storage Pages

WAFER's linear memory model maps directly to how databases manage storage.
A database page is a fixed-size block of bytes at a known offset — exactly
what Forth's `@` and `!` operate on. B-tree nodes are structures in linear
memory traversed by pointer arithmetic:

```forth
\ B-tree node layout (4-byte cells):
\ +0: key count (cell)
\ +4: is-leaf flag (cell)
\ +8: keys array (key-count cells)
\ +8+4*key-count: child pointers (key-count+1 cells)

: NODE-KEYS   ( node -- addr ) 8 + ;
: NODE-KEY@   ( node i -- key ) CELLS SWAP NODE-KEYS + @ ;
: NODE-CHILD@ ( node i -- child )
  OVER NODE-KEYS   \ node i keys-addr
  2 PICK @ CELLS + \ skip past keys array (key-count cells)
  SWAP CELLS +     \ index into children
  @ NIP
;

: BTREE-SEARCH ( node target-key -- addr|0 )
  OVER @ 0= IF 2DROP 0 EXIT THEN \ empty node
  OVER 4 + @ IF                  \ leaf node
    LEAF-SEARCH
  ELSE
    INTERNAL-SEARCH              \ recurse into child
  THEN
;
```
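
The offset arithmetic in `NODE-CHILD@` is easy to get wrong, so here is a
small Python model of the same layout that checks it against packed bytes
(`make_node` is a test helper, not part of any engine):

```python
# Model of the node layout above: [count][is-leaf][keys...][children...],
# 4 bytes per cell, little-endian (the wasm32 convention).
import struct

def make_node(keys, children, is_leaf=0):
    return struct.pack(f"<{2 + len(keys) + len(children)}i",
                       len(keys), is_leaf, *keys, *children)

def node_child(buf, i):
    key_count = struct.unpack_from("<i", buf, 0)[0]
    children_base = 8 + 4 * key_count   # same math as NODE-CHILD@
    return struct.unpack_from("<i", buf, children_base + 4 * i)[0]

node = make_node(keys=[10, 20], children=[111, 222, 333])
print(node_child(node, 1))  # 222
```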

### WASM Sandboxing for User-Defined Functions

Safely executing user-defined functions (UDFs) is one of the trickier
problems in database engines. PostgreSQL UDFs in C can crash the server.
JavaScript UDFs require embedding V8. Python UDFs tend to be slow.

WAFER UDFs compile to WASM and execute in a sandbox with bounded memory,
bounded execution time, and no access to anything outside the linear memory
they're given. A malicious UDF can't read other users' data, can't make
network calls, can't crash the host. WAFER gets this for free — it's
inherent to WASM's security model.

```forth
\ User defines a custom scoring function
: SCORE ( age salary -- score )
  1000 /        \ salary contribution (salary/1000)
  SWAP 50 - ABS \ age penalty (distance from 50)
  -             \ final score
;

\ Engine uses it in a query
: RANKED-QUERY ( -- )
  employees SCAN
  BEGIN NEXT-ROW WHILE
    DUP 1 COL@ OVER 3 COL@ SCORE
    50 > IF PROJECT THEN
  REPEAT CLOSE
;
```

The `SCORE` function compiles to a WASM module through WAFER's JIT. It runs
at near-native speed, sandboxed, with no FFI overhead.

### A Different Way to Look at It

Database engineers put a lot of effort into building query VMs — designing
bytecode formats, writing interpreters, adding JIT compilation. In a sense,
they're often reinventing something Forth-shaped each time. It's worth asking:
what if you just started with Forth and built the storage layer underneath it?

---

## 3. AI Inference: Neural Networks as Word Composition

### Layers Are Words, Forward Pass Is Composition

A neural network's forward pass is a pipeline: input tensor enters, passes
through a sequence of layers (linear transform, activation, normalization),
and a prediction exits. Each layer takes a tensor and produces a tensor.

In Forth terms: each layer is a word. The tensor sits on the stack. The
forward pass is the composition of those words:

```forth
\ Assuming tensor operations as primitives (host functions):
\ T-MATMUL  ( tensor weights -- tensor )
\ T-ADD     ( tensor bias -- tensor )
\ T-RELU    ( tensor -- tensor )
\ T-SOFTMAX ( tensor -- tensor )

: LINEAR1 ( tensor -- tensor ) w1 T-MATMUL b1 T-ADD ;
: LINEAR2 ( tensor -- tensor ) w2 T-MATMUL b2 T-ADD ;
: LINEAR3 ( tensor -- tensor ) w3 T-MATMUL b3 T-ADD ;

: CLASSIFIER ( tensor -- tensor )
  LINEAR1 T-RELU
  LINEAR2 T-RELU
  LINEAR3 T-SOFTMAX
;

input-data CLASSIFIER \ forward pass
```

This maps more directly than you might expect. The compositional structure of
neural networks lines up nicely with the compositional structure of Forth
programs. The stack carries the data flow. The words are the layers. The
dictionary holds the model architecture.

### Quantized Inference on the Integer Stack

Most production inference runs quantized — INT8 or INT4 weights, integer
arithmetic, no floating point. Forth's native data type is the integer cell.
WAFER's `i32` stack operations map directly to quantized tensor operations:

```forth
\ INT8 quantized dot product of two vectors
: QDOT ( addr1 addr2 n -- result )
  0 SWAP                \ accumulator beneath the loop limit
  0 DO
    2 PICK I + C@ 127 - \ load and de-bias first-vector element
    2 PICK I + C@ 127 - \ load and de-bias second-vector element
    * +                 \ multiply-accumulate
  LOOP
  NIP NIP
;

\ Quantized linear layer: one QDOT per output neuron
\ (row-offset and output-buf are assumed helpers)
VARIABLE vec-len
: QLINEAR ( input-addr weight-addr rows cols -- output-addr )
  vec-len !              \ save vector length
  0 DO                   \ one iteration per output neuron
    2DUP I row-offset +  \ input, weight row i
    vec-len @ QDOT       \ dot product with input
    output-buf I CELLS + !
  LOOP
  2DROP output-buf
;
```
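
As a cross-check, the de-bias-and-accumulate loop in `QDOT` can be stated in
a few lines of Python:

```python
# Python reference for QDOT: subtract the 127 bias from each unsigned
# byte, then multiply-accumulate across the two vectors.
def qdot(v1: bytes, v2: bytes) -> int:
    return sum((a - 127) * (b - 127) for a, b in zip(v1, v2))

# 0*0 + 128*128 + (-127)*128 = 128
print(qdot(bytes([127, 255, 0]), bytes([127, 255, 255])))  # 128
```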

No framework dependency, no Python interpreter, no CUDA runtime — just
integer arithmetic on a stack, compiled to WASM, running on any device.

### Edge AI: The 50 KB Runtime

ML inference frameworks tend to be big. PyTorch is ~500 MB. TensorFlow Lite
is ~1 MB for the runtime alone. ONNX Runtime is ~10 MB.

WAFER is ~50 KB for the full Forth system. The model weights dominate the
binary size, not the runtime. For edge devices — IoT sensors, wearables,
microcontrollers, browser tabs — that size difference can be the difference
between "fits" and "doesn't fit."

WASM's portability means the same inference code runs on an ARM
microcontroller, in a browser, on a server, without recompilation. Write the
model once in Forth, deploy everywhere WASM reaches.

### DOES> for Architecture Generation

Forth's `DOES>` is a metaprogramming facility: it creates words that create
other words, each with custom runtime behavior. This is exactly what neural
architecture construction needs:

```forth
\ LAYER is a defining word that creates layer words
: LAYER ( weights bias rows cols -- )
  CREATE , , , , \ store cols, rows, bias, weights (in that order)
  DOES> ( tensor -- tensor )
    DUP 12 + @          \ get weights address
    ROT SWAP T-MATMUL   \ tensor x weights
    SWAP 8 + @          \ get bias address
    T-ADD
    \ cols (+0) and rows (+4) remain available for bounds checks
;

\ Define the network architecture
w1 b1 768 512 LAYER EMBED
w2 b2 512 256 LAYER HIDDEN1
w3 b3 256 10 LAYER OUTPUT

\ The architecture is now executable
: MODEL ( tensor -- tensor ) EMBED T-RELU HIDDEN1 T-RELU OUTPUT T-SOFTMAX ;
```

Each `LAYER` invocation creates a new word with its own weights and
dimensions baked in. The `MODEL` word composes them. This is the same
pattern as `nn.Sequential` in PyTorch — but it compiles to WASM, has zero
framework overhead, and the "architecture definition" and the "executable
model" are the same thing.

### Automatic Differentiation via Dual Numbers

Backpropagation is reverse-mode automatic differentiation. Its forward-mode
counterpart has an elegant formulation using dual numbers (a value paired
with its derivative) that maps naturally onto pairs of stack cells:

```forth
\ A dual number is a pair ( value derivative ) held as two stack cells,
\ shuffled with the usual pair words (2DUP, 2SWAP, PICK, etc.)

\ Dual addition: (a, a') + (b, b') = (a+b, a'+b')
: D+DUAL ( a a' b b' -- a+b a'+b' )
  ROT +    \ a' + b'
  >R + R>  \ a + b, then restore derivative
;

\ Dual multiplication: (a, a') * (b, b') = (a*b, a*b' + a'*b)
: D*DUAL ( a a' b b' -- a*b a*b'+a'*b )
  3 PICK *      \ a * b'
  2 PICK 2 PICK \ copy a' and b
  * +           \ a*b' + a'*b = derivative
  >R            \ stash derivative
  NIP *         \ a * b = value
  R>            \ restore derivative
;
```
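
A Python model of the same pair representation shows the chain rule falling
out of composition (the function names are illustrative):

```python
# Dual-number ops matching D*DUAL and D+DUAL: a pair (value, derivative).
def dmul(x, y):
    a, da = x
    b, db = y
    return (a * b, a * db + da * b)   # product rule

def dadd(x, y):
    return (x[0] + y[0], x[1] + y[1])

# d/dx of f(x) = x*x + 3x at x = 5: seed the input with derivative 1.
x = (5, 1)
three = (3, 0)                        # constants carry derivative 0
f = dadd(dmul(x, x), dmul(three, x))
print(f)  # (40, 13): value 25 + 15, derivative 2*5 + 3
```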

The chain rule emerges naturally: composing dual-number operations through a
sequence of words automatically computes the derivative of the whole
pipeline. This is the same principle behind JAX's `jvp` — but expressed as
stack operations.

### A Different Way to Look at It

Most of the ML ecosystem's complexity lives in _training_. Inference, by
comparison, is fairly straightforward: load weights, multiply matrices, apply
activations, read output. That's a pipeline of arithmetic operations — which
is pretty much what Forth was designed for. The industry tends to wrap
inference in 500 MB frameworks because training needed those frameworks, and
the two haven't been fully separated. A 50 KB Forth runtime doing quantized
integer operations might be closer to what inference actually needs than we
usually assume.

---

## 4. AI Generating Code: The Smallest Target Language

### The Token Economy

When an LLM generates code, every token costs money and adds latency. A
Python solution to "compute the average of a list" looks like:

```python
def average(numbers):
    if not numbers:
        return 0
    return sum(numbers) / len(numbers)
```

Roughly 25 tokens. The Forth equivalent:

```forth
: AVERAGE ( addr n -- avg ) 2DUP SUM -ROT NIP / ;
```

Roughly a dozen tokens. For the same semantic content, Forth uses about half
the tokens. At scale — millions of API calls, each generating hundreds of
lines — this is a meaningful cost reduction. But the token savings are the
least interesting advantage.

### Minimal Syntax, Maximal Verifiability

Forth has essentially no syntax. There are words separated by spaces. There
are numbers. There are a few special constructs (`:` for definitions,
`IF`/`THEN` for conditionals, `DO`/`LOOP` for iteration). That's about it.

An LLM generating Python must get indentation right, match parentheses and
brackets, handle keyword arguments, manage import statements, respect method
resolution order, and navigate a standard library of thousands of functions.
An LLM generating Forth mostly just needs to get the stack effect right.
That's the main failure mode worth worrying about.

And stack effects are **mechanically verifiable**:

```forth
\ Stack effect: ( n1 n2 -- n3 )
\ Verification: start with 2 items on stack, end with 1
: ADD-AND-DOUBLE ( n1 n2 -- n3 ) + 2* ;

\ Test:
3 4 ADD-AND-DOUBLE \ stack should contain: 14
```
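
To make the claim concrete, here is a toy Python stack machine, just enough
to run `ADD-AND-DOUBLE` on known inputs and check its declared effect (a
sketch; a real verifier would cover the full word set):

```python
# Minimal stack machine: run a word on known inputs, then compare the
# resulting stack against the declared stack effect.
def run(word, inputs):
    stack = list(inputs)
    for op in word:
        if op == "+":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "2*":
            stack.append(stack.pop() * 2)
        else:
            stack.append(op)  # literals push themselves
    return stack

# ADD-AND-DOUBLE declares ( n1 n2 -- n3 ): 2 items in, 1 item out.
result = run(["+", "2*"], [3, 4])
assert len(result) == 1 and result[0] == 14  # effect verified
print(result)  # [14]
```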

You don't need a type checker or static analysis. Just run the word with
known inputs and check the stack. If the stack depth and values match the
declared effect, the word is correct. It's hard to think of another practical
language where verification is this straightforward.

### Self-Extending Vocabulary

LLMs struggle with large codebases because context windows are finite. A
Python project with 50 files and 10,000 lines requires the LLM to hold (or
retrieve) vast amounts of context to generate correct code.

Forth's defining characteristic is that you build the language up to your
problem. The LLM doesn't need to generate a 100-line solution. It generates
5-line words, each building on the previous ones:

```forth
\ Step 1: LLM generates basic operations
: CLAMP ( n lo hi -- n' ) ROT MIN MAX ;
: BETWEEN? ( n lo hi -- flag ) OVER - >R - R> U< ;

\ Step 2: LLM generates higher-level operations using step 1
: NORMALIZE ( n -- n' ) 0 255 CLAMP ;
: IN-RANGE? ( n -- flag ) 0 100 BETWEEN? ;

\ Step 3: LLM generates application logic using steps 1-2
: PROCESS-SENSOR ( raw -- calibrated )
  offset @ -       \ remove sensor offset
  NORMALIZE        \ clamp to valid range
  scale @ * 1000 / \ apply calibration scale
;
```
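
The first two words are small enough to check against plain Python models
(note that `BETWEEN?`'s unsigned-compare trick makes the upper bound
exclusive):

```python
# Python models of CLAMP and BETWEEN? for a quick correctness check.
def clamp(n, lo, hi):
    return max(lo, min(hi, n))

def between(n, lo, hi):
    # Same trick as the Forth version: 0 <= n-lo < hi-lo,
    # i.e. lo inclusive, hi exclusive.
    return 0 <= (n - lo) < (hi - lo)

print(clamp(300, 0, 255))    # 255
print(between(99, 0, 100))   # True
print(between(100, 0, 100))  # False (hi is exclusive)
```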

Each step requires only the _names_ of previously defined words, not their
implementations. The dictionary serves as a compressed representation of the
entire program. An LLM can generate correct code by knowing only the word
names and their stack effects — a few dozen tokens of context instead of
thousands of lines.

### WASM Sandbox: Safe Execution of Untrusted Code

AI-generated code generally needs to be executed to be verified. Running
arbitrary Python is tricky from a security perspective — file system access,
network calls, `import os`, `eval()`. Sandboxing Python typically requires
containerization, seccomp filters, or virtual machines.

WAFER compiles to WASM, which executes in a sandbox by construction. A
WAFER program:

- Cannot access the file system
- Cannot make network calls
- Cannot read memory outside its linear memory
- Cannot execute longer than the host allows (fuel metering)
- Cannot consume more memory than the host allocates

You can run AI-generated Forth with roughly the same confidence as a pure
mathematical function. The sandbox isn't a bolt-on — it's just how WASM
works.

```forth
\ AI generates this code. Is it safe to run? Yes, always.
: FIBONACCI ( n -- fib )
  DUP 2 < IF EXIT THEN
  DUP 1- RECURSE
  SWAP 2 - RECURSE
  +
;
```

There's nothing this word can do except compute. No side effects, no
escape hatches. The WASM sandbox guarantees that structurally.

### A Different Way to Look at It

The conventional wisdom is that LLMs need expressive, high-level languages
to generate useful code. But there's a good case for the opposite: what LLMs
really benefit from are **verifiable** languages — ones where correctness can
be checked cheaply and deterministically. Expressiveness can actually work
against you here: more syntax means more ways to be wrong, more edge cases
to handle, more context to maintain. Forth's extreme minimalism starts to
look less like a limitation and more like an advantage: generate a few small
words, verify each one by running it, compose them into larger programs with
confidence. The language that's hardest for humans to read might just be the
easiest for machines to write correctly.

---

## 5. AI Agent Control: Plans That Execute Themselves

### The Plan-Program Gap

When an AI agent "plans," it produces a sequence of steps in natural
language:

> 1. Search for files matching "*.config"
> 2. Read each file and extract the "timeout" field
> 3. If timeout > 30, update it to 30
> 4. Write the modified files back

This plan is then "executed" by the agent interpreting each step, calling
tools, handling errors, and managing state — all mediated by the LLM at
every step, consuming tokens and latency for what is fundamentally a
sequential program.

The gap between "plan" and "program" might be more artificial than it looks.
A plan _is_ a program — we just don't usually give agents a good executable
representation for it.

Forth could be that representation.

### Tools as Words

Every agent tool — file read, web search, code execution, API call — maps
to a Forth word. The agent's toolkit becomes a Forth dictionary:

```forth
\ Agent tool vocabulary (host functions)
\ SEARCH-FILES ( pattern-addr pattern-len -- results-addr count )
\ READ-FILE    ( path-addr path-len -- content-addr content-len )
\ WRITE-FILE   ( content-addr content-len path-addr path-len -- )
\ JSON-GET     ( json-addr json-len key-addr key-len -- value-addr value-len )
\ SHELL        ( cmd-addr cmd-len -- output-addr output-len )
\ ASK-USER     ( question-addr question-len -- answer-addr answer-len )
```

Now the plan from above becomes an executable program:

```forth
: UPDATE-TIMEOUTS ( -- )
  S" *.config" SEARCH-FILES \ get matching files
  0 DO                      \ for each file
    DUP I CELLS + @ COUNT   \ get filename
    2DUP READ-FILE          \ read contents
    S" timeout" JSON-GET    \ extract timeout field
    S>NUMBER DROP           \ convert to number (S>NUMBER: assumed helper)
    30 > IF                 \ if timeout > 30
      30 SET-TIMEOUT        \ update to 30 (SET-TIMEOUT: assumed helper)
      WRITE-FILE            \ write back
    ELSE
      2DROP                 \ discard unchanged
    THEN
  LOOP
  DROP
;

UPDATE-TIMEOUTS
```

This isn't a description of what to do — it _is_ what to do. The agent
generates it, WAFER compiles it to WASM, and it runs — no LLM in the loop
during execution, no token cost per step, no latency per tool call.

### Error Handling with CATCH/THROW

Of course, agent plans fail. Files don't exist. APIs return errors.
Permissions get denied. Production agent systems need robust error handling,
which typically means calling the LLM at every step to decide what to do
when something goes wrong.

WAFER has `CATCH` and `THROW` — structured exception handling that lets
the plan itself define error recovery:

```forth
: SAFE-READ ( path-addr path-len -- content-addr content-len | 0 0 )
  ['] READ-FILE CATCH IF
    2DROP 0 0 \ file not found: return empty
  THEN
;

: SAFE-UPDATE ( filename-addr filename-len -- )
  2DUP SAFE-READ                  \ try to read
  DUP 0= IF 2DROP 2DROP EXIT THEN \ skip if file missing
  S" timeout" JSON-GET
  S>NUMBER DROP
  30 > IF
    30 SET-TIMEOUT
    WRITE-FILE
  ELSE
    2DROP 2DROP
  THEN
;

: ROBUST-UPDATE-TIMEOUTS ( -- )
  S" *.config" SEARCH-FILES
  0 DO
    DUP I CELLS + @ COUNT SAFE-UPDATE
  LOOP
  DROP
;
```
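
The same pattern, recovery logic baked into the plan rather than escalated
to the LLM, can be sketched with Python exceptions; `read_file` and the
in-memory `fake_fs` below are stand-ins for the host words:

```python
# Recovery inside the plan: a failed read is handled locally,
# like ['] READ-FILE CATCH, and the loop keeps going.
def safe_read(path, read_file):
    try:
        return read_file(path)
    except OSError:
        return ""                 # file not found: return empty

def robust_update(paths, read_file):
    updated = []
    for p in paths:               # the DO ... LOOP over SEARCH-FILES results
        content = safe_read(p, read_file)
        if not content:
            continue              # skip if file missing
        updated.append(p)         # (real plan would rewrite the file here)
    return updated

fake_fs = {"a.config": "timeout=45"}
def read_file(path):
    if path not in fake_fs:
        raise OSError(path)
    return fake_fs[path]

print(robust_update(["a.config", "missing.config"], read_file))  # ['a.config']
```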

The error handling is part of the plan. The agent generates it once, and it
runs to completion without further LLM intervention. Errors are handled at
the speed of WASM, not the speed of an API call to an LLM.

### The Dictionary as Growing Capability

A human Forth programmer builds up vocabulary: small words compose into
larger words, which compose into still larger words. The dictionary grows
with the programmer's understanding of the problem.

An AI agent does the same thing. Each successfully executed plan leaves
behind defined words that can be reused:

```forth
\ First task: agent learns to read configs
: READ-CONFIG ( path-addr path-len -- json-addr json-len )
  SAFE-READ DUP 0= IF EXIT THEN JSON-PARSE ;

\ Second task: agent learns to update configs
: UPDATE-CONFIG ( key-addr key-len value path-addr path-len -- )
  2DUP READ-CONFIG JSON-SET WRITE-FILE ;

\ Third task: agent composes previous capabilities
: MIGRATE-CONFIGS ( -- )
  S" *.config" SEARCH-FILES
  0 DO
    DUP I CELLS + @ COUNT
    S" timeout" 2SWAP 30 -ROT UPDATE-CONFIG
  LOOP DROP
;

\ The agent's vocabulary grows with experience.
\ MIGRATE-CONFIGS didn't exist before. Now it does.
\ Next time, the agent can use it as a building block.
```

You could call this _learned tool use_ — not in the machine learning sense,
but in the software engineering sense. The agent defines new capabilities in
terms of old ones, and the dictionary persists across invocations. Over time,
the agent's vocabulary naturally converges on the abstractions that matter
for its operational domain.

### REPL as Test-Before-Commit

Agents that act irreversibly on the first try are risky. WAFER's REPL model
gives agents a natural test-before-commit workflow:

1. **Define**: Generate and compile the plan as Forth words.
2. **Test**: Run the words against sample data on the stack.
3. **Verify**: Check the stack for expected results.
4. **Execute**: Run the plan for real only after verification passes.

```forth
\ Step 1: Define
: CALCULATE-DISCOUNT ( price tier -- discounted )
  CASE
    1 OF 10 ENDOF \ tier 1: 10% off
    2 OF 20 ENDOF \ tier 2: 20% off
    3 OF 35 ENDOF \ tier 3: 35% off
    0 SWAP
  ENDCASE
  100 SWAP - * 100 /
;

\ Step 2: Test (no side effects, just stack operations)
1000 1 CALCULATE-DISCOUNT . \ expect 900
1000 2 CALCULATE-DISCOUNT . \ expect 800
1000 3 CALCULATE-DISCOUNT . \ expect 650

\ Step 3: Verify output matches expectations
\ Step 4: Apply to real data only after tests pass
```
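
The discount arithmetic itself is easy to verify outside the REPL; a Python
restatement of the same formula:

```python
# CALCULATE-DISCOUNT's arithmetic: price * (100 - pct) / 100 with integer
# division, and an unknown tier falling through at 0% off.
def calculate_discount(price: int, tier: int) -> int:
    pct = {1: 10, 2: 20, 3: 35}.get(tier, 0)
    return price * (100 - pct) // 100

print(calculate_discount(1000, 1))  # 900
print(calculate_discount(1000, 2))  # 800
print(calculate_discount(1000, 3))  # 650
```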

The agent can generate, test, and iterate without ever touching production
data. The REPL isn't just a debugging convenience here — it's a safety
mechanism for autonomous agents.

### Multi-Agent Coordination

Multiple agents can share a WAFER dictionary through shared linear memory.
One agent defines words. Another agent uses them. A coordinator agent
composes them into higher-level plans:

```forth
\ Agent A defines data retrieval
: FETCH-METRICS ( -- addr n ) metrics-api QUERY PARSE-JSON ;

\ Agent B defines analysis
: DETECT-ANOMALIES ( addr n -- anomalies-addr n )
  THRESHOLD @ FILTER-ABOVE ;

\ Agent C defines actions
: ALERT ( anomalies-addr n -- )
  0 DO DUP I CELLS + @ SEND-ALERT LOOP DROP ;

\ Coordinator composes them
: MONITOR ( -- )
  BEGIN
    FETCH-METRICS DETECT-ANOMALIES
    DUP 0> IF ALERT ELSE 2DROP THEN
    60000 DELAY
  AGAIN
;
```

Each agent contributes words to a shared vocabulary. The coordinator doesn't
need to understand the implementation of `FETCH-METRICS` or
`DETECT-ANOMALIES` — it only needs to know their stack effects. This is
composability without coupling, coordination without shared state beyond
the dictionary.

### A Different Way to Look at It

The AI agent community is building increasingly sophisticated "plan
representations" — DAGs, state machines, behavior trees, ReAct loops — all
trying to bridge the gap between the LLM's natural language output and
actual tool execution. But Forth is already a plan representation that
doubles as an execution engine. It has structured control flow (`IF`/`THEN`,
`DO`/`LOOP`, `BEGIN`/`UNTIL`), error handling (`CATCH`/`THROW`),
composability (word definitions), and a test harness (the REPL and stack).
Maybe the gap between "plan" and "program" doesn't need to be bridged so
much as it needs to be _erased_.

---

## Convergence: Five Problems, One Shape

These five domains look different on the surface:

| Domain          | Traditional Tool               | Core Operation       |
| --------------- | ------------------------------ | -------------------- |
| Data analytics  | Pandas, Spark                  | Transform pipeline   |
| Database engine | SQLite VDBE, Postgres executor | Query plan execution |
| AI inference    | PyTorch, TensorFlow            | Layer composition    |
| AI codegen      | Python, JavaScript             | Program synthesis    |
| AI agents       | LangChain, CrewAI              | Plan execution       |

But they share a deep structure: **sequential composition of simple
operations on a data flow**. A data pipeline, a query plan, a forward
pass, a synthesized program, and an agent plan are all the same thing:
a sequence of words applied to a stack.

Forth noticed this in 1970. Charles Moore designed a language around the
observation that most computation is a pipeline of transformations, and
the simplest way to express pipelines is sequential composition on a
stack. The language has no syntax because pipelines don't need syntax.
It has no type system because the data flow _is_ the type. It has no
package manager because each program builds its own vocabulary from
primitives.

WAFER brings these ideas to the modern world by targeting WebAssembly — the
universal runtime that runs in browsers, on servers, on edge devices, in
sandboxes. That combination opens up some interesting possibilities:

- **Analytics in the browser** with no server, no framework, deterministic
  execution.
- **Database VMs** that compile queries to native WASM through an existing
  Forth JIT.
- **Inference engines** that fit in 50 KB and run on any device WASM
  reaches.
- **AI-generated code** in the language with the smallest syntax, cheapest
  verification, and safest sandbox.
- **Agent plans** that are executable programs, testable in a REPL,
  composable through a growing dictionary.

None of this requires Forth to change. Forth has been this shape for 55
years. It's kind of fun that the world's problems seem to be circling back
to it.

---

_WAFER is open source. Start at the [repository root](../README.md)._
_Architecture details: [WAFER.md](WAFER.md). Language introduction:
[FORTH.md](FORTH.md)._