Implement complete Floating-Point word set, 70+ float words
Separate float stack with fsp global, IEEE 754 double precision.

- Stack ops: FDROP FDUP FSWAP FOVER FROT FDEPTH
- Arithmetic: F+ F- F* F/ FNEGATE FABS FMAX FMIN FSQRT FLOOR FROUND F**
- Comparisons: F0= F0< F= F< F~
- Memory: F@ F! SF@ SF! DF@ DF! FLOAT+ FLOATS FALIGNED FALIGN
- Conversions: D>F F>D S>F F>S
- Trig: FSIN FCOS FTAN FASIN FACOS FATAN FATAN2 FSINCOS
- Exp/log: FEXP FEXPM1 FLN FLNP1 FLOG FALOG
- Hyperbolic: FSINH FCOSH FTANH FASINH FACOSH FATANH
- I/O: F. FE. FS. REPRESENT >FLOAT PRECISION SET-PRECISION
- Defining: FVARIABLE FCONSTANT FVALUE FLITERAL
- Float literal parsing (1E, 1.5E2, -3.14E0 format)

299 unit tests + 11 compliance tests, 0 errors on the float test suite.
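A taste of the new word set, as a hypothetical REPL session (output in comments):

```forth
\ Float literals use the 1E / 1.5E2 / -3.14E0 format added here
2E0 10E0 F** F.         \ 1024. (2 raised to 10)
-3.14E0 FABS FSQRT F.   \ ~1.772 (sqrt of 3.14)
1.5E2 F>S .             \ 150
FDEPTH .                \ 0  (float stack empty again)
```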
@@ -29,6 +29,10 @@ const DSP: u32 = 0;
 
 /// Index of the `$rsp` global (return stack pointer).
 const RSP: u32 = 1;
 
+/// Index of the `$fsp` global (float stack pointer).
+#[allow(dead_code)]
+const FSP: u32 = 2;
+
 /// Index of the imported function table.
 const TABLE: u32 = 0;
@@ -795,6 +799,15 @@ pub fn compile_word(
             shared: false,
         }),
     );
+    imports.import(
+        "env",
+        "fsp",
+        EntityType::Global(GlobalType {
+            val_type: ValType::I32,
+            mutable: true,
+            shared: false,
+        }),
+    );
     imports.import(
         "env",
         "table",
@@ -871,7 +884,7 @@ mod tests {
     use super::*;
     use crate::dictionary::WordId;
     use crate::ir::IrOp;
-    use crate::memory::{DATA_STACK_TOP, RETURN_STACK_TOP};
+    use crate::memory::{DATA_STACK_TOP, FLOAT_STACK_TOP, RETURN_STACK_TOP};
 
     fn default_config() -> CodegenConfig {
         CodegenConfig {
@@ -1133,6 +1146,13 @@ mod tests {
         )
         .unwrap();
 
+        let fsp = Global::new(
+            &mut store,
+            wasmtime::GlobalType::new(ValType::I32, Mutability::Var),
+            Val::I32(FLOAT_STACK_TOP as i32),
+        )
+        .unwrap();
+
         let table = Table::new(
             &mut store,
             wasmtime::TableType::new(RefType::FUNCREF, 16, None),
@@ -1152,6 +1172,7 @@
                 memory.into(),
                 dsp.into(),
                 rsp.into(),
+                fsp.into(),
                 table.into(),
             ],
         )
@@ -111,6 +111,12 @@ impl Dictionary {
         Ok(WordId(fn_index))
     }
 
+    /// Reserve a function index without creating a dictionary entry.
+    /// Used for anonymous host functions (e.g., float literals during compilation).
+    pub fn reserve_fn_index(&mut self) {
+        self.next_fn_index += 1;
+    }
+
     /// Reveal the most recent word (remove HIDDEN flag).
     /// Called after `: ... ;` completes compilation.
     pub fn reveal(&mut self) {
@@ -0,0 +1,890 @@
# The Unreasonable Effectiveness of Stack Machines

_How Forth — and WAFER — can serve as infrastructure for data analytics,
databases, AI inference, AI code generation, and AI agent control._

---

Forth is 55 years old. It has no type system, no garbage collector, no package
manager, no syntax to speak of. By most conventional measures, it shouldn't
still be relevant.

But it keeps showing up at the edges — in firmware, in space probes, in
real-time systems, in places where correctness and determinism matter more than
developer ergonomics. That's worth paying attention to.

The properties that make Forth unusual — concatenative composition, zero-cost
abstraction through word definition, a stack-based execution model that maps
directly to hardware — happen to line up surprisingly well with what five of
the most active areas in modern computing are independently reaching for:

1. **Data analytics** wants composable, streaming pipelines.
2. **Database engines** want stack-based virtual machines for query execution.
3. **AI inference** wants tiny, deterministic, embeddable runtimes.
4. **AI code generation** wants the smallest possible target language.
5. **AI agent systems** want plans that are also executable programs.

Forth won't single-handedly solve any of these. But it offers a useful lens
for understanding what each of them actually needs — and WAFER, a Forth that
compiles to WebAssembly, is in a good position to explore that space.

WAFER (WebAssembly Forth Engine in Rust) JIT-compiles each Forth word to its
own WASM module, linked through shared linear memory, globals, and a function
table. It runs anywhere WASM runs: browsers, edge devices, servers, embedded
systems. It has 160+ words, 100% Forth 2012 compliance on 10 word sets, and
fits in ~50 KB. It has exception handling (`CATCH`/`THROW`), metaprogramming
(`DOES>`), dynamic compilation (`EVALUATE`), and an optimization pipeline
designed for stack-to-local promotion that can achieve 7x speedups.
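In practice that looks like an ordinary REPL session; each definition below
would be compiled to its own WASM module at the closing `;` (illustrative):

```forth
\ Each definition is JIT-compiled the moment the closing ; is reached
: SQUARE ( n -- n^2 ) DUP * ;
: SUM-OF-SQUARES ( a b -- n ) SQUARE SWAP SQUARE + ;
3 4 SUM-OF-SQUARES .   \ prints 25; both calls run as compiled WASM
```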
This document explores what becomes possible when you take these properties
seriously.

---
## 1. Data Analytics: Pipelines Without Plumbing

### The Problem with Pipelines

Every data analytics framework reinvents the same idea: take data, push it
through a sequence of transformations, collect the result. Pandas chains
methods. Spark builds DAGs. dplyr pipes with `%>%`. Unix pipes bytes through
`|`. They all converge on the same shape: **linear composition of operations
on an implicit data flow**.

This is exactly what Forth does. It has done it since 1970. The data stack
_is_ the pipeline. Each word _is_ a transformation. Composition is
juxtaposition — you don't pipe, you don't chain, you don't bind. You just
write the words next to each other.

```forth
\ Pandas: df['amount'].where(df['amount'] > 0).mean()
\ Forth:
: POSITIVE? ( n -- n flag ) DUP 0> ;
: FILTER-POSITIVE ( addr n -- sum count )
  0 >R 0 >R                \ count and sum accumulators on return stack
  0 DO
    DUP I CELLS + @
    POSITIVE? IF R> + R> 1+ >R >R ELSE DROP THEN
  LOOP DROP
  R> R>                    \ ( sum count )
;
: MEAN ( sum count -- avg ) / ;

data 100 FILTER-POSITIVE MEAN .
```

This goes a bit deeper than syntactic sugar. The absence of intermediate
variables is a structural property. In a Pandas chain, every `.method()`
returns a new DataFrame object that must be allocated, tracked, and eventually
collected. In Forth, the data flows through the stack with zero allocation.
The pipeline _is_ the execution.
### Streaming and Incremental Computation

The stack model is inherently streaming. A word consumes its inputs and
produces its outputs in the same motion. There is no "collect all data first,
then process" step unless you explicitly build one. This makes Forth natural
for:

- **Event stream processing**: each event lands on the stack, a word
  processes it, the result is consumed by the next word.
- **Incremental aggregation**: running sums, counts, and statistics
  maintained in variables across invocations.
- **Windowed computation**: a circular buffer in linear memory with
  stack-based access patterns (see the sketch after the example).

```forth
\ Running average over a stream of values
VARIABLE running-sum
VARIABLE running-count

: UPDATE-AVG ( new-value -- running-avg )
  running-sum @ + DUP running-sum !
  running-count @ 1+ DUP running-count !
  /
;

\ Each incoming value:
42 UPDATE-AVG . \ prints running average after adding 42
17 UPDATE-AVG . \ prints updated average after adding 17
```
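The windowed case can be sketched the same way. This is a minimal
illustration, not a WAFER built-in; `RING`, `RING!`, and `WINDOW-AVG` are
hypothetical names:

```forth
\ Moving average over the last 8 values, kept in a circular buffer
8 CONSTANT WINDOW
CREATE RING WINDOW CELLS ALLOT  RING WINDOW CELLS ERASE
VARIABLE RING-POS  0 RING-POS !

: RING! ( n -- )                       \ overwrite the oldest slot
  RING-POS @ CELLS RING + !
  RING-POS @ 1+ WINDOW MOD RING-POS ! ;

: WINDOW-AVG ( -- avg )
  0 WINDOW 0 DO RING I CELLS + @ + LOOP WINDOW / ;
```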
### Client-Side Analytics via WASM

WAFER compiles to WebAssembly. This means analytics can run _in the browser_
with no server round-trips. A user uploads a CSV, WAFER parses and processes
it entirely client-side, and the results render immediately. No data leaves
the machine. No API calls. No latency.

This isn't just a nice demo. For privacy-sensitive analytics (healthcare,
finance, GDPR-regulated data), client-side processing can be a compliance
requirement. WAFER's deterministic execution (no GC pauses, no background
threads, fixed memory layout) makes it predictable enough for real-time
dashboards.
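A minimal sketch of the idea, assuming the host has copied the uploaded CSV
into linear memory at `addr`/`len`:

```forth
\ Count the rows of a CSV sitting in linear memory (sketch)
: COUNT-ROWS ( addr len -- n )
  0 -ROT                  \ accumulator under addr
  0 DO
    DUP I + C@ 10 = IF    \ 10 = ASCII linefeed
      SWAP 1+ SWAP
    THEN
  LOOP DROP ;
```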
### Domain-Specific Languages

Forth's defining feature is that you build the language up to your problem.
An analytics team doesn't write Forth — they write _their DSL_, which
happens to be implemented in Forth:

```forth
\ Define a mini analytics vocabulary
: COLUMN ( col# -- addr n ) table-base SWAP col-offset + col-length ;
: SUM    ( addr n -- total ) 0 SWAP 0 DO OVER I CELLS + @ + LOOP NIP ;
: COUNT  ( addr n -- n ) NIP ;
: AVG    ( addr n -- avg ) 2DUP SUM -ROT COUNT / ;
: WHERE> ( addr n thresh -- addr' n' ) filter-gt ;

\ The analyst writes:
3 COLUMN 1000 WHERE> AVG .
\ "Average of column 3 where values exceed 1000"
```

The DSL compiles to WASM through WAFER's IR pipeline. There is no
interpreter overhead at query time. The analyst's vocabulary _is_ the
optimized code.

### A Different Way to Look at It

Most languages treat the absence of named variables as a limitation. But in
data pipelines, it can actually be a **feature**. Named intermediates create
coupling points — places where code can refer to stale state, where
refactoring requires renaming, where parallelization requires dependency
analysis. Point-free composition through a stack sidesteps this whole class
of problems. The data is always _here_, on top of the stack, ready for the
next transformation.

---
## 2. Database Engine: The Query VM You Already Have

### Databases Already Think in Stacks

SQLite — the most deployed database engine in the world — executes queries
through the VDBE (Virtual DataBase Engine), a bytecode virtual machine
(stack-based in its original design, register-based today). When you write
`SELECT * FROM users WHERE age > 30`, SQLite's query planner compiles it into
a sequence of VM operations: open cursor, seek, compare, jump, emit row.

PostgreSQL's executor runs a tree of plan nodes, each of which pushes tuples
upward. MySQL's handler interface is a stack of operations. CockroachDB
compiles SQL to a vectorized execution engine that operates on batches — but
the control flow is still a stack of operators.

There's a pattern here: **query execution engines tend to converge on
stack machines**. Forth just happens to already be one, with no extra
abstraction layers in between.

### Query Plans as Forth Programs

A SQL query plan is a tree. Flattened into execution order, it becomes a
sequence of operations — which is exactly a Forth program:

```sql
SELECT name, salary FROM employees WHERE dept = 'ENG' AND salary > 100000;
```

The query plan, expressed as Forth:

```forth
\ Primitives provided by the storage engine
\ SCAN     ( table -- cursor )
\ NEXT-ROW ( cursor -- cursor flag )  flag=true if row available
\ COL@     ( cursor col# -- value )
\ EMIT-ROW ( v1 v2 -- )  send to result set
\ CLOSE    ( cursor -- )

: MATCH-DEPT? ( cursor -- cursor flag ) DUP 2 COL@ S" ENG" COMPARE 0= ;
: MATCH-SAL?  ( cursor -- cursor flag ) DUP 3 COL@ 100000 > ;
: PROJECT     ( cursor -- cursor ) DUP 0 COL@ OVER 3 COL@ EMIT-ROW ;

: QUERY ( -- )
  employees SCAN
  BEGIN
    NEXT-ROW
  WHILE
    MATCH-DEPT? IF
      MATCH-SAL? IF
        PROJECT
      THEN
    THEN
  REPEAT
  CLOSE
;
```

This isn't just pseudocode, either. Every word here could be a real WAFER
word backed by storage primitives implemented as host functions. The query
compiles through WAFER's IR pipeline to native WASM, with the same
optimization opportunities as any other Forth word: inlining, constant
folding, dead code elimination.

### EVALUATE as Dynamic Query Compilation

SQL databases accept queries as strings and compile them at runtime. Forth
has `EVALUATE`, which does exactly the same thing — takes a string and
compiles/executes it:

```forth
\ Build a query string dynamically
S" employees SCAN BEGIN NEXT-ROW WHILE MATCH-DEPT? IF PROJECT THEN REPEAT CLOSE"
EVALUATE
```

The difference from SQL: the "query language" and the "implementation
language" are the same. There is no impedance mismatch between the language
the user writes queries in and the language the engine executes them in. A
user-defined function is just another word. An index lookup is just another
word. A join strategy is just another word. They all compose the same way.
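And since `EVALUATE` can compile definitions as well as execute code, a
prepared statement falls out for free (sketch building on the `QUERY` words
above):

```forth
\ A "prepared statement": define the query once at runtime, reuse it
S" : ENG-QUERY employees SCAN BEGIN NEXT-ROW WHILE MATCH-DEPT? IF PROJECT THEN REPEAT CLOSE ;"
EVALUATE
ENG-QUERY   \ runs the compiled plan; no re-parsing on later calls
```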
### Linear Memory as Storage Pages

WAFER's linear memory model maps directly to how databases manage storage.
A database page is a fixed-size block of bytes at a known offset — exactly
what Forth's `@` and `!` operate on. B-tree nodes are structures in linear
memory traversed by pointer arithmetic:

```forth
\ B-tree node layout:
\ +0:             key count (cell)
\ +4:             is-leaf flag (cell)
\ +8:             keys array (key-count cells)
\ +8+4*key-count: child pointers (key-count+1 cells)

: NODE-KEYS   ( node -- addr ) 8 + ;
: NODE-KEY@   ( node i -- key ) CELLS SWAP NODE-KEYS + @ ;
: NODE-CHILD@ ( node i -- child )
  OVER NODE-KEYS        \ ( node i keys-addr )
  ROT @ CELLS +         \ skip past keys array: children base
  SWAP CELLS +          \ index into children
  @
;

\ LEAF-SEARCH and INTERNAL-SEARCH are supplied by the engine
: BTREE-SEARCH ( node target-key -- addr|0 )
  OVER @ 0= IF 2DROP 0 EXIT THEN   \ empty node
  OVER 4 + @ IF                    \ leaf node
    LEAF-SEARCH
  ELSE
    INTERNAL-SEARCH                \ recurse into child
  THEN
;
```
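`LEAF-SEARCH` and `INTERNAL-SEARCH` are left undefined above; the leaf case
might look like this minimal sketch (a linear scan, assuming the node layout
shown):

```forth
\ Linear scan of the keys array; matching key's address, or 0
: LEAF-SEARCH ( node target-key -- addr|0 )
  OVER @ 0 DO                     \ loop over key count
    2DUP SWAP I NODE-KEY@ = IF    \ keys[I] = target?
      DROP NODE-KEYS I CELLS +    \ address of the matching key
      UNLOOP EXIT
    THEN
  LOOP
  2DROP 0 ;
```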
### WASM Sandboxing for User-Defined Functions

Safely executing user-defined functions (UDFs) is one of the trickier
problems in database engines. PostgreSQL UDFs in C can crash the server.
JavaScript UDFs require embedding V8. Python UDFs tend to be slow.

WAFER UDFs compile to WASM and execute in a sandbox with bounded memory,
bounded execution time, and no access to anything outside the linear memory
they're given. A malicious UDF can't read other users' data, can't make
network calls, can't crash the host. WAFER gets this for free — it's
inherent to WASM's security model.

```forth
\ User defines a custom scoring function
: SCORE ( age salary -- score )
  1000 /          \ salary contribution (salary/1000)
  SWAP 50 - ABS   \ age penalty (distance from 50)
  -               \ final score
;

\ Engine uses it in a query
: RANKED-QUERY ( -- )
  employees SCAN
  BEGIN NEXT-ROW WHILE
    DUP 1 COL@ OVER 3 COL@ SCORE
    50 > IF PROJECT THEN
  REPEAT CLOSE
;
```

The `SCORE` function compiles to a WASM module through WAFER's JIT. It runs
at near-native speed, sandboxed, with no FFI overhead.

### A Different Way to Look at It

Database engineers put a lot of effort into building query VMs — designing
bytecode formats, writing interpreters, adding JIT compilation. In a sense,
they're often reinventing something Forth-shaped each time. It's worth asking:
what if you just started with Forth and built the storage layer underneath it?

---
## 3. AI Inference: Neural Networks as Word Composition

### Layers Are Words, Forward Pass Is Composition

A neural network's forward pass is a pipeline: input tensor enters, passes
through a sequence of layers (linear transform, activation, normalization),
and a prediction exits. Each layer takes a tensor and produces a tensor.

In Forth terms: each layer is a word. The tensor sits on the stack. The
forward pass is the composition of those words:

```forth
\ Assuming tensor operations as primitives (host functions):
\ T-MATMUL  ( tensor weights -- tensor )
\ T-ADD     ( tensor bias -- tensor )
\ T-RELU    ( tensor -- tensor )
\ T-SOFTMAX ( tensor -- tensor )

: LINEAR1 ( tensor -- tensor ) w1 T-MATMUL b1 T-ADD ;
: LINEAR2 ( tensor -- tensor ) w2 T-MATMUL b2 T-ADD ;
: LINEAR3 ( tensor -- tensor ) w3 T-MATMUL b3 T-ADD ;

: CLASSIFIER ( tensor -- tensor )
  LINEAR1 T-RELU
  LINEAR2 T-RELU
  LINEAR3 T-SOFTMAX
;

input-data CLASSIFIER \ forward pass
```

This maps more directly than you might expect. The compositional structure of
neural networks lines up nicely with the compositional structure of Forth
programs. The stack carries the data flow. The words are the layers. The
dictionary holds the model architecture.
### Quantized Inference on the Integer Stack

Most production inference runs quantized — INT8 or INT4 weights, integer
arithmetic, no floating point. Forth's native data type is the integer cell.
WAFER's `i32` stack operations map directly to quantized tensor operations:

```forth
\ INT8 quantized dot product of two byte vectors
: QDOT ( addr1 addr2 n -- result )
  0 >R                 \ accumulator on return stack
  0 DO
    OVER I + C@ 127 -  \ load and de-bias first element
    OVER I + C@ 127 -  \ load and de-bias second element
    * R> + >R          \ multiply-accumulate
  LOOP
  2DROP R>
;

\ Quantized linear layer: one QDOT per output row.
\ Assumes OUTPUT-BUF names a preallocated result buffer and each
\ weight row is cols bytes long.
: QLINEAR ( input-addr weight-addr rows cols -- output-addr )
  SWAP 0 DO                 \ for each output row I: ( input weights cols )
    2 PICK 2 PICK           \ copy input and weights
    2 PICK I * +            \ row I starts at weights + I*cols
    2 PICK QDOT             \ ( input weights cols dot )
    OUTPUT-BUF I CELLS + !  \ store neuron I
  LOOP
  DROP 2DROP OUTPUT-BUF
;
```

No framework dependency, no Python interpreter, no CUDA runtime — just
integer arithmetic on a stack, compiled to WASM, running on any device.
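Wiring it up might look like this (hypothetical: `input-vec` and `weights`
name buffers prepared by the host):

```forth
\ A 4-neuron layer over a 16-byte quantized input vector
input-vec weights 4 16 QLINEAR   \ -- OUTPUT-BUF
DUP @ .                          \ inspect neuron 0's accumulator
```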
### Edge AI: The 50 KB Runtime

ML inference frameworks tend to be big. PyTorch is ~500 MB. TensorFlow Lite
is ~1 MB for the runtime alone. ONNX Runtime is ~10 MB.

WAFER is ~50 KB for the full Forth system. The model weights dominate the
binary size, not the runtime. For edge devices — IoT sensors, wearables,
microcontrollers, browser tabs — that size difference can be the difference
between "fits" and "doesn't fit."

WASM's portability means the same inference code runs on an ARM
microcontroller, in a browser, on a server, without recompilation. Write the
model once in Forth, deploy everywhere WASM reaches.
### DOES> for Architecture Generation

Forth's `DOES>` is a metaprogramming facility: it creates words that create
other words, each with custom runtime behavior. This is exactly what neural
architecture construction needs:

```forth
\ LAYER is a defining word that creates layer words
: LAYER ( weights bias rows cols -- )
  CREATE , , , ,        \ store cols, rows, bias, weights (top of stack first)
  DOES> ( tensor -- tensor )
    >R                  \ save parameter field address
    R@ 12 + @ T-MATMUL  \ weights (stored last, at offset 12)
    R> 8 + @ T-ADD      \ bias (offset 8; dims sit at offsets 0 and 4)
;

\ Define the network architecture
w1 b1 768 512 LAYER EMBED
w2 b2 512 256 LAYER HIDDEN1
w3 b3 256  10 LAYER OUTPUT

\ The architecture is now executable
: MODEL ( tensor -- tensor ) EMBED T-RELU HIDDEN1 T-RELU OUTPUT T-SOFTMAX ;
```

Each `LAYER` invocation creates a new word with its own weights and
dimensions baked in. The `MODEL` word composes them. This is the same
pattern as `nn.Sequential` in PyTorch — but it compiles to WASM, has zero
framework overhead, and the "architecture definition" and the "executable
model" are the same thing.
### Automatic Differentiation via Dual Numbers

Backpropagation is reverse-mode automatic differentiation. Its forward-mode
cousin has an elegant formulation using dual numbers (a value paired with
its derivative) that maps naturally onto pairs of stack cells:

```forth
\ A dual number is a pair ( value derivative ) occupying two stack cells;
\ WAFER's two-cell words (2DUP, 2SWAP, 2DROP, ...) move them around natively

\ Dual addition: (a, a') + (b, b') = (a+b, a'+b')
: D+DUAL ( a a' b b' -- a+b a'+b' )
  ROT +      \ a' + b'
  >R + R>    \ a + b, then restore derivative
;

\ Dual multiplication: (a, a') * (b, b') = (a*b, a*b' + a'*b)
: D*DUAL ( a a' b b' -- a*b a*b'+a'*b )
  3 PICK *   \ a * b'
  >R         \ stash it
  TUCK *     \ a' * b
  R> +       \ derivative: a*b' + a'*b
  >R * R>    \ value: a * b, then restore derivative
;
```

The chain rule emerges naturally: composing dual-number operations through a
sequence of words automatically computes the derivative of the whole
pipeline. This is the same principle behind JAX's `jvp` — but expressed as
stack operations.
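A quick sanity check of the idea, as a hypothetical session: differentiate
f(x) = x*x at x = 3 by seeding the dual (3, 1):

```forth
\ (3,1) * (3,1) should give value 9 and derivative 6
3 1 2DUP D*DUAL
. .   \ prints 6 9  (derivative on top, then value)
```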
### A Different Way to Look at It

Most of the ML ecosystem's complexity lives in _training_. Inference, by
comparison, is fairly straightforward: load weights, multiply matrices, apply
activations, read output. That's a pipeline of arithmetic operations — which
is pretty much what Forth was designed for. The industry tends to wrap
inference in 500 MB frameworks because training needed those frameworks, and
the two haven't been fully separated. A 50 KB Forth runtime doing quantized
integer operations might be closer to what inference actually needs than we
usually assume.

---
## 4. AI Generating Code: The Smallest Target Language

### The Token Economy

When an LLM generates code, every token costs money and adds latency. A
Python solution to "compute the average of a list" looks like:

```python
def average(numbers):
    if not numbers:
        return 0
    return sum(numbers) / len(numbers)
```

That is roughly 25 tokens. The Forth equivalent:

```forth
: AVERAGE ( addr n -- avg ) 2DUP SUM -ROT NIP / ;
```

That is roughly 12 tokens. For the same semantic content, Forth uses about
half the tokens. At scale — millions of API calls, each generating hundreds
of lines — this is a meaningful cost reduction. But the token savings are the
least interesting advantage.

### Minimal Syntax, Maximal Verifiability

Forth has essentially no syntax. There are words separated by spaces. There
are numbers. There are a few special constructs (`:` for definitions,
`IF`/`THEN` for conditionals, `DO`/`LOOP` for iteration). That's about it.

An LLM generating Python must get indentation right, match parentheses and
brackets, handle keyword arguments, manage import statements, respect method
resolution order, and navigate a standard library of thousands of functions.
An LLM generating Forth mostly just needs to get the stack effect right.
That's the main failure mode worth worrying about.

And stack effects are **mechanically verifiable**:

```forth
\ Stack effect: ( n1 n2 -- n3 )
\ Verification: start with 2 items on stack, end with 1
: ADD-AND-DOUBLE ( n1 n2 -- n3 ) + 2* ;

\ Test:
3 4 ADD-AND-DOUBLE \ stack should contain: 14
```

You don't need a type checker or static analysis. Just run the word with
known inputs and check the stack. If the stack depth and values match the
declared effect, the word is correct. It's hard to think of another practical
language where verification is this straightforward.
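The check can even be automated with the `T{ ... -> ... }T` words from the
standard Forth test harness:

```forth
\ Mechanical verification with the T{ ... -> ... }T harness
T{  3  4 ADD-AND-DOUBLE -> 14 }T
T{  0  0 ADD-AND-DOUBLE ->  0 }T
T{ -5  5 ADD-AND-DOUBLE ->  0 }T
```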
### Self-Extending Vocabulary

LLMs struggle with large codebases because context windows are finite. A
Python project with 50 files and 10,000 lines requires the LLM to hold (or
retrieve) vast amounts of context to generate correct code.

Forth's defining characteristic is that you build the language up to your
problem. The LLM doesn't need to generate a 100-line solution. It generates
5-line words, each building on the previous ones:

```forth
\ Step 1: LLM generates basic operations
: CLAMP    ( n lo hi -- n' ) ROT MIN MAX ;
: BETWEEN? ( n lo hi -- flag ) OVER - >R - R> U< ;

\ Step 2: LLM generates higher-level operations using step 1
: NORMALIZE ( n -- n' ) 0 255 CLAMP ;
: IN-RANGE? ( n -- flag ) 0 100 BETWEEN? ;

\ Step 3: LLM generates application logic using steps 1-2
: PROCESS-SENSOR ( raw -- calibrated )
  offset @ -       \ remove sensor offset
  NORMALIZE        \ clamp to valid range
  scale @ * 1000 / \ apply calibration scale
;
```

Each step requires only the _names_ of previously defined words, not their
implementations. The dictionary serves as a compressed representation of the
entire program. An LLM can generate correct code by knowing only the word
names and their stack effects — a few dozen tokens of context instead of
thousands of lines.
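To see how little context that is: extending the program needs only the
stack-effect lines below, not the definitions behind them (sketch;
`PROCESS-BATCH` is new here):

```forth
\ The entire context an LLM needs to extend the program:
\ CLAMP          ( n lo hi -- n' )
\ NORMALIZE      ( n -- n' )
\ PROCESS-SENSOR ( raw -- calibrated )
: PROCESS-BATCH ( addr n -- )   \ calibrate an array in place
  0 DO
    DUP I CELLS + @ PROCESS-SENSOR
    OVER I CELLS + !
  LOOP DROP ;
```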
### WASM Sandbox: Safe Execution of Untrusted Code

AI-generated code generally needs to be executed to be verified. Running
arbitrary Python is tricky from a security perspective — file system access,
network calls, `import os`, `eval()`. Sandboxing Python typically requires
containerization, seccomp filters, or virtual machines.

WAFER compiles to WASM, which executes in a sandbox by construction. A
WAFER program:

- Cannot access the file system
- Cannot make network calls
- Cannot read memory outside its linear memory
- Cannot execute longer than the host allows (fuel metering)
- Cannot consume more memory than the host allocates

You can run AI-generated Forth with roughly the same confidence as a pure
mathematical function. The sandbox isn't a bolt-on — it's just how WASM
works.

```forth
\ AI generates this code. Is it safe to run? Yes, always.
: FIBONACCI ( n -- fib )
  DUP 2 < IF EXIT THEN
  DUP 1- RECURSE
  SWAP 2 - RECURSE
  +
;
```

There's nothing this word can do except compute. No side effects, no
escape hatches. The WASM sandbox guarantees that structurally.

### A Different Way to Look at It

The conventional wisdom is that LLMs need expressive, high-level languages
to generate useful code. But there's a good case for the opposite: what LLMs
really benefit from are **verifiable** languages — ones where correctness can
be checked cheaply and deterministically. Expressiveness can actually work
against you here: more syntax means more ways to be wrong, more edge cases
to handle, more context to maintain. Forth's extreme minimalism starts to
look less like a limitation and more like an advantage: generate a few small
words, verify each one by running it, compose them into larger programs with
confidence. The language that's hardest for humans to read might just be the
easiest for machines to write correctly.

---
## 5. AI Agent Control: Plans That Execute Themselves

### The Plan-Program Gap

When an AI agent "plans," it produces a sequence of steps in natural
language:

> 1. Search for files matching "*.config"
> 2. Read each file and extract the "timeout" field
> 3. If timeout > 30, update it to 30
> 4. Write the modified files back

This plan is then "executed" by the agent interpreting each step, calling
tools, handling errors, and managing state — all mediated by the LLM at
every step, consuming tokens and latency for what is fundamentally a
sequential program.

The gap between "plan" and "program" might be more artificial than it looks.
A plan _is_ a program — we just don't usually give agents a good executable
representation for it.

Forth could be that representation.

### Tools as Words

Every agent tool — file read, web search, code execution, API call — maps
to a Forth word. The agent's toolkit becomes a Forth dictionary:

```forth
\ Agent tool vocabulary (host functions)
\ SEARCH-FILES ( pattern-addr pattern-len -- results-addr count )
\ READ-FILE    ( path-addr path-len -- content-addr content-len )
\ WRITE-FILE   ( content-addr content-len path-addr path-len -- )
\ JSON-GET     ( json-addr key-addr key-len -- value-addr value-len )
\ SHELL        ( cmd-addr cmd-len -- output-addr output-len )
\ ASK-USER     ( question-addr question-len -- answer-addr answer-len )
```

Now the plan from above becomes an executable program:

```forth
\ S>NUMBER and SET-TIMEOUT are further host/helper words, assumed here
: UPDATE-TIMEOUTS ( -- )
  S" *.config" SEARCH-FILES \ get matching files
  0 DO                      \ for each file
    DUP I CELLS + @ COUNT   \ get filename
    2DUP READ-FILE          \ read contents
    S" timeout" JSON-GET    \ extract timeout field
    S>NUMBER DROP           \ convert to number
    30 > IF                 \ if timeout > 30
      30 SET-TIMEOUT        \ update to 30
      WRITE-FILE            \ write back
    ELSE
      2DROP                 \ discard unchanged
    THEN
  LOOP
  DROP
;

UPDATE-TIMEOUTS
```

This isn't a description of what to do — it _is_ what to do. The agent
generates it, WAFER compiles it to WASM, and it runs — no LLM in the loop
during execution, no token cost per step, no latency per tool call.
### Error Handling with CATCH/THROW

Of course, agent plans fail. Files don't exist. APIs return errors.
Permissions get denied. Production agent systems need robust error handling,
which typically means calling the LLM at every step to decide what to do
when something goes wrong.

WAFER has `CATCH` and `THROW` — structured exception handling that lets
the plan itself define error recovery:

```forth
: SAFE-READ ( path-addr path-len -- content-addr content-len | 0 0 )
  ['] READ-FILE CATCH IF
    2DROP 0 0   \ file not found: return empty
  THEN
;

: SAFE-UPDATE ( filename-addr filename-len -- )
  2DUP SAFE-READ                  \ try to read
  DUP 0= IF 2DROP 2DROP EXIT THEN \ skip if file missing
  S" timeout" JSON-GET
  S>NUMBER DROP
  30 > IF
    30 SET-TIMEOUT
    WRITE-FILE
  ELSE
    2DROP 2DROP
  THEN
;

: ROBUST-UPDATE-TIMEOUTS ( -- )
  S" *.config" SEARCH-FILES
  0 DO
    DUP I CELLS + @ COUNT SAFE-UPDATE
  LOOP
  DROP
;
```

The error handling is part of the plan. The agent generates it once, and it
runs to completion without further LLM intervention. Errors are handled at
the speed of WASM, not the speed of an API call to an LLM.
### The Dictionary as Growing Capability

A human Forth programmer builds up vocabulary: small words compose into
larger words, which compose into still larger words. The dictionary grows
with the programmer's understanding of the problem.

An AI agent does the same thing. Each successfully executed plan leaves
behind defined words that can be reused:

```forth
\ First task: agent learns to read configs
: READ-CONFIG ( path-addr path-len -- json-addr json-len )
  SAFE-READ DUP 0= IF EXIT THEN JSON-PARSE ;

\ Second task: agent learns to update configs
: UPDATE-CONFIG ( key-addr key-len value path-addr path-len -- )
  2DUP READ-CONFIG JSON-SET WRITE-FILE ;

\ Third task: agent composes previous capabilities
: MIGRATE-CONFIGS ( -- )
  S" *.config" SEARCH-FILES
  0 DO
    DUP I CELLS + @ COUNT
    S" timeout" 30 ROT ROT UPDATE-CONFIG
  LOOP DROP
;

\ The agent's vocabulary grows with experience.
\ MIGRATE-CONFIGS didn't exist before. Now it does.
\ Next time, the agent can use it as a building block.
```

You could call this _learned tool use_ — not in the machine learning sense,
but in the software engineering sense. The agent defines new capabilities in
terms of old ones, and the dictionary persists across invocations. Over time,
the agent's vocabulary naturally converges on the abstractions that matter
for its operational domain.
### REPL as Test-Before-Commit

Agents that act irreversibly on the first try are risky. WAFER's REPL model
gives agents a natural test-before-commit workflow:

1. **Define**: Generate and compile the plan as Forth words.
2. **Test**: Run the words against sample data on the stack.
3. **Verify**: Check the stack for expected results.
4. **Execute**: Run the plan for real only after verification passes.

```forth
\ Step 1: Define
: CALCULATE-DISCOUNT ( price tier -- discounted )
  CASE
    1 OF 10 ENDOF \ tier 1: 10% off
    2 OF 20 ENDOF \ tier 2: 20% off
    3 OF 35 ENDOF \ tier 3: 35% off
    0 SWAP
  ENDCASE
  100 SWAP - * 100 /
;

\ Step 2: Test (no side effects, just stack operations)
1000 1 CALCULATE-DISCOUNT . \ expect 900
1000 2 CALCULATE-DISCOUNT . \ expect 800
1000 3 CALCULATE-DISCOUNT . \ expect 650

\ Step 3: Verify output matches expectations
\ Step 4: Apply to real data only after tests pass
```

The agent can generate, test, and iterate without ever touching production
data. The REPL isn't just a debugging convenience here — it's a safety
mechanism for autonomous agents.
### Multi-Agent Coordination

Multiple agents can share a WAFER dictionary through shared linear memory.
One agent defines words. Another agent uses them. A coordinator agent
composes them into higher-level plans:

```forth
\ Agent A defines data retrieval
: FETCH-METRICS ( -- addr n ) metrics-api QUERY PARSE-JSON ;

\ Agent B defines analysis
: DETECT-ANOMALIES ( addr n -- anomalies-addr n )
  THRESHOLD @ FILTER-ABOVE ;

\ Agent C defines actions
: ALERT ( anomalies-addr n -- )
  0 DO DUP I CELLS + @ SEND-ALERT LOOP DROP ;

\ Coordinator composes them
: MONITOR ( -- )
  BEGIN
    FETCH-METRICS DETECT-ANOMALIES
    DUP 0> IF ALERT ELSE 2DROP THEN
    60000 DELAY
  AGAIN
;
```

Each agent contributes words to a shared vocabulary. The coordinator doesn't
need to understand the implementation of `FETCH-METRICS` or
`DETECT-ANOMALIES` — it only needs to know their stack effects. This is
composability without coupling, coordination without shared state beyond
the dictionary.
### A Different Way to Look at It

The AI agent community is building increasingly sophisticated "plan
representations" — DAGs, state machines, behavior trees, ReAct loops — all
trying to bridge the gap between the LLM's natural language output and
actual tool execution. But Forth is already a plan representation that
doubles as an execution engine. It has structured control flow (`IF`/`THEN`,
`DO`/`LOOP`, `BEGIN`/`UNTIL`), error handling (`CATCH`/`THROW`),
composability (word definitions), and a test harness (the REPL and stack).
Maybe the gap between "plan" and "program" doesn't need to be bridged so
much as it needs to be _erased_.

---
## Convergence: Five Problems, One Shape

These five domains look different on the surface:

| Domain          | Traditional Tool               | Core Operation       |
| --------------- | ------------------------------ | -------------------- |
| Data analytics  | Pandas, Spark                  | Transform pipeline   |
| Database engine | SQLite VDBE, Postgres executor | Query plan execution |
| AI inference    | PyTorch, TensorFlow            | Layer composition    |
| AI codegen      | Python, JavaScript             | Program synthesis    |
| AI agents       | LangChain, CrewAI              | Plan execution       |

But they share a deep structure: **sequential composition of simple
operations on a data flow**. A data pipeline, a query plan, a forward
pass, a synthesized program, and an agent plan are all the same thing:
a sequence of words applied to a stack.

Forth noticed this in 1970. Charles Moore designed a language around the
observation that most computation is a pipeline of transformations, and
the simplest way to express pipelines is sequential composition on a
stack. The language has no syntax because pipelines don't need syntax.
It has no type system because the data flow _is_ the type. It has no
package manager because each program builds its own vocabulary from
primitives.

WAFER brings these ideas to the modern world by targeting WebAssembly — the
universal runtime that runs in browsers, on servers, on edge devices, in
sandboxes. That combination opens up some interesting possibilities:

- **Analytics in the browser** with no server, no framework, deterministic
  execution.
- **Database VMs** that compile queries to native WASM through an existing
  Forth JIT.
- **Inference engines** that fit in 50 KB and run on any device WASM
  reaches.
- **AI-generated code** in the language with the smallest syntax, cheapest
  verification, and safest sandbox.
- **Agent plans** that are executable programs, testable in a REPL,
  composable through a growing dictionary.

None of this requires Forth to change. Forth has been this shape for 55
years. It's kind of fun that the world's problems seem to be circling back
to it.

---

_WAFER is open source. Start at the [repository root](../README.md)._
_Architecture details: [WAFER.md](WAFER.md). Language introduction:
[FORTH.md](FORTH.md)._