Upload files to "drafts"

2026-03-03 13:05:57 +08:00
parent e5908510db
commit 1f0f5beea0


---
title: Building Babel - a fuzzy LLM vs the OS
categories: [Thoughts]
tags: [os]
math: true
---
# Building Babel: Turning a Fuzzy LLM into an OS
*A post about reliability, memory, and the compiler we didn't mean to write.*
When people talk about “prompt engineering,” it often sounds like a bag of tricks: write clearer instructions, add examples, constrain the format, keep history short, and pray. But if you zoom out, the pattern looks less like copywriting and more like systems engineering. We're trying to run workloads on a machine whose “CPU” is probabilistic, whose “RAM” is fixed-size, and whose caching behavior depends on keeping the same prefix intact.
## 0 — The foundation: why the OS analogy is structurally correct
### 0.1 A minimal machine model
If you strip away the marketing, an LLM session is a constrained compute device with:
A practical theoretical model looks like this:
* Let $X$ be the set of possible contexts (token sequences) with max length $N$.
* Let $Y$ be token outputs.
* The model implements a stochastic policy:
$$
\pi(y \mid x)
$$
where $x \in X$.
In each interaction, you append some new tokens to $x$, then the model emits tokens $y$, producing a new context $x' = \text{append}(x, y)$, then truncation/packing happens due to the context limit $N$.
From an OS perspective, the key point is not stochasticity. The key point is **boundedness**:
That's why memory management dominates in practice.
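To make the boundedness concrete, here is a toy sketch of the bounded-context state machine. The `Context` class and its eviction policy are invented for illustration; real systems use smarter packing than dropping the oldest tokens.

```python
# Toy model of an LLM context as a bounded buffer: append new tokens,
# then truncate to the window size N. The naive "drop oldest" policy
# stands in for real packing/summarization strategies.
N = 8  # tiny window, for illustration

class Context:
    def __init__(self, max_tokens: int):
        self.max_tokens = max_tokens
        self.tokens: list[str] = []

    def append(self, new_tokens: list[str]) -> None:
        # x' = append(x, y), then enforce the bound |x'| <= N
        self.tokens.extend(new_tokens)
        overflow = len(self.tokens) - self.max_tokens
        if overflow > 0:
            del self.tokens[:overflow]  # evict oldest tokens

ctx = Context(N)
ctx.append(["sys", "user:", "hi"])
ctx.append(["asst:", "hello", "user:", "plan", "trip", "please"])
print(ctx.tokens)  # the oldest token ("sys") has been evicted
```

No matter how long the session runs, the working set never exceeds `N` tokens; everything else must be reconstructed or reloaded.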
### 0.3 “Main context ⇒ RAM”: the working-set equivalence
In OS terms, RAM is defined by three properties: bounded capacity, fast access, and the fact that its contents determine program behavior.
An LLM context window has exactly those properties:
* bounded capacity: fixed token limit $N$
* fast access: everything in-context is “directly addressable” by attention
* content determines behavior: the probability distribution $\pi(\cdot\mid x)$ changes when $x$ changes
That's enough to justify the equivalence “context behaves like RAM,” even though the representation isn't bytes.
That is the same structural property as a TLB/cache: stable mappings/prefixes produce repeated hits.
From an OS dev perspective, this creates an optimization target: keep the “kernel prefix” stable to maximize cache locality across turns.
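As a sketch of that optimization target (the cache below only simulates what an inference server's prefix/KV cache does; all names are invented):

```python
# Simulate prefix caching: prompts that share an identical leading
# "kernel prefix" can reuse the expensive prefill computation.
KERNEL_PREFIX = (
    "SYSTEM: You are the task runtime.\n"
    "TOOLS: read_file, write_file, search\n"
)

prefix_cache: dict[str, str] = {}  # prefix -> simulated KV state

def build_prompt(history: list[str], user_msg: str) -> str:
    # Stable bytes first, volatile bytes last: editing the prefix
    # would invalidate the cached prefill for every later turn.
    return KERNEL_PREFIX + "\n".join(history) + f"\nUSER: {user_msg}\n"

def prefill(prompt: str) -> bool:
    """Return True on a (simulated) prefix-cache hit."""
    hit = KERNEL_PREFIX in prefix_cache and prompt.startswith(KERNEL_PREFIX)
    prefix_cache[KERNEL_PREFIX] = "kv-state"  # warm the cache
    return hit

print(prefill(build_prompt([], "hello")))                   # False (cold)
print(prefill(build_prompt(["USER: hello"], "next task")))  # True (prefix reuse)
```

The design choice mirrors a TLB: anything volatile (turn history, user input) goes after the stable bytes, never before them.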
### 0.6 “Skills compiler” is justified by separation of concerns
OS devs separate *policy* from *mechanism*:
This is the real foundation of the “ECC/control loop” introduced later: error correction needs a deterministic substrate to verify against.
---
1. **Reliability issue & solution**: the LLM behaves like a fuzzy ALU; we wrap it with ECC-like mechanisms by compiling to a minimal instruction set and closing the loop with verification.
2. **Memory issue & solution**: the context window is physical RAM; we manage it like a Rust arena allocator with stage checkpoints, lazy loading, and offloading.
---
## Part 1 — Reliability: the fuzzy ALU problem, and an ECC-shaped solution
In a classic machine, the critical property is that execution is deterministic: given the same instruction stream and machine state, you get the same result. That's what makes debugging possible, and it's why “bit flips” are an exceptional event handled by ECC, parity checks, and redundancy.
LLMs invert that. The core model is best understood as a conditional distribution $P(y \mid x)$: the next token depends on the prompt/context. Even if you force deterministic decoding, the *system-level behavior* remains fragile because the mapping from a messy human request to an internal strategy is not explicit and not stable. Small context changes, minor phrasing differences, or irrelevant baggage in the prompt can flip the “mode” the model enters. In practice, this looks like the ALU occasionally returning the wrong result, except the “wrongness” is semantic, not bit-level.
A direct way to improve reliability is to reduce the amount of “semantic work” the model must do while it is producing final outputs. Instead of asking the LLM to execute tasks in free-form language, we ask it to **compile** the request into a small set of **deterministic primitives**. Then we run those primitives in a runtime we control.
Structured result + trace (success/failure per instruction)
This architecture deliberately moves uncertainty into one place: compilation. Execution becomes observable and mostly deterministic.
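A minimal sketch of that split, assuming a made-up two-opcode ISA (`SET`, `CONCAT`); the point is only that execution is a deterministic interpreter that records a per-instruction trace:

```python
from typing import Any

# Hypothetical micro-ISA: the LLM "compiler" would emit plans in this
# format; the runtime below executes them deterministically.
def op_set(env, key, value):
    env[key] = value

def op_concat(env, key, a, b):
    env[key] = env[a] + env[b]

OPCODES = {"SET": op_set, "CONCAT": op_concat}

def run(plan: list[list[Any]]) -> tuple[dict, list[tuple[str, bool]]]:
    env: dict[str, Any] = {}
    trace: list[tuple[str, bool]] = []  # (opcode, success) per instruction
    for op, *args in plan:
        try:
            OPCODES[op](env, *args)  # unknown opcode or bad args -> failure
            trace.append((op, True))
        except Exception:
            trace.append((op, False))
    return env, trace

# A plan the compiler might emit for "greet the user by name":
plan = [["SET", "name", "Ada"],
        ["SET", "greeting", "Hello, "],
        ["CONCAT", "out", "greeting", "name"]]
env, trace = run(plan)
print(env["out"])  # Hello, Ada
```

Because every instruction either succeeds or fails visibly in the trace, debugging looks like reading a log, not interrogating a model.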
### Why this increases reliability
Let $U$ be the user request (the “spec”), $C$ be the compiler (LLM), $P$ the produced plan (ISA program), $R$ the runtime, and $O$ the observed output. Let $V(U,O)\in\{0,1\}$ be a checker that says whether the output satisfies the request (even a weak checker helps).
Because the runtime is deterministic and instrumented, the overall success probability decomposes conceptually into:
Informally:
$$
\Pr[V=1] \approx \Pr[P\ \text{correct}] \cdot \Pr[V=1\mid P\ \text{correct}]
$$
If your runtime is strict and your primitives are deterministic, $\Pr[V=1\mid P\ \text{correct}]$ is high. That's the central win: you turn “LLM unpredictability everywhere” into “LLM uncertainty mainly at compilation time.” Once the failure surface is concentrated, you can apply ECC-like techniques there.
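Plugging made-up numbers into the decomposition shows the practical consequence: once the runtime term is near 1, end-to-end reliability is governed almost entirely by compilation quality.

```python
# Worked example with assumed probabilities (not measured values):
p_plan_correct = 0.90        # Pr[P correct]: the fuzzy compilation step
p_pass_given_correct = 0.98  # Pr[V=1 | P correct]: deterministic runtime

p_success = p_plan_correct * p_pass_given_correct
print(round(p_success, 3))  # 0.882, dominated by the compilation term
```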
### ECC for compilation: redundancy + decoding
ECC works by adding redundancy and then decoding based on constraints that detect and correct errors. You can do the same for plans:
1. generate multiple candidate plans $P_1, …, P_k$
2. statically validate them (types, allowed effects, resource access)
3. partially execute cheap prefixes if needed
4. select the plan that passes checks / yields valid outputs
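The four steps might look like the following sketch. The validator and plan format are illustrative; real static checks would cover types, effects, and resource access as the list says.

```python
# ECC-style redundancy for plans: generate k candidates, validate
# each cheaply, and "decode" by selecting the first that passes.
ALLOWED_OPS = {"SET", "CONCAT"}  # the allowed-effects whitelist

def validate(plan) -> bool:
    """Static checks only: non-empty, list-shaped, whitelisted opcodes."""
    return bool(plan) and all(
        isinstance(instr, list) and instr and instr[0] in ALLOWED_OPS
        for instr in plan
    )

def select_plan(candidates):
    for plan in candidates:
        if validate(plan):
            return plan
    return None  # all k candidates failed: fall back to re-compilation

candidates = [
    [["DELETE_EVERYTHING"]],             # disallowed effect: rejected
    [],                                  # empty plan: rejected
    [["SET", "x", 1], ["SET", "y", 2]],  # passes static checks
]
print(select_plan(candidates))  # [['SET', 'x', 1], ['SET', 'y', 2]]
```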
Stage S commits:

```
bump_ptr = checkpoint[S] <-- bulk free (arena reset)
```
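The checkpoint/reset discipline maps directly onto an arena allocator. Here is a toy version, with invented names, where “allocations” are context entries and a stage commit is an O(1) pointer reset:

```python
# Toy arena over context entries: begin_stage saves the bump pointer,
# commit_stage bulk-frees everything the stage allocated by resetting it.
class ContextArena:
    def __init__(self):
        self.items: list[str] = []             # allocations, in order
        self.checkpoints: dict[str, int] = {}  # stage -> saved bump pointer

    def begin_stage(self, stage: str) -> None:
        self.checkpoints[stage] = len(self.items)

    def alloc(self, entry: str) -> None:
        self.items.append(entry)

    def commit_stage(self, stage: str) -> None:
        # bump_ptr = checkpoint[S]: drop the stage's scratch in one step
        del self.items[self.checkpoints.pop(stage):]

arena = ContextArena()
arena.alloc("task spec")         # survives every stage
arena.begin_stage("S")
arena.alloc("scratch reasoning")
arena.alloc("raw tool output")
arena.commit_stage("S")
print(arena.items)  # ['task spec']
```

No per-entry bookkeeping is needed at commit time, which is exactly why arena resets are cheap.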
### Why this works
This approach gives two strong properties that are “theoretical” in the systems sense.