Upload files to "drafts"

2026-03-03 13:05:57 +08:00
parent e5908510db
commit 1f0f5beea0


---
title: Building Babel - a fuzzy LLM vs the OS
categories: [Thoughts]
tags: [os]
math: true
---
# Building Babel: Turning a Fuzzy LLM into an OS
*A post about reliability, memory, and the compiler we didn't mean to write.*
When people talk about “prompt engineering,” it often sounds like a bag of tricks: write clearer instructions, add examples, constrain the format, keep history short, and pray. But if you zoom out, the pattern looks less like copywriting and more like systems engineering. We're trying to run workloads on a machine whose “CPU” is probabilistic, whose “RAM” is fixed-size, and whose caching behavior depends on keeping the same prefix intact.
## 0 — The foundation: why the OS analogy is structurally correct
### 0.1 A minimal machine model
If you strip away the marketing, an LLM session is a constrained compute device with:
A practical theoretical model looks like this:
* Let $X$ be the set of possible contexts (token sequences) with max length $N$.
* Let $Y$ be token outputs.
* The model implements a stochastic policy:
$$
\pi(y \mid x)
$$
where $x \in X$.
In each interaction, you append some new tokens to $x$, then the model emits tokens $y$, producing a new context $x' = \text{append}(x, y)$, then truncation/packing happens due to the context limit $N$.
From an OS perspective, the key point is not stochasticity. The key point is **boundedness**:
That's why memory management dominates in practice.
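To make the boundedness concrete, here is a toy sketch of the bounded-context state machine. The `Context` class and its eviction policy are invented for illustration; real systems use smarter packing than dropping the oldest tokens.

```python
# Toy model of an LLM context as a bounded buffer: append new tokens,
# then truncate to the window size N. The naive "drop oldest" policy
# stands in for real packing/summarization strategies.
N = 8  # tiny window, for illustration

class Context:
    def __init__(self, max_tokens: int):
        self.max_tokens = max_tokens
        self.tokens: list[str] = []

    def append(self, new_tokens: list[str]) -> None:
        # x' = append(x, y), then enforce the bound |x'| <= N
        self.tokens.extend(new_tokens)
        overflow = len(self.tokens) - self.max_tokens
        if overflow > 0:
            del self.tokens[:overflow]  # evict oldest tokens

ctx = Context(N)
ctx.append(["sys", "user:", "hi"])
ctx.append(["asst:", "hello", "user:", "plan", "trip", "please"])
print(ctx.tokens)  # the oldest token ("sys") has been evicted
```

No matter how long the session runs, the working set never exceeds `N` tokens; everything else must be reconstructed or reloaded.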
### 0.3 “Main context ⇒ RAM”: the working-set equivalence
In OS terms, RAM is defined by three properties: bounded capacity, fast access, and the fact that its contents determine program behavior.
An LLM context window has exactly those properties:
* bounded capacity: fixed token limit $N$
* fast access: everything in-context is “directly addressable” by attention
* content determines behavior: the probability distribution $\pi(\cdot\mid x)$ changes when $x$ changes
That's enough to justify the equivalence “context behaves like RAM,” even though the representation isn't bytes.
That is the same structural property as a TLB/cache: stable mappings/prefixes produce repeated hits.
From an OS dev perspective, this creates an optimization target: keep the “kernel prefix” stable to maximize cache locality across turns.
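As a sketch of that optimization target (the cache below only simulates what an inference server's prefix/KV cache does; all names are invented):

```python
# Simulate prefix caching: prompts that share an identical leading
# "kernel prefix" can reuse the expensive prefill computation.
KERNEL_PREFIX = (
    "SYSTEM: You are the task runtime.\n"
    "TOOLS: read_file, write_file, search\n"
)

prefix_cache: dict[str, str] = {}  # prefix -> simulated KV state

def build_prompt(history: list[str], user_msg: str) -> str:
    # Stable bytes first, volatile bytes last: editing the prefix
    # would invalidate the cached prefill for every later turn.
    return KERNEL_PREFIX + "\n".join(history) + f"\nUSER: {user_msg}\n"

def prefill(prompt: str) -> bool:
    """Return True on a (simulated) prefix-cache hit."""
    hit = KERNEL_PREFIX in prefix_cache and prompt.startswith(KERNEL_PREFIX)
    prefix_cache[KERNEL_PREFIX] = "kv-state"  # warm the cache
    return hit

print(prefill(build_prompt([], "hello")))                   # False (cold)
print(prefill(build_prompt(["USER: hello"], "next task")))  # True (prefix reuse)
```

The design choice mirrors a TLB: anything volatile (turn history, user input) goes after the stable bytes, never before them.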
### 0.6 “Skills compiler” is justified by separation of concerns
OS devs separate *policy* from *mechanism*:
This is the real foundation of the “ECC/control loop” introduced later: error correction needs a deterministic substrate to verify against.
---
1. **Reliability issue & solution**: the LLM behaves like a fuzzy ALU; we wrap it with ECC-like mechanisms by compiling to a minimal instruction set and closing the loop with verification.
2. **Memory issue & solution**: the context window is physical RAM; we manage it like a Rust arena allocator with stage checkpoints, lazy loading, and offloading.
---
## Part 1 — Reliability: the fuzzy ALU problem, and an ECC-shaped solution
In a classic machine, the critical property is that execution is deterministic: given the same instruction stream and machine state, you get the same result. That's what makes debugging possible, and it's why “bit flips” are an exceptional event handled by ECC, parity checks, and redundancy.
LLMs invert that. The core model is best understood as a conditional distribution $P(y \mid x)$: the next token depends on the prompt/context. Even if you force deterministic decoding, the *system-level behavior* remains fragile because the mapping from a messy human request to an internal strategy is not explicit and not stable. Small context changes, minor phrasing differences, or irrelevant baggage in the prompt can flip the “mode” the model enters. In practice, this looks like the ALU occasionally returning the wrong result, except the “wrongness” is semantic, not bit-level.
A direct way to improve reliability is to reduce the amount of “semantic work” the model must do while it is producing final outputs. Instead of asking the LLM to execute tasks in free-form language, we ask it to **compile** the request into a small set of **deterministic primitives**. Then we run those primitives in a runtime we control.
Structured result + trace (success/failure per instruction)
This architecture deliberately moves uncertainty into one place: compilation. Execution becomes observable and mostly deterministic.
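A minimal sketch of that split, assuming a made-up two-opcode ISA (`SET`, `CONCAT`); the point is only that execution is a deterministic interpreter that records a per-instruction trace:

```python
from typing import Any

# Hypothetical micro-ISA: the LLM "compiler" would emit plans in this
# format; the runtime below executes them deterministically.
def op_set(env, key, value):
    env[key] = value

def op_concat(env, key, a, b):
    env[key] = env[a] + env[b]

OPCODES = {"SET": op_set, "CONCAT": op_concat}

def run(plan: list[list[Any]]) -> tuple[dict, list[tuple[str, bool]]]:
    env: dict[str, Any] = {}
    trace: list[tuple[str, bool]] = []  # (opcode, success) per instruction
    for op, *args in plan:
        try:
            OPCODES[op](env, *args)  # unknown opcode or bad args -> failure
            trace.append((op, True))
        except Exception:
            trace.append((op, False))
    return env, trace

# A plan the compiler might emit for "greet the user by name":
plan = [["SET", "name", "Ada"],
        ["SET", "greeting", "Hello, "],
        ["CONCAT", "out", "greeting", "name"]]
env, trace = run(plan)
print(env["out"])  # Hello, Ada
```

Because every instruction either succeeds or fails visibly in the trace, debugging looks like reading a log, not interrogating a model.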
### Why this increases reliability
Let $U$ be the user request (the “spec”), $C$ be the compiler (LLM), $P$ the produced plan (ISA program), $R$ the runtime, and $O$ the observed output. Let $V(U,O)\in\{0,1\}$ be a checker that says whether the output satisfies the request (even a weak checker helps).
Because the runtime is deterministic and instrumented, the overall success probability decomposes conceptually into:
Informally:
$$
\Pr[V=1] \approx \Pr[P\ \text{correct}] \cdot \Pr[V=1\mid P\ \text{correct}]
$$
If your runtime is strict and your primitives are deterministic, $\Pr[V=1\mid P\ \text{correct}]$ is high. That's the central win: you turn “LLM unpredictability everywhere” into “LLM uncertainty mainly at compilation time.” Once the failure surface is concentrated, you can apply ECC-like techniques there.
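Plugging made-up numbers into the decomposition shows the practical consequence: once the runtime term is near 1, end-to-end reliability is governed almost entirely by compilation quality.

```python
# Worked example with assumed probabilities (not measured values):
p_plan_correct = 0.90        # Pr[P correct]: the fuzzy compilation step
p_pass_given_correct = 0.98  # Pr[V=1 | P correct]: deterministic runtime

p_success = p_plan_correct * p_pass_given_correct
print(round(p_success, 3))  # 0.882, dominated by the compilation term
```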
### ECC for compilation: redundancy + decoding
ECC works by adding redundancy and then decoding based on constraints that detect and correct errors. You can do the same for plans:
1. generate multiple candidate plans $P_1, …, P_k$
2. statically validate them (types, allowed effects, resource access)
3. partially execute cheap prefixes if needed
4. select the plan that passes checks / yields valid outputs
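The four steps might look like the following sketch. The validator and plan format are illustrative; real static checks would cover types, effects, and resource access as the list says.

```python
# ECC-style redundancy for plans: generate k candidates, validate
# each cheaply, and "decode" by selecting the first that passes.
ALLOWED_OPS = {"SET", "CONCAT"}  # the allowed-effects whitelist

def validate(plan) -> bool:
    """Static checks only: non-empty, list-shaped, whitelisted opcodes."""
    return bool(plan) and all(
        isinstance(instr, list) and instr and instr[0] in ALLOWED_OPS
        for instr in plan
    )

def select_plan(candidates):
    for plan in candidates:
        if validate(plan):
            return plan
    return None  # all k candidates failed: fall back to re-compilation

candidates = [
    [["DELETE_EVERYTHING"]],             # disallowed effect: rejected
    [],                                  # empty plan: rejected
    [["SET", "x", 1], ["SET", "y", 2]],  # passes static checks
]
print(select_plan(candidates))  # [['SET', 'x', 1], ['SET', 'y', 2]]
```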
Stage S commits:

```
bump_ptr = checkpoint[S] <-- bulk free (arena reset)
```
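The checkpoint/reset discipline maps directly onto an arena allocator. Here is a toy version, with invented names, where “allocations” are context entries and a stage commit is an O(1) pointer reset:

```python
# Toy arena over context entries: begin_stage saves the bump pointer,
# commit_stage bulk-frees everything the stage allocated by resetting it.
class ContextArena:
    def __init__(self):
        self.items: list[str] = []             # allocations, in order
        self.checkpoints: dict[str, int] = {}  # stage -> saved bump pointer

    def begin_stage(self, stage: str) -> None:
        self.checkpoints[stage] = len(self.items)

    def alloc(self, entry: str) -> None:
        self.items.append(entry)

    def commit_stage(self, stage: str) -> None:
        # bump_ptr = checkpoint[S]: drop the stage's scratch in one step
        del self.items[self.checkpoints.pop(stage):]

arena = ContextArena()
arena.alloc("task spec")         # survives every stage
arena.begin_stage("S")
arena.alloc("scratch reasoning")
arena.alloc("raw tool output")
arena.commit_stage("S")
print(arena.items)  # ['task spec']
```

No per-entry bookkeeping is needed at commit time, which is exactly why arena resets are cheap.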
### Why this works
This approach gives two strong properties that are “theoretical” in the systems sense.