Bet 01 — Manifest to live FractalMoE

The first bet in the catalogue. Take an RFC-0006 specialist manifest, register it in the directory, instantiate the live model, run a forward pass — all in one process, with no per-specialist code. This is the foundation everything else builds on, and it's the bet that determines whether RFC-0006's "federation of specialists" thesis is operationally feasible at all.

The bet is unglamorous on its face. There's no quantitative result to celebrate, no surprising margin, no retraction story. It's the bet that proves the plumbing exists. But that plumbing is what makes every subsequent bet possible. Without the manifest-to-live-model loader, every specialist in the federation would be a special case, and "federation of specialists" would be "a list of bespoke integrations."

Background — what an RFC-0006 manifest is

RFC-0006 is the specialist directory specification. A specialist publishes a manifest declaring its identity, capabilities, and load-time requirements. The minimum manifest fields:

  • Model id — globally unique, content-addressed.
  • Vocab — tokenizer identifier (often a HuggingFace tokenizer hash or an inline JSON tokenizer definition).
  • Hidden dim, layer count, head count — the geometry the loader needs to instantiate the model object.
  • Expert count, expert dim — for MoE architectures, the additional structural dimensions.
  • Quantisation tag — whether the weights are stored as fp16, ternary, 1.58-bit, or another format.
  • Weight file URLs — content-addressed pointers to the actual weight tensors.
  • Per-user adapter slot specification — what adapter format the specialist accepts (norm-only, LoRA-r4, none).

The manifest is JSON, signed by the specialist's owner, content-addressed for tamper-evidence, and small enough to fit in a gossip-protocol message (typically under 4 KB).
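For concreteness, a manifest covering those fields might look like the sketch below. Every field name and value here is illustrative; the canonical RFC-0006 v1 field names are not reproduced in this write-up.

  # Illustrative only: field names and values are assumptions, not the v1 schema.
  example_manifest = {
      "model_id": "sha256:9f2c...",                  # content-addressed identity
      "vocab": "hf:some-tokenizer-id",               # or an inline tokenizer JSON
      "hidden_dim": 512,
      "n_layers": 8,
      "n_heads": 8,
      "expert_count": 4,
      "expert_dim": 128,
      "quantisation": "fp16",                        # fp16 | ternary | 1.58-bit | ...
      "weight_urls": ["https://example.invalid/weights/sha256:ab12..."],
      "adapter_slot": "norm-only",                   # norm-only | lora-r4 | none
      "signature": "<owner signature over the manifest body>",
  }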

The bet asks: given this manifest and access to the weight files at the URLs it points to, can we instantiate a callable, working FractalMoE entirely from manifest data, with no specialist-specific Python code on the receiving end?

Hypothesis

A specialist can be loaded from disk into a live, callable model purely through the manifest path. No extra glue, no per-specialist code, no out-of-band metadata. The forward pass on a held-out test input returns a logit vector of the correct shape with no NaN or Inf, and the logit values are within a sanity-check range (max log-probability after softmax > -20, no all-uniform distributions).
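A minimal version of that sanity check, applied to a raw next-token logit vector, could look like the following; the thresholds come from the hypothesis, but the function itself is a sketch rather than the harness's actual check.

  import torch

  def logits_look_sane(logits: torch.Tensor, max_logprob_floor: float = -20.0) -> bool:
      # Reject NaN/Inf outright.
      if not torch.isfinite(logits).all():
          return False
      log_probs = torch.log_softmax(logits, dim=-1)
      # The most likely next token should not be absurdly improbable.
      if log_probs.max().item() <= max_logprob_floor:
          return False
      # A (near-)uniform distribution suggests the weights were never bound.
      probs = log_probs.exp()
      uniform = torch.full_like(probs, 1.0 / probs.numel())
      return not torch.allclose(probs, uniform, atol=1e-4)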

Pre-registered criteria

  • STRICT: manifest → registry entry → live model → coherent forward pass on a sample input, end-to-end, in one process. Logits valid (no NaN, sensible distribution).
  • LENIENT: the same, but with a known patch list applied at load time. (E.g., if a manifest field needs translation from one tokenizer convention to another, applying that translation is acceptable.)
  • CATASTROPHIC: any path that requires per-specialist Python code at load time. This would invalidate the federation thesis — federation can't depend on the specialist publisher having shipped specific Python code to every node.

Setup

The implementation is in src/sharedllm/coordinator/loader.py. The loader (a condensed code sketch follows the numbered steps):

  1. Parses the manifest JSON.
  2. Validates the schema (required fields present, value types correct).
  3. Verifies content addresses (the model id matches a hash of the manifest content; the weight file URLs match content hashes).
  4. Instantiates a FractalMoE PyTorch module from the manifest's hidden_dim / n_layers / n_heads / expert_count fields.
  5. Memory-maps (mmap) the weight files and binds them to the corresponding tensors.
  6. Loads the tokenizer (either a HuggingFace tokenizer by id, or an inline JSON tokenizer from the manifest).
  7. Registers the live model in the directory (in-memory dict keyed by model id).
  8. Runs a smoke-test forward pass: tokenize "hello world", forward through the model, return the logit vector for the next token.
  9. Validates the logit vector (no NaN, no Inf, max log-prob > -20, distribution non-uniform).
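The condensed sketch of that pipeline is below. The helper names (validate_schema, verify_content_address, bind_mmapped_weights, load_tokenizer) and the FractalMoE constructor signature are stand-ins for illustration, not the actual loader.py API.

  import json
  import torch

  def load_specialist(manifest_path: str, registry: dict):
      # Steps 1-3: parse, schema-check, verify content addresses.
      with open(manifest_path) as f:
          manifest = json.load(f)
      validate_schema(manifest)            # raises on missing fields / wrong types
      verify_content_address(manifest)     # model id vs. manifest hash, weight URL hashes

      # Step 4: instantiate the module purely from manifest geometry.
      model = FractalMoE(
          hidden_dim=manifest["hidden_dim"],
          n_layers=manifest["n_layers"],
          n_heads=manifest["n_heads"],
          expert_count=manifest["expert_count"],
      )

      # Steps 5-6: bind mmapped weight files, load the tokenizer.
      bind_mmapped_weights(model, manifest["weight_urls"])
      tokenizer = load_tokenizer(manifest["vocab"])

      # Step 7: register the live model under its content-addressed id.
      registry[manifest["model_id"]] = model

      # Steps 8-9: smoke-test forward pass (assuming [batch, seq, vocab] logits)
      # and logit validation.
      tokens = tokenizer.encode("hello world")
      with torch.no_grad():
          logits = model(torch.tensor([tokens]))[0, -1]
      if not torch.isfinite(logits).all():
          raise RuntimeError("smoke test produced NaN/Inf logits")
      return model, logits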

The smoke test runs against three reference manifests in the bets harness — the FractalMoE 30M base, a programmer-fixture-trained variant, and a novelist-fixture-trained variant. The bet asserts all three load and produce valid logits.
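The harness-level assertion then reduces to a loop over the three manifests, reusing the sketches above; the manifest paths here are invented placeholders, not the harness's real file layout.

  REFERENCE_MANIFESTS = [
      "manifests/fractalmoe_30m_base.json",         # placeholder paths
      "manifests/fractalmoe_30m_programmer.json",
      "manifests/fractalmoe_30m_novelist.json",
  ]

  registry = {}
  for path in REFERENCE_MANIFESTS:
      model, logits = load_specialist(path, registry)
      assert logits_look_sane(logits), f"smoke test failed for {path}"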

Result — STRICT PASS

The loader runs all three reference manifests end-to-end. The forward pass on a held-out token returns a logit vector of the correct shape with no NaN, no Inf, and a sensible probability distribution.

The full path is exercised every time a specialist joins the directory, so this bet is in effect re-run continuously throughout the catalogue. It has held for as long as the harness has existed; every other bet in the catalogue depends on it passing as a precondition.

Why this is harder than it looks

The challenge in manifest-driven loading isn't loading a single, known model. PyTorch's torch.load has been doing that for years. The challenge is loading a model whose architecture variants you don't know in advance.

Specialist publishers will, over time, ship models that differ from the canonical FractalMoE in small ways:

  • Different hidden_dim (more or less capacity).
  • Different vocabulary (sometimes a superset, sometimes a subset).
  • Different expert count or expert routing geometry.
  • Different normalisation layer placement or scaling.
  • Different attention variants (rotary embeddings, ALiBi, sliding window).

The manifest must declare these differences explicitly enough that the loader can instantiate the right variant without specialist-specific code. This requires a schema that's expressive enough to span the variants but constrained enough that the loader doesn't have to be a general-purpose model interpreter.
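One way to keep the loader declarative is to restrict variant fields to closed vocabularies the loader already knows how to build. The sketch below is a hypothetical example of that pattern; none of the field names or the variant set are taken from the actual v1 schema.

  # Hypothetical variant dispatch; field names and the variant set are illustrative.
  ATTENTION_VARIANTS = {"rotary", "alibi", "sliding_window"}

  def attention_config(manifest: dict) -> dict:
      variant = manifest.get("attention_variant", "rotary")
      if variant not in ATTENTION_VARIANTS:
          # Unknown variants are a schema-evolution problem, not a guessing game:
          # the loader refuses rather than improvising per-specialist behaviour.
          raise ValueError(f"unsupported attention_variant: {variant}")
      return {"variant": variant, "window": manifest.get("attention_window")}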

The bet's narrow result — three reference manifests load — is also a result about the schema design. The schema is expressive enough for the canonical FractalMoE family. Whether it's expressive enough for future variants is open work; if a future specialist requires a manifest field that doesn't exist, the schema needs to grow, and the loader needs to grow with it. The federation has to plan for schema evolution.

Failure modes the bet exposes

Three classes of loader failure that the bet's smoke test catches:

  1. Schema validation failures. Manifest missing required fields, or fields with wrong types. The loader rejects the manifest with a structured error before any tensor allocation. Caught at parse time.

  2. Content addressing failures. Manifest claims a model id that doesn't match its content hash, or weight file URLs whose content doesn't match their hashes. The loader rejects the manifest. This is the federation's primary defence against spoofed specialists.

  3. Forward-pass failures. Manifest is structurally valid but the resulting model produces NaN or Inf logits, or a uniform distribution (suggesting the weights weren't bound correctly). The loader rejects the model after the smoke-test step.
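Class 2 is the federation's security boundary, so it is worth spelling out. A minimal content-address check might look like the following, assuming the model id is a SHA-256 over the manifest body with the id field excluded; the real hashing convention is not specified here.

  import hashlib
  import json

  def verify_content_address(manifest: dict) -> None:
      # Assumed convention: model_id = "sha256:" + hash of the manifest body,
      # serialised with sorted keys and the id field removed.
      body = {k: v for k, v in manifest.items() if k != "model_id"}
      digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
      if manifest["model_id"] != f"sha256:{digest}":
          raise ValueError("model_id does not match manifest content hash")
      # Weight files get the same treatment: each URL's content is hashed and
      # compared against the hash embedded in the URL before the bytes are used.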

A specialist that passes the loader has cleared all three classes of failure. The downstream federation infrastructure (combiner, router, adapter loader) can assume a working specialist on the other side of the loader.

What this leaves open

  • Schema evolution. The manifest schema is at v1. Future specialists may require fields not in v1. The loader needs to handle multiple schema versions, and the directory needs to know which versions which nodes support.
  • Streaming weight loading. The loader currently mmaps the entire weight file before the smoke test runs. For very large models on memory-constrained nodes, streaming layer-by-layer load would be useful. Not yet implemented.
  • Adversarial manifests. A malicious specialist publisher might ship a manifest that's structurally valid but causes the loader to allocate huge tensors before validation catches the issue. The loader should fail fast on declared size limits before any allocation (a sketch of such a guard follows this list). Partial mitigation exists; full mitigation is open work.
  • Cross-tokenizer compatibility. Two specialists with different tokenizers cannot directly hand off intermediate activations (the tokens are different). The federation handles this by re-tokenising at the handoff point. The loader doesn't currently advertise tokenizer compatibility groups; future work.
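For the adversarial-manifest point, a pre-allocation guard can reject absurd declared geometry before any tensor exists. The limits and the parameter estimate below are invented for illustration; this is a sketch of the idea, not the existing partial mitigation.

  # Hypothetical pre-allocation guard; limits and the estimate are illustrative.
  MAX_HIDDEN_DIM = 16_384
  MAX_LAYERS = 128
  MAX_DECLARED_PARAMS = 20_000_000_000

  def check_declared_size(manifest: dict) -> None:
      h, layers = manifest["hidden_dim"], manifest["n_layers"]
      if h > MAX_HIDDEN_DIM or layers > MAX_LAYERS:
          raise ValueError("manifest declares dimensions above this node's limits")
      # Rough parameter estimate from declared geometry, computed before any allocation.
      expert_params = manifest.get("expert_count", 1) * manifest.get("expert_dim", h) * h
      approx_params = layers * (4 * h * h + expert_params)
      if approx_params > MAX_DECLARED_PARAMS:
          raise ValueError("manifest declares a model too large for this node")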

Why it matters

Federation depends on this being mechanical. A specialist trained somewhere else, by someone we don't control, with a tokenizer and dimensionality we have to discover from a file, must instantiate without per-specialist intervention. Without that property, every specialist is a special case, and federation reduces to a list of one-off integrations.

The bet's STRICT pass establishes the property at the v1 schema. Every subsequent bet in the catalogue assumes this property holds — when Bet 04 (mixture combiner) loads two specialists and runs them in parallel, when Bet 18 (glass-box LLM) attributes per-token log-probs across specialists, when Bet 52 (ternary + adapter) composes a quantised base with a per-user adapter — all of these depend on the loader handling whatever specialist they invoke.

This is the unglamorous-but-load-bearing kind of bet. There's no eye-catching number, no retraction drama, no marketing-friendly headline. The bet is just "the plumbing works, and we can build on top of it." Every catalogue needs this kind of bet near the start; the work that comes after only makes sense once the foundation has been validated.

Run command

PYTHONPATH=src python -m experiments.bets.01_loader

Output: experiments/bets/results/01_loader.json records the smoke-test outcomes for all three reference manifests, with per-step timings (parse, validate, instantiate, mmap, load tokenizer, smoke test, validate logits) and any error traces.

Dependent bets

  • Bet 02: end-to-end federation with real artefact. Two-specialist federation built on top of this loader.
  • Bet 03: dumb keyword router. Selects which specialist to invoke; assumes the loader provides a working specialist.
  • Bet 04: mixture combiner. Combines logits from multiple loader-instantiated specialists.
  • Bet 05: KV-cache federation. Wire-format envelope between specialists; assumes both specialists have been loaded via this path.
  • Bet 18: glass-box LLM. Per-token attribution across loader-instantiated specialists.