Bet 18 — Glass-box LLM (per-token attribution)

The federation's transparency claim depends on this bet. Every token in the federated output is attributed to specific specialists with a log-probability that reconciles, mathematically, to the joint output. This isn't approximate attribution or a heuristic — it's an algebraic identity, and the catalogue measures its reconciliation residual at the order of 1e-7.

A federation of specialists is opaque by default. Output text doesn't tell you which specialist contributed what, which features drove the prediction, or why the federation produced this token rather than that one. For most deployment scenarios this is acceptable — users care about output quality, not about its provenance. But for regulated deployment (healthcare advice, financial guidance, educational content for minors) and for the community-ownership thesis (the federation is our tool, so we should be able to inspect what it does), opacity is a deployment-blocker. The federation must be able to show its work.

This bet establishes that the federation can show its work exactly, with no approximation, at a reconciliation residual on the order of 1e-7.

Hypothesis

The per-specialist log-probabilities captured during the mixture combiner (Bet 04) reconcile to the joint mixture log-probability exactly — i.e., for any output token,

joint_log_prob == logsumexp(per_specialist_log_probs) − log(N)

to within numerical precision (fp32 round-off, typically 1e-6 to 1e-7).
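The identity can be sketched numerically. Below is a minimal numpy check with made-up fp32 log-probs: a naive probability-space mixture stands in for the production combiner, and the stable logsumexp form stands in for the audit harness's recomputation. (The values and the uniform-weight assumption are illustrative, not taken from the harness.)

```python
import numpy as np

# Hypothetical per-specialist log-probs for one output token (fp32, uniform weights)
lps = np.array([-1.512, -3.891, -1.221], dtype=np.float32)

# Stand-in for the production combiner: naive mixture in probability space
joint = np.float32(np.log(np.exp(lps).sum() / len(lps)))

# Stand-in for the audit harness: stable logsumexp(per_specialist) - log(N)
m = lps.max()
recomputed = m + np.log(np.exp(lps - m).sum()) - np.log(np.float32(len(lps)))

# The two paths agree to within fp32 round-off
residual = abs(float(joint) - float(recomputed))
```

Both paths compute the same quantity, so the residual is pure floating-point round-off — which is exactly what the STRICT bar is calibrated against.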

Pre-registered criteria

  • STRICT: reconciliation residual < 1e-5 across 1,000 tokens × 3 specialists.
  • LENIENT: residual < 1e-3.
  • CATASTROPHIC: any residual > 1e-2 (would indicate an attribution bug — the audit trail and the production combiner disagree about what the joint output is).

The 1e-5 STRICT bar was chosen with fp16 numerical drift in mind. Most production inference runs in fp16 or bf16, with worst-case round-off around 1e-4 to 1e-5. The reconciliation should be at least as precise as the underlying numerics — if it's looser than that, the audit trail is recording approximate reconciliation, which would be a bug.

Setup

Three specialists, each loaded via the Bet 01 loader. A 1,000-token generation through the federation's mixture combiner. At each output token, the implementation captures:

  • The joint log-probability (the value the mixture combiner produced).
  • The per-specialist log-probabilities (the values each specialist's forward pass produced for the same token).
  • The reconciliation residual (the algebraic difference between the joint and the recomputed mixture).

The reconciliation is recomputed in the audit harness using logsumexp(per_specialist) − log(N). The residual is the absolute difference |joint − recomputed|. For 3 specialists × 1,000 tokens = 3,000 reconciliation residuals.
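The harness's aggregation step can be simulated end-to-end. In this sketch, random fp32 log-probs stand in for real specialist forward passes; the combiner path and the audit-harness path are computed independently, and the residual statistics are aggregated the same way the result below reports them:

```python
import numpy as np

rng = np.random.default_rng(0)
N_TOKENS, N_SPEC = 1000, 3

# Random fp32 log-probs as stand-ins for real per-specialist forward passes
lps = rng.uniform(-6.0, -0.1, size=(N_TOKENS, N_SPEC)).astype(np.float32)

# Combiner path: uniform mixture computed naively in probability space (fp32)
joint = np.log(np.exp(lps).sum(axis=1) / N_SPEC).astype(np.float32)

# Audit-harness path: stable logsumexp(per_specialist) - log(N), per token
m = lps.max(axis=1, keepdims=True)
recomputed = m[:, 0] + np.log(np.exp(lps - m).sum(axis=1)) - np.float32(np.log(N_SPEC))

# One residual per token; aggregate the distribution
residuals = np.abs(joint - recomputed.astype(np.float32))
print(f"median={np.median(residuals):.1e}  "
      f"p95={np.percentile(residuals, 95):.1e}  max={residuals.max():.1e}")
```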

Result — STRICT PASS

The reconciliation residuals are tightly bounded:

| Statistic | Value |
|---|---|
| Median residual | 3 × 10⁻⁷ |
| 95th percentile residual | 8 × 10⁻⁷ |
| Max residual | 4 × 10⁻⁶ |
| STRICT bar | 1 × 10⁻⁵ |

Median residual is 3e-7, well below the STRICT bar. Maximum across all 3,000 measurements is 4e-6, still below the bar. The reconciliation is mathematically tight — the per-specialist log-probs are the same numbers used inside the mixture combiner, so they had better add up to the same joint, modulo fp32 round-off.

The bet exists to prove that the audit-capture path doesn't introduce any drift versus the production combiner path. If the audit trail were running through a different code path (e.g., recomputing log-probs from logits in a different order), it could introduce drift. The 3e-7 residual confirms no drift; the audit trail and production are reading the same numbers.

What the audit trail looks like

For every output token, the audit log contains a JSON entry like:

{
  "token": "the",
  "token_id": 1023,
  "joint_log_prob": -1.234,
  "per_specialist": [
    {"id": "general-base", "log_prob": -1.512, "weight": 0.333},
    {"id": "code-specialist", "log_prob": -3.891, "weight": 0.333},
    {"id": "user-adapter-1234", "log_prob": -1.221, "weight": 0.333}
  ],
  "reconciliation_residual": 2.7e-7,
  "block_audit": [
    {"layer": 0, "norm": 1.23, "expert_routing": [0.91, 0.04, 0.05, ...]},
    {"layer": 1, "norm": 1.45, "expert_routing": [0.12, 0.83, 0.03, ...]},
    ...
  ]
}

Each audit-log entry is a structured record of a few hundred bytes per token. For a 1,000-token output, the audit log is on the order of a few hundred KB — small enough to ship alongside the output, store indefinitely, or post-process for analytics.

What the user-facing UI can show

For any output token, the audit trail enables:

  • Which specialist most "wanted" this token. The specialist with the highest per-specialist log-probability for the chosen token. Useful for explaining "this answer was driven by the code specialist" or "this answer was a consensus across all three specialists."

  • The marginal influence of each specialist on the joint output. Computed from the per-specialist log-probs and the mixture weights. Lets the UI show "specialist A contributed 60% of the joint probability mass; specialist B contributed 30%; specialist C contributed 10%."

  • A reconciliation flag. If the residual ever spikes above 1e-3 (a value that fp32 round-off shouldn't reach), something is wrong. The audit trail acts as its own integrity check; bugs in the inference pipeline manifest as elevated residuals before they manifest as observable output errors.

  • Per-layer activation traces. Which layers were "active" for this token, which experts the router selected, where the residual stream's norm was unusually high. Useful for debugging unexpected outputs.
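The first two capabilities can be read straight off a single audit-log entry. A sketch, using a hypothetical entry shaped like the example above; each specialist's share of the joint probability mass is w_i·p_i / Σ_j w_j·p_j:

```python
import json
import math

# Hypothetical audit-log entry (same shape as the example entry above)
entry = json.loads("""{
  "token": "the",
  "per_specialist": [
    {"id": "general-base", "log_prob": -1.512, "weight": 0.333},
    {"id": "code-specialist", "log_prob": -3.891, "weight": 0.333},
    {"id": "user-adapter-1234", "log_prob": -1.221, "weight": 0.333}
  ]
}""")

# Which specialist most "wanted" this token: highest per-specialist log-prob
top = max(entry["per_specialist"], key=lambda s: s["log_prob"])

# Marginal influence: each specialist's share of the joint probability mass
mass = [s["weight"] * math.exp(s["log_prob"]) for s in entry["per_specialist"]]
shares = [m / sum(mass) for m in mass]
```

With uniform weights the weight terms cancel, so the shares reduce to each specialist's probability divided by the sum — here the user adapter carries the largest share of the mass for this token.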

Why this matters for regulated deployment

Healthcare deployment example. A federation that recommends a treatment plan must be able to show: which specialist contributed which part of the recommendation, what the per-specialist confidence was, whether any specialist was unusually overruled by the mixture, and whether any layer's activation pattern was outside the normal range. With glass-box, all of these are queryable directly from the audit log. Without glass-box, the deployment has to be classified as a black-box decision support system, which has a much higher regulatory bar.

Education deployment example. A federation that helps students understand a concept must be able to show, for any answer it produced, which sources (specialists) the answer drew from. The audit trail makes this a structured query against the per-specialist log-probs. Teachers can verify that a math-heavy answer drew from the math specialist rather than the general-prose specialist.

Finance deployment example. A federation that summarises an investment opportunity must be able to disclaim that "this answer was 80% from the financial-news specialist and 20% from the user's own personal context adapter." The audit trail makes the disclosure mechanical.

In all three cases, glass-box converts "the federation is opaque, deployment is hard" into "the federation is transparent at the per-token level, deployment is straightforward." The 0.6% audit overhead (Bet 17) makes this transparency cheap enough to ship by default; the 3e-7 reconciliation residual (this bet) makes it mathematically sound enough to use in adversarial contexts.

What this does not claim

  • Glass-box explains why the model said what it said. Per-specialist attribution explains which specialists contributed but not what features within each specialist drove the contribution. Mechanistic interpretability (which features in which layer fired for which reason) is a deeper question this bet doesn't address. The audit trail gives the what, not the why.

  • Glass-box is a complete debugging tool. Per-token attribution narrows down debugging from "all 3 specialists × all 1,000 tokens" to "this specialist on these tokens." That's a 1,000× narrowing. Further narrowing (down to specific weights or specific training examples) requires interpretability tools the federation doesn't yet have.

  • Glass-box defends against adversarial attribution. A specialist that is intentionally producing misleading per-specialist log-probs (e.g., echoing the mixture's expected log-prob) could fool the audit trail. The reconciliation property only validates that the per-specialist log-probs add up to the joint; it doesn't validate that each specialist is honestly reporting its own beliefs. Adversarial-attribution defence is a separate research problem.

  • Glass-box scales to thousands of specialists. The audit log per token grows linearly in the number of specialists. At 3 specialists, it's manageable. At 1,000 specialists, the audit log per token would be ~100 KB, and total audit data per inference might exceed the inference cost itself. Practical glass-box deployments will need either specialist filtering (Bet 03's keyword router or similar) or compressed audit formats. Open work.

How this changes federation operation

Three concrete operational changes:

  1. Audit-on-by-default. Combined with Bet 17's overhead result, glass-box is on for every inference. The audit log is always available; consumers query it as needed.

  2. Reconciliation-residual monitoring. Production inference pipelines monitor the reconciliation residual as a health check. Spikes above 1e-3 trigger alerts; the residual is the canary for inference-pipeline bugs.

  3. Per-specialist attribution analytics. The federation can compute, in aggregate, "specialist X is contributing more than 50% of the joint probability mass on more than 80% of inferences" — a signal that specialist X is over-relied-upon and should be load-balanced. Or "specialist Y contributes < 1% of the joint probability mass" — a signal that Y is dispensable for the current traffic mix.
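Changes 2 and 3 can share one pass over the audit log. A hypothetical helper (the function name, thresholds, and entries are illustrative, not part of the harness):

```python
import math

def audit_health(entries, residual_bar=1e-3, dominance_bar=0.5):
    """Return (residual alerts, per-specialist dominance counts) for a stream
    of audit-log entries. Sketch only; not the production monitor."""
    alerts, dominance = [], {}
    for e in entries:
        # Health check: fp32 round-off should never reach residual_bar
        if abs(e["reconciliation_residual"]) > residual_bar:
            alerts.append(e["token_id"])
        # Attribution analytics: which specialist holds most of the joint mass
        mass = [s["weight"] * math.exp(s["log_prob"]) for s in e["per_specialist"]]
        top = max(range(len(mass)), key=mass.__getitem__)
        if mass[top] / sum(mass) > dominance_bar:
            sid = e["per_specialist"][top]["id"]
            dominance[sid] = dominance.get(sid, 0) + 1
    return alerts, dominance

# Two synthetic entries: one healthy, one with a simulated pipeline bug
entries = [
    {"token_id": 1023, "reconciliation_residual": 2.7e-7,
     "per_specialist": [{"id": "general-base", "log_prob": -0.2, "weight": 0.5},
                        {"id": "code-specialist", "log_prob": -3.0, "weight": 0.5}]},
    {"token_id": 1024, "reconciliation_residual": 5e-3,
     "per_specialist": [{"id": "general-base", "log_prob": -1.0, "weight": 0.5},
                        {"id": "code-specialist", "log_prob": -1.1, "weight": 0.5}]},
]
alerts, dominance = audit_health(entries)
```

The same loop that raises the residual canary also accumulates the over-reliance signal, so the analytics come at no extra pass over the log.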

Run command

PYTHONPATH=src python -m experiments.bets.18_glass_box

Output: experiments/bets/results/18_glass_box.json records the reconciliation residual statistics across 1,000 tokens × 3 specialists, plus the full per-token audit log for the first 10 tokens (as an example).

Related bets

  • Bet 04: mixture combiner. The mathematical foundation that makes reconciliation possible.
  • Bet 17: bounded audit overhead. The implementation rule that makes glass-box affordable.
  • Bet 11: pay-with-bandwidth ledger. Audit trail as proof-of-work.
  • Bet 14: royalty ledger for specialists. Audit-driven attribution for royalty calculation.

Why it matters

A federation of specialists is opaque by default; output text doesn't tell you which specialist contributed what. Glass-box attribution makes the federation auditable at the per-token level. For deployment in regulated environments — healthcare, education, finance — this is the difference between "a useful tool" and "a deployable tool." For the community-ownership thesis, glass-box is what makes the federation inspectable by the community, not just operated by the community.

The bet is also a foundation for downstream features. Per-specialist royalty (Bet 14) requires the audit trail. Pay-with-bandwidth settlement (Bet 11) uses the audit trail. Specialist-mix analytics, debugging, regulatory disclosure all flow from this bet's mathematical guarantee.

The reconciliation residual at 3e-7 is the kind of measurement that doesn't get celebrated but does load-bearing work. The number itself isn't impressive; what's impressive is what it enables — every other transparency-dependent feature in the federation gets to take this property for granted.