What is SharedLLM
SharedLLM is two things in one repository, and the relationship between them is the project's central design idea.
The first is a community-owned distributed LLM inference network. Pool consumer hardware across a LAN or a wider mesh, share models, pay only the network cost. A coordinator builds inference pipelines from registered nodes; primaries serve user-facing inference; workers accept layer-offload over an authenticated TCP proxy. The network treats compute, RAM, and bandwidth as community resources rather than as products to be sold.
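As a sketch of the shape of that registration step (every identifier below is hypothetical, not the repository's actual API), a node's registration might carry something like:

```python
# Illustrative only: these class and field names are invented for this
# sketch and are not taken from the SharedLLM codebase.
from dataclasses import dataclass
from enum import Enum


class Role(str, Enum):
    PRIMARY = "primary"  # serves user-facing inference
    WORKER = "worker"    # accepts layer-offload over the authenticated TCP proxy


@dataclass
class NodeRegistration:
    node_id: str
    role: Role
    address: str      # host:port the coordinator can reach
    vram_mb: int      # capacity the coordinator uses to slice pipelines
    auth_token: str   # offload connections are authenticated


# A coordinator would collect registrations like this and partition a
# model's layers across workers whose combined capacity can host it.
```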
The second is a federation of centrally trained specialists (RFC-0006). Specialists are individually trained on conventional infrastructure — wherever the trainer has GPUs and data — then composed at inference time via a mathematically grounded mixture combiner. Per-user adapters of 9 KB attach to a base model and personalise it; each adapter is trained entirely on the user's own device, so not a single training step or byte of training data leaves the user's machine. The federation is the layer above the network: the network carries the bytes, the federation decides what bytes to carry.
Together they form a community-owned, privacy-first distributed LLM platform that aims for the same capability as centralised LLM services at a fraction of the deployment cost, with explicit transparency at the per-token level.
Why federation rather than federated pretraining
The project started with a more ambitious thesis: federate pretraining itself. Gradient-share a frontier model across consumer machines. Train a 70B-class model the way Bitcoin runs — distributed, trustless, community-operated. The thesis was attractive, the engineering ambitious, and the political case (no centralised AI lab owns the model) compelling.
The path is closed. Three reasons, all now well-understood:
- Communication bandwidth. Pretraining a frontier model demands ~100 PB of gradient traffic over a full run. Distribute that across 100,000 consumer machines and each machine's share is 1 TB of upload. Consumer uplink can't carry that on any reasonable timeline. The bandwidth math is the dominant constraint, and it's not even close; a back-of-envelope check follows this list.
- Synchronisation cost. Even DiLoCo with K=100 inner steps spends a non-trivial fraction of wall-clock on the all-reduce. Bet 62 (the K-step DiLoCo retraction) found K=1 wins under early stopping at small scale; the K-bandwidth tradeoff at large scale is still open, but the optimistic framing of "K=100 is just better" has been retracted. The right K-value at deployment scale is unresolved, and the bandwidth-vs-loss tradeoff means there's no free lunch.
- Byzantine surface area. Open participation means an adversarial fraction must be tolerated by the aggregator. Bet 44 validated coordinate-wise median at 1000× byzantine scale under combined churn + byzantine + partition; that's workable for the "occasional malicious actor" threat model but not for "coordinated state-level adversary." Federated frontier pretraining attracts the latter; the federation's threat model can't accommodate it.
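The bandwidth point is simple enough to check directly. A minimal sketch: the ~100 PB and 100,000-machine figures are from the argument above, while the 20 Mbit/s uplink is an assumed figure, not a measurement from the catalogue.

```python
# Back-of-envelope for the bandwidth constraint (illustrative numbers).
total_gradient_traffic_bytes = 100e15   # ~100 PB over a full pretraining run
machines = 100_000
uplink_bits_per_sec = 20e6              # assumed 20 Mbit/s consumer uplink

per_machine_bytes = total_gradient_traffic_bytes / machines  # 1e12 B = 1 TB
upload_seconds = per_machine_bytes * 8 / uplink_bits_per_sec
print(f"{per_machine_bytes / 1e12:.0f} TB per machine, "
      f"{upload_seconds / 86400:.1f} days of saturated upload")
# -> 1 TB per machine, 4.6 days of upload that does nothing but ship gradients
```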
The federation-of-specialists path sidesteps all three by training centrally and composing cheaply. Specialists are trained on whatever hardware the trainer has access to (a single GPU, a small cluster, a research lab's compute). Composition happens at inference time via the mixture combiner — no gradient-sharing across the network. Per-user adapters are trained on the user's own device on the user's own data.
The path is less ambitious in one sense (no federated pretraining) and more ambitious in another (genuinely deployable, not just theoretically interesting). The catalogue spent ~30 bets validating this framing before committing to it as the deployment strategy.
What's been validated
The bets harness has run 63 falsifiable experiments with strict / lenient / catastrophic criteria written before each experiment ran. The results that survived include:
- Norm-only per-user adapter (Bet 37, replicated 15/15 in Bet 46, Pareto-dominant in Bet 49). 9 KB per user. Beats LoRA-r4 and full FT in the 5-minute training budget. Survives the personalization-vs-regularization confusion matrix in Bet 61 (own-adapter wins by a 5–29% margin). The idea is sketched in code after this list.
- Mixture combiner (Bet 04). `logsumexp(specialist_logprobs) − log(N)` is bounded above by the best specialist's log-probability, by Jensen's inequality. The naive sum-of-log-probs combiner is unsafe — it can produce results worse than the worst specialist. The federation uses the safe combiner everywhere; a numerical sketch follows this list.
- Byzantine-robust aggregation (Bet 44). Coordinate-wise median holds within 1.084× of the clean baseline under combined churn + byzantine + partition (also sketched below). Open-participation training is operationally tolerable.
- Production wire format (Bet 52). Ternary base + norm-only adapter compose; 50–65% adapter improvement on top of the 10× base saving. The federation's deployment cost is a 6 MB shared base plus a 9 KB per-user adapter (the size arithmetic appears in the first sketch after this list).
- Glass-box attribution (Bet 18). Per-token reconciliation residual ≈ 3e-7 — mathematically tight enough for regulated deployment. The federation can show its work at the per-token level.
- Bounded audit overhead (Bet 17). 0.6% inference-time cost via the deferred-extraction implementation rule. Audit-on-by-default is operationally cheap.
- Numerical robustness (Bet 63). Hidden-state Gaussian noise up to σ_rel = 1e-1 doesn't destabilise personalisation. That is four orders of magnitude of headroom against fp16 round-off.
- Falsifications that closed wrong directions (Bet 31 model-soup, Bet 38 expert collapse, Bet 40 layer-skip, Bet 50's K=100 DiLoCo claim retracted by Bet 62, Bet 55 cross-user adapter ensembling). Each falsification rules out a tempting wrong path and constrains the federation's design.
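The norm-only idea is small enough to show directly. A minimal PyTorch sketch, assuming the model's normalisation layers are standard `nn.LayerNorm` (the harness's actual model code may differ), with the Bet 52 deployment-size arithmetic checked at the end:

```python
import math
import torch.nn as nn


def norm_only_parameters(model: nn.Module):
    """Freeze every weight except normalisation scales/biases and
    return the trainable set. Everything else stays shared."""
    for p in model.parameters():
        p.requires_grad_(False)
    trainable = []
    for m in model.modules():
        if isinstance(m, nn.LayerNorm):  # add nn.RMSNorm for RMSNorm models
            for p in m.parameters():
                p.requires_grad_(True)
                trainable.append(p)
    return trainable


layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
model = nn.TransformerEncoder(layer, num_layers=2)
params = norm_only_parameters(model)
print(sum(p.numel() for p in params))  # a few hundred values, not millions

# The per-user payload is just these vectors, which is how an adapter lands
# in the kilobyte range. The ternary base arithmetic also checks out:
print(f"{30e6 * math.log2(3) / 8 / 1e6:.1f} MB")  # ~5.9 MB, the "6 MB" base
```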
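The Bet 04 combiner result is worth seeing in numbers. A minimal NumPy sketch, assuming each specialist reports a per-token log-probability (the function names here are illustrative, not the repository's API):

```python
import numpy as np


def safe_combine(logps: np.ndarray) -> np.ndarray:
    """Uniform mixture in probability space:
    log((1/N) * sum_i p_i) = logsumexp(logps, axis=0) - log(N).
    A mean never exceeds a max, so the mixture's score is bounded
    above by the best specialist's score at every token."""
    n = logps.shape[0]
    m = logps.max(axis=0)
    return m + np.log(np.exp(logps - m).sum(axis=0)) - np.log(n)


def naive_combine(logps: np.ndarray) -> np.ndarray:
    """Sum of log-probs, i.e. a *product* of probabilities. Every factor
    is < 1, so the product can sink below even the worst specialist."""
    return logps.sum(axis=0)


# Two specialists scoring a single token at p = 0.9 and p = 0.5:
logps = np.log(np.array([[0.9], [0.5]]))
print(safe_combine(logps))   # log 0.7  = -0.357, between the two specialists
print(naive_combine(logps))  # log 0.45 = -0.799, below the worst (log 0.5 = -0.693)
```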
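And the Bet 44 aggregator. Coordinate-wise median is nearly a one-liner, which is part of why it tolerates ugly failure modes; the numbers below are hypothetical, not the harness's test data:

```python
import numpy as np


def coordinate_median(updates: list[np.ndarray]) -> np.ndarray:
    """Aggregate by taking the median of each coordinate across all
    submitted updates. A byzantine minority can corrupt its own rows
    but cannot pull any coordinate past the honest majority."""
    return np.median(np.stack(updates), axis=0)


rng = np.random.default_rng(0)
honest = [rng.normal(0.0, 0.01, size=4) for _ in range(7)]
byzantine = [np.full(4, 1e6) for _ in range(3)]  # adversarial garbage
print(coordinate_median(honest + byzantine))
# Stays near 0: with 7 of 10 rows honest, the median slot of every
# coordinate is always occupied by an honest value.
```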
Each result has a corresponding entry in the catalogue with full setup, criteria, and run command. A reader who wants to verify the result can clone the repo, run the bet's command, and inspect the JSON output.
What we don't claim
The catalogue's discipline includes being explicit about scope. The bets harness operates at the 30M-parameter scale on short held-out texts; the federation's deployment thesis is at substantially larger scale on substantially noisier data. The gap is a known scoping limit, not a hidden assumption.
Specifically, we do not claim:
- Federated frontier pretraining at scale. Closed for the bandwidth reason above.
- Phone-class deployment validated. Bet 45's throttle-invariance and Bet 63's numerical robustness are necessary preconditions, not sufficient evidence — real on-device testing for thermal management, OS process kill, memory pressure, and battery cost is open work.
- Cross-ISP federation throughput. All 63 bets used in-process FastAPI test clients or LAN Ethernet. Real-WAN throughput across consumer ISPs is the most consequential open question and is unmeasured.
- 1B+ scale per-user adapters. Norm-only's 30M-scale Pareto-dominance hasn't been re-validated at 1B+. The deployment story assumes the result generalises; the catalogue is honest that this is an assumption.
- Production-ready deployment in any specific institutional setting. Kerala IT@School is the obvious deployment scenario, but the institutional work to make it happen hasn't started. The catalogue is a research artifact; deployment is open work.
These open questions are the subject of the catalogue's Open Questions chapter. They are not "we'll get to it later" notes; they are explicit gaps in the deployment thesis that need real measurement before the federation is production-ready.
Why this scoping matters
A research artifact that overstates its scope erodes trust. The catalogue's value comes from its claims being trustworthy at face value: a STRICT-PASS bet at 30M scale on clean prose means exactly that, not "and probably also at 1B+ on noisy real-user data." The federation's deployment story is built on the bets that have actually run, with explicit acknowledgment of what hasn't.
The discipline scales. As more bets close (1B+ adapter shootout, real-WAN measurement, on-device phone validation, Kerala pilot), the federation's claims become broader and the deployment story becomes more concrete. Until those bets run, the catalogue's strongest claims are scoped to the 30M-LAN regime, with everything else explicitly marked open.
This is a different posture from the typical ML research project, which tends to claim broad applicability from narrow validation. The catalogue's posture is "narrow validation, calibrated claims, explicit gaps." The federation can be honest about its current state because the methodology makes honesty cheap (pre-registered criteria, automatic JSON results, retraction-friendly catalogue).
What to read next
The Bets Methodology page explains why the project structures work this way — pre-registered criteria, falsifications-stay-in-catalogue, retracted bets as first-class citizens. It's the operating manual for the catalogue's empirical discipline.
The chapter index lists every bet entry — wins, falsifications, retractions — organized by chapter (foundations, federation primitives, training, adapters, wire format, transparency). Each entry has its hypothesis, criteria, result, and run command.
How to Contribute walks through running a node on the federation, training a specialist, or proposing a bet. The repository is open; contribution paths range from "run the federation on your laptop" (operational) to "submit a falsification of an existing claim" (research).
The Open Questions chapter is where the most consequential work currently sits. Real-WAN throughput, 1B+ adapter scaling, on-device phone deployment, Kerala IT@School pilot — these are the bets that, when they close, define the federation's deployment readiness.
Why this matters
The federation is not just a research project. It's a deployment thesis: that community-owned distributed LLM inference is technically feasible, economically tractable, and politically necessary. Each of those claims has empirical content the catalogue can validate (or fail to validate).
Technically feasible. 63 bets at the protocol level. Federation primitives compose, mixture combiners are mathematically sound, byzantine robustness holds, numerical headroom is comfortable. The protocol layer is solid.
Economically tractable. Wire-format compression (ternary base + norm-only adapter) gets the per-user deployment cost to 6 MB + 9 KB. Per-user adapter training is interactive-fast (5 seconds at 30M scale). The economics work for the deployment scenarios the federation targets.
Politically necessary. The current LLM landscape is dominated by a small number of centralised providers. The community-owned alternative requires the technical foundation the catalogue is building plus the institutional path the catalogue is honest about not having yet started. The federation is the technical half of a two-half problem.
The catalogue's job is to make the technical foundation visible and verifiable. The institutional half — the Kerala pilot, the partnerships, the published papers, the deployment infrastructure — is what the Open Questions chapter commits to making visible too. Both halves have to clear before the federation is real.
For now, what's real is the catalogue: 63 bets, calibrated claims, explicit open questions, and a public methodology. That's the foundation. Everything else is open work.