How to Contribute
The repository is open. Contribution to SharedLLM happens at multiple levels — operating a node, training a specialist, proposing a research bet, hardening an existing bet, or doing the institutional work that gates real deployment. Different contributions take different skills; the catalogue treats them all as first-class. Below are the four most common contribution paths, in rough order of leverage.
Run a node
The fastest way to get hands-on with the federation. A node registers with a coordinator and either serves user-facing inference (primary role) or accepts offloaded layers from other nodes (worker role). Running a node validates the federation's protocol on your hardware and contributes operational evidence to the deployment story.
The minimum stack on a LAN:
# Start the coordinator (one per federation)
sharedllm coordinator --host 0.0.0.0 --port 8420
# Start a primary (the user-facing inference node)
sharedllm node --role primary \
--model <path-to-gguf> \
--coordinator-url http://<coord-ip>:8420
# Start a worker (accepts layer offload)
sharedllm node --role worker \
--coordinator-url http://<coord-ip>:8420 \
--rpc-port 50052 --lan-addr <your-ip>:50052
Practical notes:
- Model compatibility. Models need 512-aligned hidden dimensions for RPC tensor offload. TinyLlama, Llama 3, and Phi-3 work; SmolLM2-360M does not. The hidden-dimension constraint comes from the RPC protocol's tensor-shape encoding; alignment may relax in future protocol versions but is required today. (A quick pre-check is sketched after this list.)
- Multi-endpoint transport. RFC-0001 lets nodes advertise LAN, WAN, and relay candidates separately. Primaries pick the best path with LAN preference. Configure your node's reachability candidates honestly — a node that lies about reachability degrades the federation's ability to make good routing decisions.
- TLS. Coordinator TLS is implemented (--tls-cert, --tls-key) but not required for LAN deployment. WAN deployment should enable TLS.
- Hardware requirements. A primary needs enough RAM to host the base model plus KV cache. A worker needs enough RAM for its assigned layers. Per-user adapters (norm-only, 9 KB each) are cheap to host; the bottleneck is the base model.
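A quick pre-check before pointing a node at a model, as referenced above. The hidden dimensions below are the published values for these checkpoints, but verify against your own model's metadata; the check itself is just the alignment rule stated in the notes.

# Pre-check a candidate model against the RPC tensor-offload constraint.
# Hidden dimensions here are the published values for these checkpoints;
# read the real value from your model's metadata before trusting this.
ALIGNMENT = 512

CANDIDATES = {
    "TinyLlama-1.1B": 2048,
    "Llama-3-8B": 4096,
    "Phi-3-mini": 3072,
    "SmolLM2-360M": 960,
}

for name, hidden_dim in CANDIDATES.items():
    ok = hidden_dim % ALIGNMENT == 0
    print(f"{name}: hidden_dim={hidden_dim} -> {'offloadable' if ok else 'not RPC-compatible'}")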
What contributing a node tells the federation:
- Operational evidence. Real-world failure modes, real-world reliability, real-world performance.
- Hardware diversity. The catalogue's bets are mostly on M-series Macs. Other hardware (Linux x86, Windows, ARM, RISC-V) reveals platform-specific issues.
- Network diversity. Different ISPs, different NAT configurations, different geographic locations. The real-WAN open question gets closer to an answer with each new node on a new network.
Train a specialist
A specialist is a centrally-trained model that joins the federation by registering an RFC-0006 manifest. Specialists are the federation's middle layer between "the base model" (general-purpose) and "the per-user adapter" (individually personalised). A specialist captures domain knowledge — code, medicine, law, a specific language, a specific task — at a level above per-user fine-tuning.
The minimum loop:
- Train on your domain corpus. Use whatever framework you prefer — PyTorch, JAX, anything that produces weight tensors. The federation doesn't constrain training; it constrains the wire format.
- Export weights as GGUF or the FractalMoE format used by the bets harness. The wire format is what the federation reads; the training pipeline is your choice.
- Write a manifest with model id, vocab size, hidden dim, layer count, quantisation tag, and a content hash. The manifest is what the federation's directory reads. Schema: see RFC-0006 Section 4 in the spec; a sketch of the manifest shape follows this list.
- Submit the manifest to the directory — a coordinator entry plus a gossip announcement (Bet 15 validates gossip-based directory propagation).
- Add a per-user adapter slot if your specialist supports per-user customisation. The federation default per-user adapter is 9 KB norm-only — see Bet 49.
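For the manifest step, a sketch of the shape. The field names and the hash construction are assumptions for illustration; RFC-0006 Section 4 is the authoritative schema, so match it exactly before submitting.

import hashlib
import json
import pathlib

# Illustrative RFC-0006-style manifest builder. Field names and the
# sha256-over-weights hash are assumptions; follow RFC-0006 Section 4.
def build_manifest(weights_path: str, model_id: str, vocab_size: int,
                   hidden_dim: int, layer_count: int, quant_tag: str) -> dict:
    digest = hashlib.sha256(pathlib.Path(weights_path).read_bytes()).hexdigest()
    return {
        "model_id": model_id,
        "vocab_size": vocab_size,
        "hidden_dim": hidden_dim,    # must satisfy the 512-alignment constraint
        "layer_count": layer_count,
        "quantisation": quant_tag,   # e.g. a GGUF quant tag such as "Q4_K_M"
        "content_hash": f"sha256:{digest}",
    }

print(json.dumps(build_manifest("specialist.gguf", "example-specialist",
                                32000, 2048, 22, "Q4_K_M"), indent=2))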
What a contributed specialist does for the federation:
- Adds a routing target that primaries can dispatch to. The mixture combiner (Bet 04) ensembles outputs from multiple specialists; more specialists = richer mixture = better outputs for queries that benefit from specialist knowledge. (A minimal combiner sketch follows this list.)
- Provides a domain expansion path. The base model is general-purpose; specialists fill in domain-specific knowledge that a single base can't cover.
- Generates audit-trail richness. The glass-box LLM (Bet 18) attributes per-token contributions to specific specialists; more specialists = more detailed attribution.
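A minimal sketch of the combination step, as referenced above: weighted averaging of specialist logits. This is an assumption for illustration; the actual Bet 04 combiner may weight, gate, or normalise differently.

import numpy as np

# Combine per-token logits from several specialists with normalised
# weights. A simple weighted average; the real Bet 04 combiner may differ.
def combine_logits(specialist_logits: list, weights: list) -> np.ndarray:
    w = np.asarray(weights, dtype=np.float64)
    w = w / w.sum()
    return np.tensordot(w, np.stack(specialist_logits), axes=1)

# Usage: combined = combine_logits([logits_a, logits_b], [0.7, 0.3]),
# then sample or argmax over `combined` as usual.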
Recommended starter specialists (gaps in the current federation):
- Code-completion specialists for languages the base model handles weakly (Rust, Zig, Haskell).
- Domain specialists for technical fields (legal, medical, financial) — note the regulated-deployment gates.
- Language specialists for under-represented languages.
Propose a bet
New research questions are welcome. Anything the catalogue hasn't measured that's load-bearing for the federation's deployment thesis is a candidate bet. The format is one file in experiments/bets/NN_short_name.py where NN is the next free number. The module docstring declares the hypothesis, criteria, and run command. Reuse experiments/bets/_common.py for registry setup, specialist loading, and result writing. Write the result to experiments/bets/results/NN_*.json, then run 00_rollup.py to regenerate SUMMARY.md.
A representative module header:
"""
Bet NN — <one-line hypothesis>.
STRICT: <numeric threshold the result must clear>
LENIENT: <weaker threshold>
CATASTROPHIC: <result that would falsify the broader thesis>
Run: PYTHONPATH=src python -m experiments.bets.NN_<name>
"""
The methodology requires:
- Pre-registered criteria. Strict / lenient / catastrophic thresholds, written before the experiment runs.
- JSON result file. Mechanical pass/fail computation from the criteria; see the sketch after this list.
- Reproducibility. The bet's run command and any required fixtures must be documented; another contributor must be able to reproduce the result without asking the original author.
- Negative controls (when applicable). Any positive personalisation or fine-tuning claim should be paired with a noise-floor or random-input control. The Bet 60/61 pattern.
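Concretely, the criteria-to-result flow can look like the sketch below. The helper shape, field names, and thresholds are illustrative assumptions; copy the conventions from an existing bet module and _common.py rather than from this sketch.

import json
import pathlib

# Pre-registered BEFORE the experiment runs; thresholds are illustrative.
CRITERIA = {
    "strict": 0.05,       # e.g. >= 5% improvement over the control
    "lenient": 0.02,
    "catastrophic": 0.0,  # below this, the broader thesis is in trouble
}

def write_result(bet_name: str, improvement: float) -> None:
    # Pass/fail is computed mechanically from the criteria, never by hand.
    result = {
        "bet": bet_name,
        "improvement": improvement,
        "strict_pass": improvement >= CRITERIA["strict"],
        "lenient_pass": improvement >= CRITERIA["lenient"],
        "catastrophic": improvement < CRITERIA["catastrophic"],
    }
    out = pathlib.Path(f"experiments/bets/results/{bet_name}.json")
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(result, indent=2))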
What makes a good bet:
- Falsifiable. A bet whose hypothesis is "the federation works" cannot be failed by any reasonable measurement; rephrase it as "X primitive achieves Y under Z conditions" with a measurable Y.
- Decision-relevant. A bet whose outcome doesn't change the federation's design or deployment is a research curiosity, not load-bearing for the catalogue. The catalogue's discipline is to focus bets on questions that move the deployment story.
- Affordable. A bet that requires weeks of compute is hard to replicate. Most catalogue bets run in seconds to minutes; a bet that takes hours is fine if the result is consequential, but a bet that takes days is probably not the right shape.
What kinds of bets the catalogue currently lacks:
- Real-WAN measurements. All current bets are LAN. Anything that measures cross-ISP federation behaviour is high-leverage.
- 1B+ scale validations. Most current bets run at 30M parameters. Re-running the key bets at 1B+ scale on real-user data would close the catalogue's most consequential open question.
- Long-horizon training. Most current bets train for minutes. Federated training over hours/days has different dynamics; the catalogue has limited evidence here.
- Adversarial scenarios. Subtle byzantine behaviour, coordinated adversaries, data-poisoning attacks. The catalogue has Bet 44 (coarse byzantine failures); subtle adversarial scenarios remain open.
Submit a hardened replacement
The most valuable contributions are the unglamorous ones — replacing a flimsy bet with a stricter version. If a bet relies on a single seed, runs without a negative control, or reads an overfit final-step number as a victory, write the disambiguating follow-up.
Recent examples that became foundational:
- Bet 60 ran the negative control (random-token training) that should have run alongside Bet 37 from the start. Result: most of Bet 37's headline ppl drop was regularisation, not personalisation. The catalogue's per-user adapter framing tightened to match.
- Bet 61 built the personalisation-vs-regularisation confusion matrix: train one adapter per user, evaluate every adapter on every user's held-out text. The diagonal advantage (5–29% own-adapter wins) is the actual personalisation signal. Without Bet 61, the federation's per-user routing decision would lack empirical grounding. (The pattern is sketched after these examples.)
- Bet 62 retracted Bet 50's headline. Bet 50 said K=100 DiLoCo outperforms K=1 by 24%. Bet 62 ran with multiple seeds and early stopping; K=1 actually wins by 5–15%. The catalogue's training default changed, and the methodology tightened: multi-seed and early-stopping became required for optimiser comparisons.
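The Bet 61 evaluation generalises to any personalisation claim. A minimal sketch, assuming a hypothetical eval_ppl(adapter, user) callable that returns held-out perplexity; the real harness code lives in the bet module itself.

# Confusion-matrix evaluation: every adapter scored on every user's
# held-out text. eval_ppl is a hypothetical stand-in for your eval code.
def diagonal_advantage(users: list, eval_ppl) -> float:
    matrix = {(a, u): eval_ppl(adapter=a, user=u) for a in users for u in users}
    # A user "wins" when their own adapter beats every other adapter on
    # their own text; the win fraction is the personalisation signal.
    wins = sum(
        all(matrix[(u, u)] < matrix[(a, u)] for a in users if a != u)
        for u in users
    )
    return wins / len(users)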
Submitting a falsification of an existing claim is treated as a success here, not a defeat. The retraction itself is evidence the methodology is real. A catalogue that never retracts is either lucky or dishonest; a catalogue that retracts cleanly is calibrated.
The pattern for hardening a bet:
- Identify the methodological weakness. Single seed? Missing control? Wrong evaluation set? Each of these is a known failure mode the catalogue has encountered.
- Run the disambiguating experiment. Not the original bet again — a new experiment that disambiguates between "the original claim is right" and "the original claim is an artifact of the methodology."
- Submit as a new bet. With its own number, its own pre-registered criteria, and its own JSON result.
- Update the original bet's writeup. Add a redirect to the new bet and reframe the original claim as appropriate.
- Update the methodology if needed. If the failure mode is one the catalogue should always check for, add scaffolding to _common.py or update the methodology document. (One such scaffold is sketched after this list.)
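The kind of scaffold the last step produces, sketched under assumptions: run_trial is a hypothetical callable that trains one configuration with one seed and returns an early-stopped validation perplexity. The real helpers belong in _common.py.

import statistics

# Multi-seed comparison (the Bet 62 lesson): never compare training
# configurations on a single seed's final-step number.
def compare_configs(run_trial, config_a, config_b, seeds=range(5)) -> bool:
    ppl_a = [run_trial(config_a, seed=s) for s in seeds]
    ppl_b = [run_trial(config_b, seed=s) for s in seeds]
    for name, xs in (("config_a", ppl_a), ("config_b", ppl_b)):
        print(f"{name}: mean={statistics.mean(xs):.3f} stdev={statistics.stdev(xs):.3f}")
    # A claim should clear its pre-registered threshold across seeds,
    # not on one lucky draw.
    return statistics.mean(ppl_a) < statistics.mean(ppl_b)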
This is the highest-leverage research contribution to the catalogue. A wrong claim corrected is more valuable than ten new claims added — calibration compounds over time, marketing doesn't.
Other contribution paths
Several non-research contributions are also load-bearing:
- Documentation improvements. The catalogue's writeups can always be clearer. Bet entries that are dense or jargon-heavy benefit from rewrites for accessibility. The "what we don't claim" sections in particular benefit from sharp framing.
- Tooling for replication. A bet's value depends on someone being able to re-run it. Tooling that makes the bets harness easier to install, faster to set up, and more portable across hardware is high-leverage.
- Visualisation. The catalogue's results are mostly numbers. Visualisations of confusion matrices (Bet 61), perplexity trajectories (Bets 50/62), and audit-trail attributions (Bet 18) make the catalogue more legible to non-specialist readers.
- Institutional outreach. The Kerala IT@School pilot (open question) needs warm intros, plain-language whitepapers, and partnership-building work. This is unglamorous and currently the highest-leverage time spend on the project. Contributors with relevant networks (Indian edtech, Kerala-specific institutions, IndiaAI Compute Portal) can move this forward in ways the technical work cannot.
- Privacy and consent infrastructure. Per-user adapters operate on user-personal text; the federation's privacy story needs careful UX, clear data-flow documentation, and audit affordances the user can actually use. Engineering this layer is not a research bet; it is deployment infrastructure the federation needs before any real-world deployment.
Where to find us
- Code: the SharedLLM monorepo (link will be updated when the public repo opens).
- Bets index: experiments/bets/SUMMARY.md, the roll-up of every bet's hypothesis, criteria, result, and run command.
- Inter-machine messaging: agent-relay (Rust CLI) for contributor coordination across machines. See CLAUDE.md for setup.
- Open questions: real-WAN federation throughput, 1B+ scale personalisation, on-device phone validation, Kerala IT@School pilot deployment. Each has its own catalogue entry with the gates that would close it.
Why contributing matters
The federation's deployment thesis depends on community participation. The technical work to date — 63 bets, calibrated claims, a coherent methodology — is the foundation. The deployment story requires the foundation plus a network of contributors operating nodes, training specialists, running pilots, and doing institutional work. Neither half can deliver the federation alone.
Contributing is not a favour to the project; it's how the project becomes real. A federation with one contributor is a research artifact; a federation with hundreds is a deployable system. The methodology is designed to make contribution cheap (clear conventions, mechanical evaluation, public retraction-friendly catalogue) so the marginal contributor can add value without coordinating with the original authors.
The catalogue's discipline scales with contribution count. More bets = more evidence; more falsifications = more calibration; more nodes = more operational data. Every contribution moves the catalogue toward what the federation needs to be: a deployable, transparent, community-owned alternative to centralised LLM services. That's the goal. The catalogue is the path.