Real-WAN federation throughput

This is the most consequential open question in the catalogue. Every one of the 63 bets that have run used either an in-process FastAPI test client or a LAN — the same building's ethernet, sub-millisecond latency, no NAT traversal, no ISP shaping, no asymmetric reachability. The federation's protocol-level correctness has been thoroughly validated; the federation's operational correctness under real wide-area network conditions has not been measured at all. Until it has, every claim about deployment behaviour is extrapolation from LAN data.

The gap matters because the federation's deployment thesis depends on real-WAN behaviour. The promise is "community-owned distributed inference across consumer hardware on consumer internet." Consumer internet is not LAN. If the federation's protocol assumptions break under realistic WAN conditions — variable latency, bursty packet loss, ISP path changes, NAT timeout, asymmetric upload/download — the deployment story collapses. If the protocol holds with reasonable degradation, the story holds and the federation just needs WAN-aware tuning. The bets harness cannot answer which it is.

What's been validated on LAN

The catalogue's protocol-level claims have all passed at the LAN scale:

  • Protocol correctness. All 63 bets pass. Wire format, mixture combiner, KV-cache handoff, layer-forward routing, audit trail — all measured and working.
  • Worker throttle tolerance (Bet 45). The scheduler produces bit-exact tokens regardless of per-worker latency profile. Validated on 100 inference runs across no-throttle, steady-throttle, and ramp-throttle conditions.
  • Byzantine actor tolerance (Bet 44). Coordinate-wise median aggregation absorbs a 1/8 byzantine fraction at 1000× scaled gradients while keeping final loss within 1.084× of the byzantine-free baseline.
  • Numerical robustness (Bet 63). Hidden-state Gaussian noise up to σ_rel = 1e-1 doesn't destabilise personalisation. The federation has 4 orders of magnitude of headroom against fp16 round-off.
  • Bounded audit overhead (Bet 17). 0.6% inference-time cost for the deferred-extraction audit pattern. Audit-on-by-default is operationally cheap.

These are real results. They tell us the federation is internally consistent and well-implemented. They do not tell us what happens when the federation crosses real ISP boundaries.

What's not yet measured

The list of unmeasured WAN behaviours is long, and each item is plausibly load-bearing for deployment:

  • Token-throughput across continents. A federation with workers in Europe and primaries in Asia will run into long-haul latency (100–300 ms RTT) that LAN tests don't expose. Pipeline depth (how many sequential layer-forwards happen before the first token) interacts with latency multiplicatively. A 4-layer pipeline at 200 ms RTT per layer-forward = 800 ms per token before the first sample. Whether this is tolerable depends on the use case (offline batch generation: yes; interactive chat: no).
  • Jitter under residential ISP conditions. Consumer broadband has bursty packet loss (typical: 0.1–1% sustained, 5–20% during congestion events), variable latency (typical RTT can swing by 50–200 ms over minutes), and NAT timeout effects (a connection that's idle for 60+ seconds may need re-establishment). The protocol handles these via retransmission and reconnection, but the throughput impact under realistic loss profiles is unmeasured.
  • Throughput-vs-bandwidth-cost curve. The K-value tradeoff in DiLoCo (Bet 62) is a bandwidth-versus-loss tradeoff; the relevant operating point depends on what bandwidth actually costs at deployment scale. A K=1 federation with a 1B-parameter base, at 1 MB/sec per worker across 10,000 workers, needs 10 GB/sec of aggregate bandwidth. That's not just expensive — it's network-architecture-bending. K=10 or K=100 cuts this proportionally; the right K depends on the achievable per-worker bandwidth in real deployment.
  • Path selection under multi-endpoint transport. RFC-0001 (the multi-endpoint transport spec) allows nodes to advertise multiple reachability candidates (LAN address, WAN address, relay address). The primary selects the best path with LAN preference. This is correct in design but unmeasured at scale; what fraction of federation pairs can establish direct connections vs needing relay, and what the cost differential is, is open.
  • NAT traversal at scale. Most consumer connections are behind NAT. The federation needs hole-punching or relay support; both work in small-scale tests but haven't been stressed under realistic NAT-pair distributions.
  • ISP shaping and traffic class. Some ISPs deprioritise sustained outbound traffic from residential connections. A federation that runs continuously may find its bandwidth throttled at the ISP layer, independent of the home connection's nominal capacity. Unmeasured.
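The two arithmetic claims in the list above — latency multiplication through pipeline depth, and aggregate bandwidth at K=1 — can be sketched as back-of-envelope Python. All numbers are the illustrative figures from the bullets, not measurements:

```python
def per_token_latency_ms(pipeline_depth: int, rtt_ms: float) -> float:
    """Sequential layer-forwards stack: depth hops before the first token."""
    return pipeline_depth * rtt_ms

def aggregate_bandwidth_gb_s(per_worker_mb_s: float, workers: int, k: int) -> float:
    """Syncing every K steps cuts aggregate bandwidth proportionally."""
    return per_worker_mb_s * workers / k / 1000  # MB/s -> GB/s

print(per_token_latency_ms(4, 200))              # 800 ms per token at 200 ms RTT
print(aggregate_bandwidth_gb_s(1, 10_000, 1))    # 10.0 GB/s at K=1
print(aggregate_bandwidth_gb_s(1, 10_000, 10))   # 1.0 GB/s at K=10
```

The multiplicative structure is the point: halving RTT or halving pipeline depth buys the same per-token win, and raising K buys bandwidth linearly.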

Why the bets harness can't reach this

The bets harness is a single-process Python test rig that simulates the federation. It cannot measure cross-ISP throughput because it doesn't cross an ISP. The harness can simulate latency (artificial sleeps), simulate packet loss (random message drops), simulate jitter (variable sleeps) — but simulation doesn't catch the interaction between the simulated condition and the real protocol stack. Real TCP/IP retransmits behave differently from simulated drops; real BGP-routed paths behave differently from localhost; real ISP shaping behaves differently from artificial throughput caps.

To make progress, the federation needs:

  1. At least three nodes on three different residential ISPs. Not lab ISPs, not university backbones — actual home internet from different providers in different physical locations.
  2. A coordinator on a stable VPS or one of the home machines configured as such. Coordinator availability is a precondition for measuring federation behaviour; if the coordinator itself is unreliable, the measurements conflate coordinator and federation issues.
  3. A long-running deployment (days, not minutes). Short tests don't catch ISP-level shaping (which often kicks in after sustained traffic), don't catch NAT timeout effects (which need an idle period), don't catch the impact of typical home-network events (someone else in the household streaming, ISP maintenance windows, etc.).
  4. Instrumentation. Per-token latency distribution, per-message loss rate, retransmission count, end-to-end goodput, path selection events (LAN vs WAN vs relay), connection re-establishment count. Without this, the test produces a "yes/no it worked" answer rather than a quantitative deployment-readiness measure.
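A minimal shape for the instrumentation record in point 4 might look like the following. The field names mirror the metrics listed above, but the class itself is hypothetical; nothing like it exists in the harness yet:

```python
from dataclasses import dataclass, field

@dataclass
class WanRunMetrics:
    per_token_latency_ms: list[float] = field(default_factory=list)
    messages_sent: int = 0
    messages_lost: int = 0
    retransmissions: int = 0
    goodput_bytes: int = 0
    path_events: list[str] = field(default_factory=list)  # "lan" | "wan" | "relay"
    reconnects: int = 0

    def loss_rate(self) -> float:
        """Per-message loss rate; guards against division by zero."""
        return self.messages_lost / max(self.messages_sent, 1)

    def p95_latency_ms(self) -> float:
        """Tail latency, the number interactive workloads care about."""
        xs = sorted(self.per_token_latency_ms)
        return xs[int(0.95 * (len(xs) - 1))] if xs else 0.0
```

The derived quantities (loss rate, tail latency) are what turn "yes/no it worked" into a quantitative deployment-readiness measure.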

What we'd learn

The deployment story depends on real-WAN behaviour. There are roughly three possible outcomes from running this experiment:

  • Catastrophic. The federation collapses to "single-host bound by uplink" under WAN conditions — bandwidth contention, latency multiplication, or shaping makes the pipeline unable to process tokens at any reasonable rate. The community-owned thesis would be in trouble; the federation would only be useful for batch / offline / non-interactive workloads.
  • Degraded but viable. The federation runs with 2–3× slower throughput at the same K-value and adapter format. The protocol holds; the deployment story holds with the explicit qualifier that real-WAN deployment is slower than LAN. The K-value and adapter strategy might shift toward more bandwidth-conservative defaults.
  • Surprisingly OK. The federation is mostly bottlenecked by per-token compute rather than network, and WAN conditions barely register. The deployment story is unchanged. Unlikely but possible — especially for the per-user adapter scenario, where the network traffic is small (KB-scale per inference) compared to the compute work.

Until we run the experiment, we don't know which of these is true. The catalogue's discipline is to be explicit about this gap rather than to assume LAN behaviour generalises.

What's needed to start

The infrastructure is partly ready. The coordinator (on Machine P, <MACHINE-P-IP>:8420), the primary (Machine P, <MACHINE-P-IP>:8422), and the worker (Machine W, <MACHINE-W-IP>:50052) form a 2-node LAN today. They're already running production deployments of the SharedLLM stack with real models (not bets-harness fixtures).

The minimum extension to start measuring real-WAN:

  • A third machine on a separate residential ISP. A friend or collaborator with a home internet connection running a node-daemon. Could be any modern laptop. Needs to register with the coordinator and join the federation.
  • A measurement harness. A long-running script that drives a representative inference workload through the federation and records per-token latency, network-level metrics, and federation events. The bets harness has the inference-driving infrastructure; the network-instrumentation piece needs to be added.
  • A defined "representative workload." What does "running the federation in production" actually mean? Probably: continuous inference on a varied prompt mix at modest QPS (not stress-testing, just realistic). Defining this precisely is part of the open work.
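Under those assumptions, the driver loop for a representative workload could be as small as this sketch. `run_inference` and `record` are stand-ins for whatever the bets harness and the network-instrumentation piece end up exposing; the loop shape (steady modest QPS, varied prompts, wall-clock pacing) is the point:

```python
import random
import time

PROMPTS = ["summarise: ...", "translate: ...", "answer: ..."]  # varied prompt mix

def drive(run_inference, record, duration_s: float, qps: float = 0.5) -> None:
    """Continuous inference at modest QPS: realistic pacing, not a stress test."""
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        start = time.monotonic()
        prompt = random.choice(PROMPTS)
        result = run_inference(prompt)             # one federated inference
        elapsed = time.monotonic() - start
        record(prompt, result, elapsed)            # latency + federation events
        time.sleep(max(0.0, 1.0 / qps - elapsed))  # pace to the target rate
```

Pacing by wall clock rather than issuing requests back-to-back matters: ISP shaping responds to sustained traffic patterns, so the driver has to look like a real workload over days, not a burst benchmark.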

The work is unglamorous. Wiring up real-WAN testing is not a research bet — it's deployment engineering. But it gates the most consequential claim in the project (federation actually federates), and no amount of additional protocol-level bets substitutes for it. A reasonable rule of thumb: the next 5 bets at the protocol level deliver less marginal value than the first real-WAN measurement.

What this question's resolution would unlock

If real-WAN throughput is acceptable, the entire deployment chapter activates:

  • Kerala IT@School pilot (open question) becomes technically grounded rather than aspirational.
  • Phone-class deployment (open question) has its first technical precondition cleared. Phones live on residential ISPs; if residential ISPs work for the federation, phones might.
  • 1B+ scale personalisation (open question) becomes deployable rather than just trainable. A 1B base + per-user adapters is only useful if users can actually access the federation across a real network.
  • The community-ownership thesis moves from "in principle, with the right network" to "in measured practice, with these specific characteristics."

If real-WAN throughput is catastrophic, the federation pivots:

  • Smaller-scale federation. Local clusters (a household, a small office, a school) rather than internet-wide. Still useful — the federation's per-user adapter and audit story still apply at this scale — but the geographic distribution thesis weakens.
  • Asynchronous-batch federation. Inferences are queued, processed when bandwidth is available, returned later. Not interactive but possibly useful for batch workloads (training data generation, analytics, summarisation).
  • Hybrid centralised/federated. Use the federation for training and audit-relevant inference; use a centralised endpoint for interactive workloads. This dilutes the community-ownership thesis but keeps the parts that the federation actually delivers.

The catalogue can't pre-decide which path the federation takes; the data has to come first.

How this connects to other open questions

Real-WAN throughput is the gate for the Kerala IT@School pilot deployment (open question). A pilot that doesn't measure WAN throughput is a pilot that's hoping LAN behaviour generalises. The bets harness can't make that hope come true; only real measurement can.

Real-WAN throughput is also a precondition for phone-class deployment (open question) and 1B+ scale personalisation (open question). All three open questions sit downstream of "the federation actually works on consumer internet"; without that data point, each is theoretical.

The catalogue's job is to make this dependency chain explicit. The federation's technical work has piled up impressive results at the protocol level; the operational level — the level that actually matters for the deployment thesis — is mostly unmeasured. Acknowledging this gap is what the open-questions chapter is for. The honest version of the federation's status is "protocol-level validated across 63 bets; operational-level untested." The next phase of work is to turn that "untested" into a number.

Why this stays in the open-questions chapter

The bets harness produces clean STRICT-PASS results because the harness itself is well-controlled. Real-WAN measurement is messy — network conditions vary, ISP behaviour changes over time, instrumentation is harder. The work doesn't fit the bets-harness pattern of "pre-register criteria, run experiment, report STRICT/LENIENT/CATASTROPHIC."

That's a reason to do it differently, not a reason to skip it. The deployment story can't be built from clean lab measurements alone. The open-questions chapter is the catalogue's commitment to keep the messy questions visible even when they don't fit the harness's tidy structure.

Until real-WAN measurement happens, the catalogue's strongest claim is "the federation is protocol-level correct on LAN." That's a calibrated claim, not a marketing claim. The federation deployment story is built on it — and on the explicit acknowledgment that the WAN extension is open work that hasn't started.