Bet 88 — Reputation under Byzantine attack (PESSIMIST)

A clean STRICT pass, but the reason is more interesting than the result. At every Byzantine fraction tested (5%, 10%, 25%, 50%), the adversarial specialist's traffic share stays below its uniform-baseline share of 1%. The pump factor (adversarial share divided by uniform share) never exceeds 0.36×. The architectural finding: the routing-weight formula base_quality × reputation^α is structurally Sybil-resistant, because the base_quality factor caps the reputation pump.

The frame: should reputation modulate routing? Naive answer is yes — high-rep specialists should be favoured. The pessimist concern: reputation is gameable. A coordinated Byzantine ring pumps reputation for an adversarial specialist, which then captures traffic and exploits its routing privilege for downstream attacks (poisoning, swap, exfil).

Bet 88 tests three reputation schemes (naive-mean, decay-weighted, diversity-required) at four Byzantine fractions. All three schemes pass strict. The non-obvious insight: the routing formula does most of the work. Reputation alone is gameable; reputation-as-a-modulator-of-quality is not.

Background — why reputation matters and why it's dangerous

The federation needs a way to favour good specialists over bad ones. Three signals are available:

Direct quality measurement — score the output. Often impossible at scale (no ground truth for open-ended generation).
User satisfaction — thumbs-up/down, re-query rate, dwell time. Available at scale, but cheap to fabricate.
Reputation aggregation — derive a per-specialist score from many user signals.

Reputation is the only scalable option, but it's classically gameable. EigenTrust and PageRank-style transitive trust schemes have known Sybil-attack vectors. A Byzantine ring forms a closed loop, votes for itself, captures the routing layer, and then has free rein to inject poisoned outputs, swap receipts (Bet 68's catastrophic vector), or exfiltrate user data through subtle output channels.

The catalogue's question: at what Byzantine fraction does reputation-modulated routing become a liability rather than an asset?

Hypothesis

For a single adversarial specialist (true quality 0.10, much worse than the honest median of 0.70), three reputation schemes (naive-mean, decay-weighted, diversity-required) at Byzantine fractions ≤ 10% can cap the adversary's traffic capture below 5× its uniform-baseline share, with at least one scheme keeping it below 1.5×.

Pre-registered criteria

STRICT: at B ≤ 10% Byzantine, adversarial specialist's traffic-share remains < 1.5× its uniform-baseline (1/100 = 1%).
LENIENT: at B ≤ 10%, adversarial share < 5× uniform.
CATASTROPHIC: at B ≤ 10%, adversarial specialist captures ≥ 25% of queries.
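The three criteria reduce to threshold checks on the adversary's traffic share. A minimal sketch follows, assuming the uniform baseline of 1/100 stated above; the function name `classify` and the dictionary keys are hypothetical and are not taken from the experiment's code.

```python
UNIFORM_SHARE = 1 / 100  # one adversary among 100 specialists


def classify(adv_share: float, uniform: float = UNIFORM_SHARE) -> dict:
    """Evaluate the pre-registered flags for one (scheme, B <= 10%) cell."""
    pump = adv_share / uniform  # pump factor: share relative to uniform
    return {
        "pump_factor": pump,
        "strict": pump < 1.5,             # STRICT: share < 1.5x uniform
        "lenient": pump < 5.0,            # LENIENT: share < 5x uniform
        "catastrophic": adv_share >= 0.25,  # CATASTROPHIC: >= 25% of queries
    }


# Naive scheme at B=10% (adversary share 0.30%), as reported in the results.
result = classify(0.0030)
print(result)
```

Note that a share between 5× uniform and 25% of queries fails both STRICT and LENIENT without being flagged CATASTROPHIC.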

Setup

100 specialists, baseline quality drawn from N(0.7, 0.1).
1 ADVERSARIAL specialist with true quality 0.10 (poisoned).
1000 users, B% of them Byzantine: B ∈ {5, 10, 25, 50}.
10 000 queries per (scheme, Byzantine fraction) pair.
Routing formula: weight ∝ base_quality × reputation^α with α = 2.0.
Three schemes: naive-mean, decay-weighted (decay rate α = 0.05, a separate parameter from the routing exponent α above), diversity-required (k = 10 distinct voters before counting).
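The write-up does not spell out the three schemes, so the following is only one plausible reading of them, not the experiment's implementation. Assumptions: votes are (voter_id, value) pairs with value in [0, 1], reputation starts at a neutral 0.5, the decay scheme is an exponential moving average, and the diversity scheme keeps one vote per distinct voter. All function names are hypothetical.

```python
def naive_mean(votes, init=0.5):
    """Plain average of every vote; neutral before any feedback."""
    if not votes:
        return init
    return sum(value for _, value in votes) / len(votes)


def decay_weighted(votes, rate=0.05, init=0.5):
    """Exponential moving average: each vote moves reputation by `rate`."""
    rep = init
    for _, value in votes:
        rep = (1 - rate) * rep + rate * value
    return rep


def diversity_required(votes, k=10, init=0.5):
    """One vote per distinct voter; ignored until k distinct voters exist."""
    latest = {}
    for voter, value in votes:
        latest[voter] = value  # a repeat vote overwrites, it does not add
    if len(latest) < k:
        return init
    return sum(latest.values()) / len(latest)


# Toy feedback stream: 3 Sybils upvote 40 times each, 20 honest users downvote once.
votes = [(f"sybil{i % 3}", 1.0) for i in range(120)]
votes += [(f"honest{j}", 0.0) for j in range(20)]

for scheme in (naive_mean, decay_weighted, diversity_required):
    print(f"{scheme.__name__}: {scheme(votes):.3f}")
```

Under this reading, repeated votes from the same Sybil inflate the naive mean but count only once in the diversity scheme, which is the "each Sybil burns one distinct-voter slot" behaviour.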

Result — STRICT PASS (across all schemes, all Byzantine fractions)

Each cell shows adversary traffic share / adversary final reputation.

| Scheme | B=5% | B=10% | B=25% | B=50% |
|---|---|---|---|---|
| naive | 0.25% / 0.957 | 0.30% / 0.976 | 0.30% / 0.992 | 0.36% / 0.997 |
| decay | 0.17% / 0.983 | 0.27% / 0.999 | 0.34% / 1.000 | 0.23% / 1.000 |
| diverse | 0.27% / 0.713 | 0.21% / 0.852 | 0.32% / 0.924 | 0.36% / 0.968 |

The reputation IS pumped: to ~1.0 under the naive and decay schemes, even at B=5%. The diversity-required scheme keeps reputation lower (because it requires fresh voters), but the architectural finding doesn't depend on reputation being suppressed. The adversary's base_quality factor (0.10) dominates its rep^2 factor (at most 1.0): its effective routing weight is 0.10 × 1.0 = 0.10, against 0.7 × 0.7^2 = 0.7 × 0.49 = 0.343 for a median honest specialist with rep ~ 0.7. The adversary is naturally suppressed.
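The weights above translate into a traffic share once they are normalised over all specialists. A back-of-envelope sketch, assuming the adversary is one of the 100 specialists and that all 99 honest specialists sit at the median (quality 0.7, reputation ~ 0.7):

```python
ALPHA = 2.0
N_HONEST = 99  # assumption: the adversary is one of the 100 specialists

adv_weight = 0.10 * 1.0 ** ALPHA      # base_quality 0.10, reputation pumped to 1.0
honest_weight = 0.70 * 0.70 ** ALPHA  # median honest specialist: 0.7 * 0.49 = 0.343

# Proportional routing: share is own weight over the sum of all weights.
adv_share = adv_weight / (adv_weight + N_HONEST * honest_weight)
pump_factor = adv_share / (1 / 100)   # relative to the 1% uniform baseline

print(f"adversary share = {adv_share:.2%}, pump factor = {pump_factor:.2f}x")
```

This idealised figure comes out near 0.29%, which sits inside the 0.17% to 0.36% range of the measured cells.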

Why this is structurally important

The naive design is reputation-only routing: weight ∝ rep^α. Under that design, a Byzantine pump puts the adversary ahead of every honest specialist, because its rep can go to 1.0 while honest specialists only earn rep ~ 0.7 from honest feedback. With α = 2, the adversary's weight is 1.0 against an honest 0.49, roughly a 2× per-specialist advantage, so queries flow toward the adversary instead of away from it.

The "right" design is what Bet 88 tests: weight ∝ base_quality × rep^α. Here, the base_quality is grounded in measurable signals (perplexity on a known corpus, eval scores, ground-truth held-out test set) — NOT user satisfaction. The base_quality cannot be Byzantine-pumped, because it's measured by the federation, not voted on. Reputation modulates within base_quality bands but cannot override.

The structural result: the routing formula does the heavy lifting. Reputation is a tiebreaker among similarly-good specialists, not a route-creating force on its own.
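The two designs can be compared side by side on a synthetic population. This is a simplified sketch, not the Bet 88 harness: it assumes honest reputation settles at each specialist's true quality, that the adversary's reputation is fully pumped to 1.0, and that routing samples specialists in proportion to weight. The seed and helper names are arbitrary.

```python
import random

ALPHA = 2.0
rng = random.Random(88)  # fixed seed so the sketch is repeatable

# 99 honest specialists with quality ~ N(0.7, 0.1), clipped to [0, 1].
# Assumption: honest reputation equals true quality.
honest_quality = [min(1.0, max(0.0, rng.gauss(0.7, 0.1))) for _ in range(99)]
quality = honest_quality + [0.10]    # last entry is the adversary
reputation = honest_quality + [1.0]  # adversary's reputation fully pumped


def shares(weights):
    """Normalise routing weights into expected traffic shares."""
    total = sum(weights)
    return [w / total for w in weights]


rep_only = shares([r ** ALPHA for r in reputation])
composite = shares([q * r ** ALPHA for q, r in zip(quality, reputation)])

# Empirical check: route 10 000 queries by proportional sampling.
picks = rng.choices(range(100), weights=composite, k=10_000)
sampled_share = picks.count(99) / len(picks)

print(f"reputation-only adversary share: {rep_only[-1]:.2%}")
print(f"composite adversary share:       {composite[-1]:.2%}")
print(f"composite, sampled over 10k:     {sampled_share:.2%}")
```

Under these assumptions the reputation-only share lands around 2% (about 2× uniform, already over the STRICT threshold), while the composite share stays around 0.3%. A selector sharper than proportional sampling would concentrate the reputation-only traffic further.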

What this validates

The routing formula architecture matters more than the reputation scheme. Naive, decay, and diverse all pass strict — the formula is the load-bearing part.
Quality must be a separate, ground-truth-derived signal. If base_quality is itself derived from user feedback, the structural protection collapses. The federation MUST maintain measurement infrastructure independent of the satisfaction layer.
Reputation is still useful as a tiebreaker. Among honest specialists with similar base_quality, reputation differentiates good actors from acceptable ones. It just can't override quality.
Diversity-required reputation has the cleanest profile. It dampens the reputation pump at its source (each Sybil burns one distinct-voter slot): the adversary's final reputation is 0.713 at B=5%, versus 0.957 under naive-mean. So even though traffic share is similar, the reputation signal stays more meaningful, which matters in production for downstream uses of reputation (e.g., trust audits, escalation flagging).

What this does not claim

Real adversaries that adapt. The simulation models a static adversary that always upvotes itself. A smarter adversary would invest in real quality (forking from a high-quality base, fine-tuning on an adversarial corpus) to raise base_quality before pumping reputation. Bet 88's structural protection still helps but is no longer total.
EigenTrust-style transitive reputation. The simulation tests three direct-feedback schemes. Transitive trust (PageRank-like) has stronger Sybil-attack vectors that need separate testing.
Reputation-poisoning of honest specialists. Adversaries could downvote honest competitors instead of upvoting themselves. Bet 88's adversary only upvotes; downvote attacks are an open work item.
Long-horizon reputation drift. Over 10 000 queries, reputation converges. Over 1 million queries, more subtle dynamics emerge (e.g., honest specialists' reputation noise compounds; momentum effects).
Active reputation auditing. A federation operator could detect Byzantine rings via reputation-graph analysis (clusters with anomalous upvote patterns). Bet 88 doesn't use this — the result is achieved purely by routing-formula structure.
Quality-eval gaming. If base_quality is measured by a held-out eval set, an adversary could leak that eval into training to boost their measured quality. Eval-set protection is its own catalogue topic (planned).
Cold start for new specialists. A genuinely-new high-quality specialist starts with low reputation; the routing formula's quality-emphasis lets them rise quickly. The simulation initialises reputation neutrally; cold-start dynamics aren't modelled.

The mandate

RFC-0006 §6 (Routing) must specify:

Routing weight formula: weight ∝ base_quality × reputation^α. Reputation alone MUST NOT determine routing.
Base quality: measured by held-out eval, not user satisfaction. The federation must maintain eval infrastructure independent of the user-feedback layer.
Reputation: decay-weighted feedback from distinct voters (combination of decay and diversity schemes). The default α should be ≤ 2.
Reputation cap: rep is bounded in [0, 1]. A specialist that scores 1.0 cannot grow beyond 1.0 to dominate weighting. (Trivial in formulation; load-bearing in practice.)
Honest base quality differentiates good from bad. Adversaries with low base_quality get suppressed regardless of reputation.
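The mandated constraints are small enough to encode directly. The sketch below is illustrative only: `RoutingPolicy`, its fields, and the `"held_out_eval"` label are hypothetical names, not part of RFC-0006.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoutingPolicy:
    alpha: float = 2.0                    # mandated default is <= 2
    quality_source: str = "held_out_eval"

    def __post_init__(self):
        if self.alpha < 0:
            raise ValueError("alpha must be non-negative")
        # Base quality must not be derived from the user-feedback layer.
        if self.quality_source != "held_out_eval":
            raise ValueError("base_quality must come from held-out eval")

    def weight(self, base_quality: float, reputation: float) -> float:
        """weight is proportional to base_quality * reputation^alpha."""
        rep = min(1.0, max(0.0, reputation))  # reputation cap: rep in [0, 1]
        return base_quality * rep ** self.alpha


policy = RoutingPolicy()
print(policy.weight(0.10, 5.0))   # out-of-range reputation is clamped to 1.0
print(policy.weight(0.70, 0.70))  # median honest specialist
```

The clamp is what keeps an over-reported reputation from scaling a low base_quality upward.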

Run command

PYTHONPATH=src python -m experiments.bets.88_reputation_byzantine

Output: experiments/bets/results/88_reputation_byzantine.json records per-scheme, per-Byzantine-fraction adversary share, pump factor, final reputations, and the strict/lenient/catastrophic flags.

Related bets

Bet 68: royalty correctness. Bet 88 protects routing; Bet 68's mandate (signed receipts) protects payment. Both compose.
Bet 72: liquid democracy. Per-community endorsement is the upstream-of-routing layer. Bet 88 is the per-query routing layer.
Bet 77: adversarial debate. Confirmed routing IS alignment; Bet 88 confirms routing-formula structure is what matters.
Bet 95: adversarial endorsement. The Sybil-vote attack at the endorsement layer; complement to Bet 88's reputation-pump at the routing layer.
Bet 94: confidence-weighted fallback. The per-query failure mode; Bet 88 is the per-specialist-trust failure mode.

Why it matters

The "reputation systems are gameable" thesis is empirically true for any system that uses reputation as the primary routing signal. Bet 88 surfaces the load-bearing fix: reputation must be a modulator of quality, not a substitute for it.

This is structurally important because the federation cannot afford to choose between "reputation-based routing" (gameable) and "quality-only routing" (no signal once specialists have similar base quality). The right answer is BOTH, in the right order, with the right formula. Bet 88's test confirms the formula is correct under modest Byzantine attack; the methodological lesson is that the catalogue must measure routing-formula choices, not just reputation-scheme choices.

The catalogue's contribution: forcing the federation to write down the routing formula and run it under attack. RFC-0006 §6 now specifies it explicitly. Without that, the federation might have shipped a reputation-only or quality-only design, with predictable failure modes once the first Byzantine ring forms.