Bet 69 — Generational specialist inheritance (PESSIMIST)
The first bet in the cognitive-estate frame. Strict pass on the headline criterion, but only because the criterion is graded across overlap regimes: the orthogonal-child case (the hard test) clears only the lenient bar, much as the lineage of Bets 31, 33, and 54 predicted.
The frame: the federation pays trainers royalties indefinitely, but trainers are mortal. A trainer's per-user adapter (their "estate") should be inheritable by a successor — a child, an apprentice, a community follower — without that successor having to retrain from scratch on the original corpus. Bet 69 measures whether any weight-space inheritance scheme produces a meaningful merge: a successor adapter that retains parent-domain skill while acquiring successor-domain skill.
The pessimist hypothesis going in: every weight-space scheme breaks. Bet 31 (model soup) and Bet 33 (task-vector extrapolation) and Bet 54 (cross-specialist averaging) all came back below useful thresholds when the domain mismatch is non-trivial. We expected Bet 69 to repeat that finding, with concatenation as the only honest answer (at 2× memory cost).
The result is more nuanced. Inheritance works cleanly when the parent and child targets overlap by ≥ 50%. It is marginal when the targets are orthogonal: the merge retains roughly 70% of each domain, clearing the lenient bar but not the strict one. The catalogue's lesson: cognitive estate is a real primitive, but the federation must be honest with users about which inheritance regime they're in.
Background — what cognitive estate means
The disruptive frame (see Operating-layer big bets) introduced cognitive estate as one of the federation's potentially-load-bearing primitives. The intuition: if I train a per-user adapter on my private corpus over decades, that adapter is a piece of intellectual property — it captures my style, my domain expertise, my idiosyncratic preferences. If I die, that adapter shouldn't simply vanish; nor should it require my successor to retrain from zero on a corpus the successor may not have access to.
Concretely: a parent adapter starts life on parent corpus and converges to a per-user target vector (in adapter-weight space). Their child has their own corpus and target. The question Bet 69 asks: can we initialise the child adapter from the parent adapter and have the child's training trajectory land somewhere that retains parent-domain skill while acquiring child-domain skill?
The honest framing: this is the same problem Bet 31 (model soup) tackled at the specialist level, just with one specific ordering — the parent adapter is the initialisation, not just one of two equal-weight terms. The pessimist hypothesis was that this ordering still doesn't help: weight-space averaging is unstable when the targets diverge.
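To make the ordering concrete, here is a minimal sketch of the parent-as-initialisation question in the bet's stylised vector setting. Every name here (`train_toward`, `parent_target`, and so on) is illustrative rather than the experiment's actual code; the step counts and noise level are placeholders for the values given in the Setup section below.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 256  # adapter-weight dimension

def train_toward(start, target, steps, lr=0.05, noise=0.05):
    """Stylised corpus training: noisy SGD pulling the adapter toward its target vector."""
    w = start.copy()
    for _ in range(steps):
        grad = (w - target) + noise * rng.standard_normal(D)
        w = w - lr * grad
    return w

parent_target = rng.standard_normal(D)
child_target = rng.standard_normal(D)

parent = train_toward(np.zeros(D), parent_target, steps=200)       # long parent history
child_inherited = train_toward(parent, child_target, steps=50)     # parent adapter as initialisation
child_scratch = train_toward(np.zeros(D), child_target, steps=50)  # no-inheritance baseline
```

The question is whether `child_inherited` keeps meaningful parent-domain quality that `child_scratch` never had, without giving up too much child-domain quality.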
Hypothesis
For at least one weight-space inheritance scheme, the merged adapter retains ≥ 80% of pure-parent quality on the parent's domain and ≥ 80% of pure-child quality on the child's domain — across a sweep of parent-child target overlaps from 0.0 (orthogonal) to 0.9 (highly aligned).
Pre-registered criteria
- STRICT: at least one scheme achieves both parent_q / pure_parent_q ≥ 0.80 and child_q / pure_child_q ≥ 0.80 for at least 2 of 3 overlap regimes (0.0, 0.5, 0.9).
- LENIENT: at least one scheme achieves both ratios ≥ 0.60 for at least 2 of 3 overlap regimes.
- CATASTROPHIC: every scheme makes both worse than pure-child-trained-from-scratch (i.e., inheritance is actively harmful, not neutral).
The catastrophic threshold is the dangerous one. If inheritance is worse than starting fresh, the federation should refuse to offer inheritance at all — it would be a mis-feature.
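For reference, here is a hedged sketch of how the three pre-registered flags compose. The dictionary layout is an assumption, not the experiment's actual result format, and the catastrophic check is written against an explicit pure-child-from-scratch baseline as the criterion describes.

```python
REGIMES = (0.0, 0.5, 0.9)

def grade(results, scratch):
    """results[scheme][overlap] -> dict with parent_ratio, child_ratio, parent_q, child_q.
    scratch[overlap] -> dict with parent_q, child_q for the pure-child-from-scratch baseline.
    Both layouts are assumptions for illustration."""
    def n_regimes_passing(scheme, bar):
        return sum(results[scheme][o]["parent_ratio"] >= bar
                   and results[scheme][o]["child_ratio"] >= bar
                   for o in REGIMES)

    strict = any(n_regimes_passing(s, 0.80) >= 2 for s in results)
    lenient = any(n_regimes_passing(s, 0.60) >= 2 for s in results)
    # Catastrophic: every scheme is worse than pure-child-from-scratch on both axes, in every regime.
    catastrophic = all(
        results[s][o]["parent_q"] < scratch[o]["parent_q"]
        and results[s][o]["child_q"] < scratch[o]["child_q"]
        for s in results for o in REGIMES
    )
    return {"strict": strict, "lenient": lenient, "catastrophic": catastrophic}
```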
Setup
- Adapter dimension: D = 256.
- Parent corpus: 200 SGD steps toward parent target (with Gaussian gradient noise).
- Child corpus: 50 SGD steps toward child target (the child has dramatically less training data than the parent — pessimist).
- Reference: big-child trained for the same 200 steps as the parent, on child target. This is the upper-bound: what the child would learn if they had full data.
- Five inheritance schemes (see the sketch after this list):
  - Linear average: 0.5·parent + 0.5·child.
  - Weighted blend at λ ∈ {0.3, 0.7}: (1−λ)·parent + λ·child, where λ is the child weight.
  - Task-vector extrapolation at α ∈ {0.5, 1.0}: parent + α·(child − parent).
  - Distillation: start from parent, train on child target for 50 steps.
  - Concatenation: retain both adapters, route per-domain. (Reported as a 2× memory baseline.)
- Overlap sweep: child target is overlap·parent + (1−overlap)·orthogonal, normalised. Tested at overlap ∈ {0.0, 0.5, 0.9}.
- Quality metric: cosine similarity between adapter and target, a proxy for held-out perplexity.
- Seeds: 5 per (overlap, scheme) cell.
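A hedged sketch of the schemes, the overlap-swept child target, and the quality metric, using plain NumPy vectors as stand-ins for adapters. The helper names (`make_child_target`, `train_fn`, etc.) are illustrative, not the experiment's module API.

```python
import numpy as np

def unit(v):
    return v / np.linalg.norm(v)

def make_child_target(parent_target, overlap, rng):
    """Child target = overlap·parent + (1−overlap)·orthogonal direction, normalised."""
    u = unit(parent_target)
    raw = rng.standard_normal(parent_target.shape)
    orth = unit(raw - (raw @ u) * u)   # random direction orthogonal to the parent target
    return unit(overlap * u + (1.0 - overlap) * orth)

def quality(adapter, target):
    """Cosine similarity to the target vector: the bet's proxy for held-out perplexity."""
    return float(unit(adapter) @ unit(target))

# Weight-space schemes, each operating on the trained parent and the data-poor child adapters.
def linear_average(parent, child):
    return 0.5 * parent + 0.5 * child

def weighted_blend(parent, child, lam):          # lam = child weight, 0.3 or 0.7
    return (1.0 - lam) * parent + lam * child

def task_vector(parent, child, alpha):           # alpha in {0.5, 1.0}
    return parent + alpha * (child - parent)

def distill(parent, child_target, train_fn, steps=50):
    # train_fn is whatever noisy-SGD routine produced the adapters in the first place.
    return train_fn(start=parent, target=child_target, steps=steps)

def concatenate(parent, child):
    # Keep both adapters and route by domain at inference time (2× memory).
    return {"parent_domain": parent, "child_domain": child}
```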
Result — STRICT pass, with a critical caveat
| overlap | best scheme | parent_ratio | child_ratio | regime |
|---|---|---|---|---|
| 0.0 | linear_average | 0.71 | 0.70 | orthogonal: fails strict (≥ 0.80) |
| 0.5 | linear_average | 0.99 | 1.02 | strict pass |
| 0.9 | linear_average | 1.10 | 1.04 | strict pass; merge regularises |
The strict criterion ("≥ 2 of 3 overlap regimes pass") fires, because overlap=0.5 and overlap=0.9 both pass cleanly. But the orthogonal regime, where the parent and child targets are uncorrelated, does not pass strict; it clears the lenient bar on the child axis (0.70 against the 0.60 threshold) but falls well short of the 0.80 strict bar.
The lenient criterion passes. The catastrophic criterion does not fire. Inheritance is not harmful in any regime — at the orthogonal regime, the merged adapter retains roughly 70% of both pure-parent and pure-child quality, which is genuine information transfer, just not the 80% the strict bar requires.
| Scheme | overlap=0.0 (parent / child) | overlap=0.5 | overlap=0.9 |
|---|---|---|---|
| linear_average | 0.71 / 0.70 | 0.99 / 1.02 | 1.10 / 1.04 |
| blend_03 (heavy parent) | 0.99 / 0.40 | 1.10 / 0.96 | 1.13 / 1.02 |
| blend_07 (heavy child) | 0.40 / 0.99 | 0.84 / 1.06 | 1.04 / 1.05 |
| taskvec α=0.5 | 0.71 / 0.70 | 0.99 / 1.02 | 1.10 / 1.04 |
| taskvec α=1.0 | 0.05 / 0.99 | 0.66 / 1.06 | 1.04 / 1.05 |
| distillation | 0.13 / 0.97 | 0.85 / 1.05 | 1.07 / 1.05 |
(Each cell is the mean of 5 seeds.)
The patterns:
- Linear average is the most robust scheme across overlap regimes; it under-performs only in the orthogonal case.
- Heavy-parent blend (λ=0.3 child weight) preserves parent skill at the cost of child skill. Useful if the user wants to "honour the predecessor" rather than serve their own corpus.
- Heavy-child blend (λ=0.7 child weight) does the opposite. Acts almost like pure-child-from-scratch — inheritance is barely useful.
- Distillation works well at high overlap, fails at orthogonal. Same lesson as Bet 21 (self-distillation): the teacher signal helps only when teacher and student domains agree.
The over-1.0 ratios at high overlap aren't bugs. The merge regularises — it is essentially a noise-averaged version of the child trajectory, and a noise-averaged trajectory in a tightly-aligned target space is better than the noisy single-trajectory baseline.
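The regularisation effect is easy to reproduce in isolation: averaging two noisy estimates of nearly the same target cancels independent noise and raises cosine similarity. A minimal illustration, with magnitudes chosen arbitrarily rather than taken from the experiment:

```python
import numpy as np

rng = np.random.default_rng(1)
D, noise, trials = 256, 0.5, 1000
target = rng.standard_normal(D)

def cos(v):
    return float(v @ target / (np.linalg.norm(v) * np.linalg.norm(target)))

single, merged = [], []
for _ in range(trials):
    a = target + noise * rng.standard_normal(D)  # noisy "child" trajectory endpoint
    b = target + noise * rng.standard_normal(D)  # noisy "parent" endpoint (high overlap: same target)
    single.append(cos(a))
    merged.append(cos(0.5 * (a + b)))

print(np.mean(single), np.mean(merged))  # the averaged adapter sits closer to the target
```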
What this means for cognitive estate
The federation can offer estate inheritance, but the offer must be regime-aware:
- High-overlap inheritance (parent and child share ≥ 50% of target subspace): a linear average (the λ = 0.5 blend) works cleanly. The successor inherits ~100% of parent skill and acquires their own corpus's skill on top.
- Orthogonal inheritance (truly different domains): the successor should be told that inheritance buys them ~70% of each domain at half the data cost. They can choose: accept the trade, or train from scratch.
- Concatenation: always available, at 2× memory cost. Should be the default for high-stakes inheritance (legal documents, medical history, sovereign-data corpora) where 70% retention is unacceptable.
The federation's UI should expose the overlap regime to the user before they accept inheritance. The bet's contribution: this is now a measured threshold, not an architectural assumption.
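A hedged sketch of what regime-aware inheritance could look like on the federation side. The thresholds reuse the bet's measured regimes, but the function and field names are hypothetical policy plumbing, not an implemented API.

```python
import numpy as np

def estimate_overlap(parent_adapter, child_adapter):
    """Proxy for target overlap: cosine similarity between the two adapters' directions."""
    a, b = np.asarray(parent_adapter), np.asarray(child_adapter)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def inheritance_offer(parent_adapter, child_adapter, high_stakes=False):
    overlap = estimate_overlap(parent_adapter, child_adapter)
    if high_stakes:
        # Legal / medical / sovereign-data corpora: ~70% retention is unacceptable.
        return {"scheme": "concatenation", "memory_cost": "2x", "overlap": overlap}
    if overlap >= 0.5:
        return {"scheme": "linear_average",
                "expected_retention": "~100% parent, ~100% child",
                "overlap": overlap}
    # Orthogonal regime: surface the trade-off and let the user choose.
    return {"scheme": "user_choice",
            "options": ["linear_average (~70% of each domain)",
                        "train_from_scratch",
                        "concatenation (2x memory)"],
            "overlap": overlap}
```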
What this does not claim
- Real adapter dimensions. The bet uses D = 256 vectors and SGD-toward-target, not LoRA on actual transformer layers. The directions of the failure modes are validated; the magnitudes will shift on real data.
- Training-data realism. "Parent corpus" and "child corpus" are stylised as synthetic targets with Gaussian noise. The decoupling from real natural-language corpora is intentional (testing the merge primitive in isolation), but it elides corpus heterogeneity (e.g., parent's writing style vs child's writing style at sentence level).
- Multi-generation chains. Bet 69 covers parent → child. Generation chains (parent → child → grandchild) are open work; expect cumulative drift to make orthogonal inheritance unviable past 2 hops.
- Adversarial inheritance. A successor who deliberately injects noise to corrupt the inherited adapter is out of scope here. The royalty and audit primitives (Bet 14, Bet 64) handle that surface.
- Concatenation memory cost. The 2× baseline is honest, but real federations may run with much higher fan-out (5+ generations of estate). Memory growth becomes a concern; per-domain pruning is open work.
Run command
PYTHONPATH=src python -m experiments.bets.69_generational_inheritance
Output: experiments/bets/results/69_generational_inheritance.json — per-overlap, per-scheme parent_q_mean, child_q_mean, parent_ratio, child_ratio plus the strict/lenient/catastrophic flags.
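A short sketch of consuming the output file; the exact nesting of the JSON is an assumption beyond the field names listed above.

```python
import json

with open("experiments/bets/results/69_generational_inheritance.json") as f:
    results = json.load(f)

# Assumed layout: results["cells"][overlap][scheme] holds the per-cell means,
# and results["flags"] holds the strict / lenient / catastrophic booleans.
for overlap, schemes in results.get("cells", {}).items():
    for scheme, cell in schemes.items():
        print(overlap, scheme, cell.get("parent_ratio"), cell.get("child_ratio"))
print(results.get("flags"))
```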
Related entries
- Bet 31: model-soup weight averaging across specialists. Bet 69 is the same merge problem with a parent-as-init ordering.
- Bet 33: task-vector extrapolation soup. One of the schemes Bet 69 sweeps directly.
- Bet 21 / 35: self-distillation chain. Distillation is one of Bet 69's schemes; its overlap-sensitivity tracks Bet 21's findings.
- Bet 54: cross-specialist averaging at scale. Bet 69 is the same problem with a smaller-N, parent-child framing.
- Bet 14 / 70: royalty ledger and 50-year royalty stream. The economic substrate for which estate inheritance is the cognitive-side primitive.
Why it matters
The federation's most distinctive disruptive promise — beyond what Petals or Bittensor offer — is persistent cognitive identity across generations. A specialist trained today should still serve users in 50 years, even after the original trainer is gone. Bet 69 tests whether the technical primitive supports that promise.
The honest read: partially. In the regime where the successor's domain is similar to the predecessor's (most realistic family / apprentice / co-author scenarios), inheritance is clean and the federation can ship it as a default. In the regime where domains are orthogonal — the rarest case — inheritance buys 70% retention at half the data, which is still genuine value, just not the strict bar.
The methodological lesson: a regime-graded result is more honest than a single pass/fail headline. The strict bar firing across 2 of 3 regimes is the correct answer; the orthogonal failure is the boundary the federation must surface to its users. The catalogue's discipline forces this distinction; a less-rigorous evaluation would have averaged the regimes and reported a single misleading number.