Addiction sensitisation dynamics — modelling the consolidated gain §36 named out of reach, directly on the plasticity layer (the B‑ii half of the convergence, closed)

The integrated sensitisation gain that §36 named out of reach is modelled here directly on the plasticity layer. Repeated reward exposure builds a retained trace, the sensitised circuit reacts more to cues, extinction does not erase the trace, and intermittent exposure sensitises more, so the two halves of the convergence meet. Addiction is a treatable medical condition.

The addiction threshold-levers chapter applied the inherited lever frame to addiction and produced the second partial fit in the series. It reached the instantaneous reward drive across all three levers, but it found the dominant addiction fault doubly out of reach of any instantaneous lever. That fault is the consolidated sensitisation gain (the SG axis): the learned reward-circuit amplitude embodied in the ΔFosB / BDNF / CREB1 / ARC plasticity trace that makes addiction chronic and relapsing. It is a gain, not a fold (the ADHD lesson: a lever that moves a fold has no handle on an amplitude), and it is a consolidated, learned variable, a plasticity (E0-layer) trace, not an instantaneous operating point. §36 named that axis with four real genes and graded it [F] NOT REACHED, honestly. That naming was one half of a convergence, the point where the threshold-leverisation route meets the plasticity-dynamics route.

This chapter is the other half. It models the SG axis directly by importing the E0 plasticity connectome (the PlasticConnectome of §26; not re-derived, reused) and driving it with a reward bias whose sign is grounded read-only in the engine itself: the dopamine reward-prediction-error signal (M5) raises a rewarded eddy's laid-down probability well above an unrewarded control (p ≈ 0.8985 versus 0.20; reward potentiates), and the selection stage (M4) commits a winner. A reward-exposure epoch therefore maps to a positive reward-drive bias, and the E0 phase-correlation Hebbian update accumulates repeated exposure into a retained ‖W−W₀‖ trace: the integrated sensitisation gain as a structural quantity. Five pre-registered sign-only predictions all hold, and all survive an η sweep (the anti-tuning guard).

A1 (incentive sensitisation): repeated reward exposure builds the trace monotonically (0 → 0.077 → 0.153 → 0.227 → 0.333 → 0.429 over 0–24 exposures), the thing a one-shot reward does not do.

A2 (cue-reactivity): a connectome sensitised by exposure responds more to the same reward cue than a naive one (resting coordination already at or above the naive anchor), the substrate of cue-induced craving and relapse. Only the response-exceeds-naive sign is asserted; the marginal cue gain is not claimed monotone.

A3 (extinction does not erase): removing the reward (baseline-off epochs, plasticity running) leaves the trace above zero (≈ 0.428, here still consolidating), whereas with plasticity off it is exactly zero. Extinction removes the drive (the reachable instantaneous axis) but not the learned trace (the out-of-reach axis): the convergence seam, exhibited dynamically.

A4 (the plasticity-variable guard): with η = 0 the reward excursion reverts exactly when the bias is removed, W stays identical to the kernel, and the coordination returns to the frozen M9 anchor (R ≈ 0.38961) bit-for-bit. The integrated gain disappears without plasticity, proving the SG axis is a learned, consolidated variable, which is precisely why §36's instantaneous levers (drive or ionic) cannot reach it.

A5 (the dynamics handle): spaced (intermittent) exposure consolidates a larger trace than massed (continuous) at equal total exposure (≈ 0.225 versus 0.115). Intermittent reinforcement sensitises more: this is the structural handle on the trace that the instantaneous frame could not reach.

So the two halves of the convergence meet in one disorder: the threshold frame (B‑i) names the gain unreachable for instant levers; the plasticity-dynamics frame (B‑ii) both exhibits the gain (A1–A3) and gives a handle on it (A5). No new mechanism, no new tuned constant; the engine is read-only; the E0 layer is imported, not re-derived.

The retained ‖W−W₀‖ is the integrated sensitisation gain as a structural quantity, never the felt quality of craving, reward, or relapse (Axis-A firewall: consciousness_claim = 0); only the signs are asserted, and every magnitude (the rate η, the identity of the real plasticity rule) is [O]. efficacy = 0; not medical advice; no cure; no licence to acquire or use any substance; addiction is a chronic, relapsing medical condition, not a moral failing; the hard problem stays open.

What §36 named out of reach — and the convergence this chapter closes

The threshold-levers chapter did two things. It reached a genuine, broad surface: the instantaneous reward drive and excitability axis, across all three levers, where the established addiction pharmacology acts as directions on a lever. And it named, by gene, an axis it could not reach. That out-of-reach axis is the dominant one: the consolidated sensitisation gain, the learned reward-circuit amplitude that plasticity has written into the circuit, the trace that re-ignites craving months into abstinence, the discriminant that makes addiction chronic rather than acute. §36 named it with four real genes (the ΔFosB master switch FOSB, the neurotrophin BDNF, the transcription factor CREB1, and the immediate-early plasticity gene ARC) and carried each with its own promoter read alongside, but graded the axis [F] NOT REACHED for the lever frame. The reason was doubled. First, the SG axis is a gain, not a fold: an instantaneous lever moves an operating point, and no operating-point shift sets an amplitude. That is the lesson ADHD taught when its dominant gain axis (synthesis, release) came up out of reach. Second, and this is what made addiction's partiality deeper than ADHD's, the sensitisation gain is consolidated and learned: it is a plasticity (E0-layer) variable, a memory the circuit hardened over time, not an instantaneous variable at all. ADHD's gain was at least instantaneous; addiction's gain is a trace.

That naming was honest, and it was deliberately only one half of a larger statement. §36 was careful to say what it was doing: it was marking the precise point of convergence where two routes through this whole framework meet. The threshold-leverisation route — the one the entire Part-II series has pursued, decomposing an operating point into instantaneous levers — reaches the instantaneous reward drive and stops at the edge of the learned trace. The plasticity-dynamics route — the one the E0 chapter opened, describing how a circuit consolidates a trace over time — is exactly where that learned trace lives. Addiction is the disorder where the two routes touch: its out-of-reach SG axis is the E0 plasticity layer. §36 named the trace out of reach for the levers; it did not model the trace. This chapter models it. It takes the dynamics route to the same axis the lever route could only name, and in doing so it closes the convergence: the gain the threshold frame declared unreachable is here exhibited directly, and the variable that actually moves it is supplied.

Reusing the E0 plasticity layer, not re-deriving it

The first discipline of this chapter is reuse. The consolidated trace lives in the plasticity dynamics, and the framework already has a plasticity dynamics: the E0 layer, whose PlasticConnectome takes the engine's micro-eddy field and lets the connections between eddies change with use through a phase-correlation Hebbian rule — eddies that coordinate in phase strengthen their coupling, and the accumulated change is the retained matrix distance ‖W−W₀‖ from the unlearned kernel. This module does not re-derive that rule, re-invent the coupling-versus-bias map, or build a second plasticity engine. It imports the E0 connectome wholesale — the same class, the same kernel W₀, the same coupling map — and applies it to a new input: a reward drive. That is the handover discipline the whole package runs under: when a layer already exists and is frozen, a new chapter uses it rather than duplicating it, so there is exactly one plasticity dynamics in the framework and every result that rests on it inherits its guarantees.
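As a rough illustration of the two quantities named above, the following minimal sketch shows a phase-correlation Hebbian step and the retained distance ‖W−W₀‖. It is not the E0 PlasticConnectome: the cosine form of the update, the function names hebbian_step and retained_trace, the 3×3 kernel, and the phase values are all assumptions made for illustration. The source only states that eddies coordinating in phase strengthen their coupling and that the trace is the matrix distance from the unlearned kernel (taken here as the Frobenius norm, also an assumption).

```python
import math

def hebbian_step(W, phases, eta):
    """One toy update: pairs that are close in phase gain coupling.

    ASSUMPTION: the increment eta * cos(phase_i - phase_j) is a stand-in
    for the E0 rule, which the text describes only qualitatively.
    """
    n = len(phases)
    new_W = [row[:] for row in W]          # never mutate the input matrix
    for i in range(n):
        for j in range(n):
            if i != j:
                new_W[i][j] += eta * math.cos(phases[i] - phases[j])
    return new_W

def retained_trace(W, W0):
    """Frobenius distance ||W - W0|| from the unlearned kernel W0."""
    return math.sqrt(sum((w - w0) ** 2
                         for row, row0 in zip(W, W0)
                         for w, w0 in zip(row, row0)))

# Hypothetical 3-eddy kernel and nearly in-phase eddies.
W0 = [[0.0, 0.5, 0.5],
      [0.5, 0.0, 0.5],
      [0.5, 0.5, 0.0]]
phases = [0.0, 0.1, 0.2]

W = W0
for _ in range(10):                        # ten toy exposure steps
    W = hebbian_step(W, phases, eta=0.05)

print(retained_trace(W0, W0))              # the kernel is at distance zero from itself
print(retained_trace(W, W0) > 0.0)         # in-phase use leaves a positive trace
```

In this toy, setting eta to zero returns a matrix equal to W0, so the trace stays at zero; that mirrors, in miniature, why the trace is a property of the learning rather than of the drive.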

The consequence is that this chapter introduces no new plasticity machinery and no new tuned constant. The Hebbian rule is E0's; the coupling map k = κ / (1 − |b|) (raising coupling with a positive bias, capped at twice κ) is E0's; the kernel is E0's. What is new is only the interpretation of the input — that a reward-exposure epoch is a positive drive bias — and that interpretation is not free either: its sign is grounded in the engine, as the next section sets out. Everything downstream — the retained trace, its growth with exposure, its persistence under extinction, its sensitivity to spacing — is then a property of the E0 dynamics under a reward input, read out, never re-fit. The engine tree is re-emerged read-only and confirmed byte-unchanged (0fbf4988…), and the module registers as the fifteenth atlas citizen (ADD-T3a).
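The coupling map quoted above can be written out as a small function. This is a sketch under stated assumptions, not the E0 source: the name coupling, the requirement κ > 0, and the treatment of |b| ≥ 1 (returning the cap rather than dividing by zero) are illustrative choices. With the formula as written, the cap of 2κ is reached at |b| = 0.5, since κ / (1 − 0.5) = 2κ. The formula depends only on |b|; the sign discipline (b > 0 for reward) comes from the engine grounding, not from the map itself.

```python
def coupling(kappa, b):
    """k = kappa / (1 - |b|), capped at twice kappa.

    ASSUMPTIONS: kappa > 0, and |b| >= 1 is mapped to the cap to avoid
    a division by zero; the source does not say how that case is handled.
    """
    cap = 2.0 * kappa
    mag = abs(b)
    if mag >= 1.0:
        return cap
    return min(kappa / (1.0 - mag), cap)

for b in (0.0, 0.25, 0.5, 0.9):
    print(b, coupling(1.0, b))
```

For κ = 1 the unbiased coupling is 1, a bias of 0.25 raises it to 4/3, and any bias of magnitude 0.5 or more sits at the cap of 2.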

Grounding the reward sign read-only in the engine (dopamine reward-prediction error)

The one new modelling choice — that repeated reward exposure is a positive (excitatory) reward-drive bias on the plastic connectome — is not asserted by hand. Its sign is read out of the already-emerged engine. The engine's learned-field stage (M5) implements a dopamine reward-prediction error: when an eddy is rewarded, the probability the field lays down for that eddy rises well above the probability laid down for an unrewarded control. The numbers are explicit and reproducible: the rewarded target reaches p ≈ 0.8985 against the control's 0.20 — reward potentiates. The selection stage (M4) then commits a basal-ganglia winner. So the engine itself says, read-only, that reward raises the laid-down drive on the rewarded pathway; a reward-exposure epoch therefore maps to a positive bias b > 0, never a negative one, and never a magnitude pulled from the air. The direction of the reward input is forced by M5; only the magnitude (how large a bias, at what rate) is left open and graded [O].

This is the same grounding discipline the disorder chapters use throughout: a sign is taken from the engine or the cited biology, a magnitude is never fit to a target. The reward bias enters the E0 coupling map through the existing relation k = κ / (1 − |b|) — a positive bias raises the coupling toward its cap — and the phase-correlation Hebbian update then accumulates the resulting in-phase coordination into the retained ‖W−W₀‖. Nothing in the chain is tuned: the sign comes from M5's potentiation, the map and the rule come from E0, and the trace is whatever those produce. The integrated sensitisation gain is, concretely, that retained matrix distance — a structural quantity the dynamics writes, read out under a grounded reward input.

A1 — incentive sensitisation: repeated exposure builds the trace

The first prediction is the defining one, and it is what separates a sensitising input from an ordinary one. Driving the connectome with the positive reward bias over a rising number of exposures and reading the retained trace after each, the trace grows monotonically: at 0, 4, 8, 12, 18, 24 exposures the retained ‖W−W₀‖ is 0, 0.077, 0.153, 0.227, 0.333, 0.429. Each additional block of reward exposure leaves the circuit a little further from its unlearned kernel, and the distance never falls. This is incentive sensitisation in its structural form: repeated exposure trains the reward circuit into a progressively stronger learned state, and the gain accumulates. It is precisely the behaviour a one-shot reward does not show — a single exposure moves the operating point and, without consolidation, leaves nothing behind — and it is the behaviour §36 named out of reach for any instantaneous lever, because an instantaneous lever has no notion of accumulation over exposures at all.

The sign is what is asserted, not the curve. The direction — trace monotonically increasing in exposure — is grounded (reward potentiates, via M5) and it is the claim; the exact increments are [O], reproducible artifacts of the E0 rate, not fitted quantities. And the claim is checked against the anti-tuning guard: the monotone increase holds over the whole η sweep (η ∈ {0.03, 0.05, 0.08}), so it is a property of the dynamics under a reward input, not an artefact of one learning rate. The integrated sensitisation gain that §36 could only name is here a growing structural trace with a grounded direction.
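A sign-only claim of this kind can be checked mechanically. The sketch below tests strict monotonicity on the series reported above and shows the shape of an η-sweep guard. The helper names and the linear stand_in_run function are hypothetical; stand_in_run is a placeholder for the real module's run and is not its output.

```python
def strictly_increasing(values):
    """True when every value exceeds the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))

def sign_holds_across_sweep(run, etas):
    """The sign-only claim must hold at every learning rate in the sweep.

    `run(eta)` is assumed to return the retained trace after each
    exposure count, in order of increasing exposure.
    """
    return all(strictly_increasing(run(eta)) for eta in etas)

# Series reported in the text for 0, 4, 8, 12, 18, 24 exposures.
exposures = [0, 4, 8, 12, 18, 24]
reported_trace = [0.0, 0.077, 0.153, 0.227, 0.333, 0.429]
print(strictly_increasing(reported_trace))

# HYPOTHETICAL stand-in for the real module: a trace linear in exposure.
def stand_in_run(eta):
    return [eta * n for n in exposures]

print(sign_holds_across_sweep(stand_in_run, [0.03, 0.05, 0.08]))
```

The check deliberately ignores the size of the increments, which matches the discipline in the text: the direction is the claim, the magnitudes are [O].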

A2 — cue-reactivity: the sensitised circuit reacts more to the same cue

The second prediction is the relapse substrate, and it is the clinically central one. After the circuit has been sensitised by reward exposure, present it with the same reward cue that a naive circuit would see, and measure the coordinated response. The sensitised connectome responds more: its cue-evoked coordination exceeds the naive circuit's, and — the structural reason — its resting coordination already sits at or above the naive anchor. The learned trace has moved the circuit into a higher-coordination basin, so the same cue evokes a larger coordinated response than it would in a circuit that had never been exposed. This is cue-reactivity: the mechanism by which a cue associated with past reward re-ignites a disproportionate response long after the reward itself is gone — the engine of cue-induced craving and relapse, expressed structurally as a sensitised circuit reacting more to an unchanged input.

The discipline here is to assert only the sign that is robust. The claim is the response-exceeds-naive comparison — the sensitised circuit's cue response is greater than the naive circuit's, holding across the η×depth sweep. The module explicitly does not claim that the marginal cue gain (the extra response the cue adds on top of the already-elevated resting coordination) is monotone, because that finer quantity is not sign-robust, and asserting it would over-reach. So cue-reactivity is recorded as what it reliably is — a sensitised circuit responds more to the same cue — and not as more than the dynamics support. The relapse substrate is exhibited; the magnitudes around it stay [O].

A3 — extinction does not erase: removing the drive leaves the trace

The third prediction is the convergence seam itself, made dynamic. Take a sensitised circuit and run extinction — remove the reward, present baseline-off epochs, and let plasticity keep running. If the sensitisation gain were merely an instantaneous drive, removing the drive would return the circuit to baseline. It does not. With plasticity active, the retained trace stays above zero — after a sensitisation trace of ≈ 0.227, the post-extinction trace reads ≈ 0.428, here even continuing to consolidate — whereas with plasticity switched off the trace is exactly zero. The contrast is the whole point: extinction removes the drive — the reachable, instantaneous axis the §36 levers operate on — but not the learned trace — the out-of-reach axis that lives in the plasticity. Cessation is not cure: stopping the reward removes what the instant levers could move and leaves untouched the consolidated memory they could not.

This is exactly the structure §36 predicted from the lever side and could only name. The threshold frame said the instantaneous reward drive is reachable and the consolidated gain is not; A3 shows that split dynamically — the drive comes off under extinction, the trace persists. The seam where the two routes meet is no longer a statement about reachability in the abstract; it is a measured persistence: the part that extinguishes and the part that does not, separated by whether plasticity is running. The persistence holds across the η sweep (the trace stays positive at every rate, ≈ 0.271, 0.428, 0.611), so it is a property of the learned trace, not of one rate. The direction is forced (extinction does not return the trace to zero); the magnitudes are [O].

A4 — the plasticity-variable guard: why the instant levers cannot reach it

The fourth prediction is the guard that proves the §36 verdict from the inside, and it is the module's invariance check. Set the learning rate to zero — plasticity off — and repeat the reward exposure. The reward bias still moves the circuit while it is applied (the reward epoch reads an elevated coordination, R ≈ 0.4005), but the moment the bias is removed the excursion reverts exactly: the coordination returns to the frozen M9 anchor (R ≈ 0.38961), and the connection matrix W is left identical to the kernel. The retained trace is exactly zero. The integrated sensitisation gain, in other words, disappears entirely without plasticity. The gain is not a property of the drive; it is a property of the learning. And because the gain exists only when the connections are allowed to change — only as a consolidated, learned variable — it is, by construction, something no instantaneous lever can reach: a drive lever or an ionic lever moves an operating point now, and an operating point shifted now reverts now, exactly as the η = 0 run shows. This is why §36's levers named the SG axis out of reach: the axis is not on the instantaneous surface at all.
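The logic of the η = 0 guard can be sketched on a toy oscillator network. Everything here is an assumption made for illustration: the Kuramoto-style phase dynamics, the cosine Hebbian increment, the fixed frequencies and initial phases, the 4-node kernel, and the convention that each epoch's readout restarts from the same initial phases. The toy anchor value is not the M9 anchor quoted above; only the structure of the check carries over.

```python
import cmath
import math

def coupling(kappa, b):
    """k = kappa / (1 - |b|), capped at 2 * kappa (|b| >= 1 mapped to the cap)."""
    cap = 2.0 * kappa
    mag = abs(b)
    return cap if mag >= 1.0 else min(kappa / (1.0 - mag), cap)

def run_epoch(W, b, eta, kappa=1.0, steps=200, dt=0.05):
    """Run one epoch from fixed initial phases; return (coordination R, W after)."""
    n = len(W)
    omega = [0.9 + 0.1 * i for i in range(n)]   # hypothetical natural frequencies
    theta = [0.7 * i for i in range(n)]         # hypothetical initial phases
    k = coupling(kappa, b)
    W = [row[:] for row in W]                   # work on a copy
    for _ in range(steps):
        drift = [omega[i] + (k / n) * sum(W[i][j] * math.sin(theta[j] - theta[i])
                                          for j in range(n))
                 for i in range(n)]
        for i in range(n):
            for j in range(n):
                if i != j:                      # toy Hebbian increment
                    W[i][j] += eta * dt * math.cos(theta[i] - theta[j])
        theta = [t + dt * d for t, d in zip(theta, drift)]
    R = abs(sum(cmath.exp(1j * t) for t in theta)) / n
    return R, W

def trace(W, W0):
    """Frobenius distance ||W - W0||."""
    return math.sqrt(sum((w - w0) ** 2
                         for r, r0 in zip(W, W0) for w, w0 in zip(r, r0)))

W0 = [[0.0 if i == j else 0.5 for j in range(4)] for i in range(4)]

anchor, _ = run_epoch(W0, b=0.0, eta=0.0)             # frozen baseline readout
_, W_reward = run_epoch(W0, b=0.3, eta=0.0)           # reward epoch, plasticity off
R_after, W_after = run_epoch(W_reward, b=0.0, eta=0.0)  # bias removed

guard_ok = (W_reward == W0) and (W_after == W0) and (R_after == anchor)
print(guard_ok)

_, W_plastic = run_epoch(W0, b=0.3, eta=0.05)         # same epoch, plasticity on
print(trace(W_reward, W0), trace(W_plastic, W0) > 0.0)
```

The guard passes in the toy for the same structural reason given in the text: with η = 0 the connection matrix never leaves the kernel, so the baseline readout after the reward epoch is computed from identical inputs and matches the anchor exactly.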

The guard doubles as the module's byte-level invariance proof. With η = 0 the baseline epoch reproduces the frozen M9 anchor bit-for-bit, the engine tree re-emerges unchanged, and the M0–16 subtree is identical — so the whole sensitisation construction is a pure add-on: switch off the one new ingredient (plasticity under a reward bias) and the result collapses back onto the frozen engine with nothing left over. A new chapter that vanishes cleanly when its one new variable is zeroed is a chapter that has added a reading, not altered the engine.

A5 — the dynamics handle: spaced beats massed (intermittent reinforcement)

The fifth prediction is what the dynamics route can offer that the lever route could not: a handle on the trace. Hold the total reward exposure fixed and vary only its schedule: massed (continuous, all exposure together) versus spaced (intermittent, the same exposure broken into separated bouts). The spaced schedule consolidates a larger retained trace at equal total time-at-reward: ≈ 0.225 for spaced against ≈ 0.115 for massed. Intermittent reinforcement sensitises more. This is a real and well-known feature of addiction (intermittent, unpredictable reward is more sensitising than continuous), and here it falls out of the E0 spacing behaviour applied to the reward bias, with no new ingredient. Crucially, it is a property of how the exposure is distributed in time, which is exactly the kind of variable a dynamics has and an instantaneous lever does not: a lever sets a value now and has no notion of schedule at all. The threshold frame, operating only on the instantaneous axis, could not reach this; the plasticity-dynamics frame supplies it.
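The phrase "equal total exposure, different schedule" can be made concrete with two schedule builders. This sketch only constructs and compares the schedules; it does not reproduce the trace values, which come from the E0 dynamics. The function names, the 12-epoch total, and the bout and gap lengths are hypothetical.

```python
def massed(total, length):
    """All reward epochs in one continuous bout, followed by rest."""
    return [True] * total + [False] * (length - total)

def spaced(total, bout, gap):
    """The same number of reward epochs, split into bouts separated by gaps."""
    schedule = []
    remaining = total
    while remaining > 0:
        n = min(bout, remaining)
        schedule += [True] * n
        remaining -= n
        if remaining > 0:                 # no trailing gap after the last bout
            schedule += [False] * gap
    return schedule

def longest_bout(schedule):
    """Length of the longest uninterrupted run of reward epochs."""
    best = run = 0
    for on in schedule:
        run = run + 1 if on else 0
        best = max(best, run)
    return best

s = spaced(total=12, bout=3, gap=3)       # hypothetical: four bouts of three
m = massed(total=12, length=len(s))       # same length, same time-at-reward

print(sum(m), sum(s))                     # equal total exposure
print(longest_bout(m), longest_bout(s))   # only the distribution in time differs

# Sign-only comparison on the values reported in the text.
spaced_trace, massed_trace = 0.225, 0.115
print(spaced_trace > massed_trace)
```

Holding the sum fixed while changing only the run structure isolates the one variable the prediction is about: the schedule.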

So A5 is the constructive complement to A4. A4 shows why the instantaneous levers cannot reach the SG axis (it is a learned variable that reverts without plasticity); A5 shows what can — a structural handle on the consolidated trace that lives in the time-structure of the exposure, the schedule. The handle is asserted as a sign (spaced trace exceeds massed at equal exposure), holding across the η×epochs sweep; the magnitudes are [O]. It is named as a structural handle on a structural trace, never as a clinical instruction.

The two halves meet — and the firewall

With the five results in hand the convergence is closed, and it can be stated cleanly. §36 (the B‑i half) applied the threshold-lever frame and found it reaches addiction's instantaneous reward drive but names the consolidated sensitisation gain out of reach — doubly, as a gain and as a learned trace. This chapter (the B‑ii half) takes the plasticity-dynamics route to that same axis and exhibits the gain directly: it accumulates with exposure (A1), it makes the circuit over-respond to an unchanged cue (A2), it survives extinction of the drive (A3), it vanishes without plasticity (A4, which is exactly why the instant levers miss it), and it has a structural handle in the schedule of exposure (A5). The two routes through the framework — threshold-leverisation and plasticity-dynamics — meet in one disorder: the lever route names the trace unreachable for instant levers honestly, and the dynamics route supplies the variable that moves it. Neither half overclaims; together they describe addiction's defining axis from both sides. No new mechanism, no new tuned constant, the E0 layer imported rather than re-derived, the engine byte-unchanged.

The firewall is absolute and must be stated in full. The retained ‖W−W₀‖ is the integrated sensitisation gain as a structural quantity — a matrix distance in a connectome model — and it is never a claim about the felt quality of craving, reward, wanting, or relapse (Axis-A firewall — consciousness_claim = 0, the hard problem stays open). Real addiction plasticity is heterogeneous — ΔFosB and CREB transcriptional cascades, BDNF-driven structural remodelling, AMPA-receptor trafficking, dendritic-spine changes, glutamatergic homeostatic adaptation, epigenetic marks — and this module asserts only the sign of a single phase-correlation Hebbian trace, never that this rule is the biology; the identity of the real plasticity rule is owed and graded [O]. The reward sign is grounded in the engine's M5 reward-prediction error; every magnitude — the rate η, the increments, the schedule effect — is [O], and the signs survive an η sweep. And the human boundaries are non-negotiable: addiction is a chronic, relapsing medical condition — a disorder of a consolidated reward circuit — not a moral failing, not a failure of will, not a deficit of character; the persistence shown here (extinction does not erase the trace) is the structural correlate of why relapse is part of the illness, not evidence that anyone could simply stop. Nothing here is a cure, a treatment, a recommendation, a dose, or a licence to acquire or use any substance. medium_efficacy_tested = 0; signs only, never magnitudes fit to a target; this is not medical advice, not a diagnosis, not a treatment protocol, and not a cure.