Learning what to think — the field itself is shaped by reward

The laid-down field is learned: selection drives a dopamine reward-prediction error that updates which eddies appear next [model], with γ read-only. The mediator of the update is the dopamine value signal, cited from neuro; recruitment and reversal behave directionally as predicted. The absolute learning rate remains open.

Which eddies get laid down is itself learned. The outcome of a selection generates a dopamine reward-prediction error — the same signal the neuro chain uses for value — and that error updates which eddies are laid down next. The system therefore learns what to think, not only what to do: useful lines of thought become more available, dead ends less. γ is read-only here: it gates ignition but is not itself rewritten by the learning. Absolute learning rate and dopamine concentration are open; the direction of the update (toward rewarded eddies) is the load-bearing claim.

Learning what to think

The parallel field of §3 is not fixed wiring; it is shaped by experience. Over time, the eddies that tend to be laid down change — which is what it means to learn what to think, not merely what to do. A practised mind proposes different candidates than a novice one.

Selection → dopamine RPE

The teacher is the same one neuro uses. When a selected eddy leads to an outcome that is better or worse than expected, a dopamine reward-prediction error (RPE) is generated: the identical value signal from the neuro chain. This paper cites that mechanism; the only novelty is its target, the laid-down field.
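The error signal described above can be sketched in a few lines. This is a minimal illustration, assuming the common difference form (outcome minus expectation); the function name and the scalar scale are hypothetical, and the text fixes neither units nor magnitudes.

```python
def reward_prediction_error(outcome: float, expected: float) -> float:
    """Signed surprise: positive when the outcome beats the expectation,
    negative when it falls short, zero when it matches.

    Assumption: outcome and expectation are scalars on the same
    (arbitrary) value scale.
    """
    return outcome - expected
```

Only the sign of the returned value is used by the argument that follows; the scale is left open.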

Updating the laid-down field

The RPE updates which eddies are laid down next: positively surprising lines become more available, negatively surprising ones less. The learning acts on the candidate set itself, biasing the parallel field toward what has paid off; recruitment and reversal follow the sign of the error.
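One way to picture this update is as a signed nudge to a per-eddy availability score. The sketch below is illustrative only: the availability dictionary, the additive rule, the softmax read-out, and the value `learning_rate=0.1` are all assumptions introduced here (the absolute learning rate is explicitly open). Only the direction of the change reflects the claim in the text.

```python
import math


def update_availability(availability, selected, rpe, learning_rate=0.1):
    """Return a new availability map with the selected eddy nudged
    in the direction of the RPE. The input map is not modified.

    learning_rate is a placeholder value, not a measured quantity.
    """
    updated = dict(availability)
    updated[selected] = updated[selected] + learning_rate * rpe
    return updated


def laydown_probabilities(availability):
    """Convert availability scores into laydown probabilities.

    Assumption: a softmax read-out; the text does not specify one.
    """
    peak = max(availability.values())  # subtract the max for numerical stability
    weights = {name: math.exp(score - peak) for name, score in availability.items()}
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}
```

Under these assumptions, a positive RPE on an eddy raises its laydown probability relative to the others and a negative RPE lowers it, which is the directional behaviour the section relies on.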

γ is read-only

An important guard: γ is read-only in this loop. γ sets ignitability and gates communication (§2, §3), but the learning does not rewrite γ; it rewrites which assemblies are made available. The rhythm is the gate, not the ledger — this keeps the mediator (ionic phase-gating) distinct from the stored content.
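The read-only guard can be made concrete by giving the learning step read access to the gate but no way to change it. This is a hedged sketch: `GammaGate`, its `threshold` field, the scalar `drive`, and the rule that nothing is learned without ignition are hypothetical stand-ins, not quantities defined in the text.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class GammaGate:
    """Hypothetical stand-in for the gamma gate: it can be read, never rewritten."""
    threshold: float

    def ignites(self, drive: float) -> bool:
        return drive >= self.threshold


def learn(availability, gate, selected, drive, rpe, learning_rate=0.1):
    """Update availability only; the gate is consulted but left untouched.

    Assumption: without ignition nothing was selected, so nothing is learned.
    """
    updated = dict(availability)
    if gate.ignites(drive):
        updated[selected] = updated[selected] + learning_rate * rpe
    return updated
```

Because the dataclass is frozen, any attempt to assign to `gate.threshold` raises `FrozenInstanceError`, so in this sketch the learning loop can only ever rewrite the availability map.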

Open

The absolute learning rate, the driving units, and the dopamine concentrations are open. The direction of the update, toward rewarded eddies, is the load-bearing claim (model, with the direction of the RPE component supported from neuro) (§9).