Pre-registered predictions P22 to P30

P22 cross-tests whether petroleum (or bitumen/oil sands) can be explained by purely in situ generation, or whether glacial/large-scale fluid transport repositioned material into sinks. However, petroleum presence/absence is highly sensitive to (i) basin existence, (ii) source rock/maturity, (iii) seal/trap, and (iv) exploration maturity (H-OIL/H-DISC). Therefore P22 must first fit a null model that includes the basin mask, petroleum-system covariates, and exploration-maturity covariates, and only then test the additional explanatory power of transport proxies. (Do not use trivial comparisons such as “no oil on an alluvial plain”; AR-40.)

Subtests (pre-registered; TEST-OIL/GLACIAL family).

TEST-OIL1 (spatial): for a province/basin-level target variable (presence/size), build a baseline model including petroleum-system covariates (source/maturity/trap) and exploration indicators (wells, seismic coverage, years of exploration, etc.), then test whether transport-to-sink features (distance to ice margin, meltwater drainage path, mega-delta terminus) improve predictive power beyond that baseline (pre-registered criteria).
TEST-OIL2 (composition): test whether terrestrial biomarker ratios (e.g., oleanane, lignin-derived indicators, pollen/spores; specific markers pre-registered) covary with transport-to-sink proxies (delta-terminus distance, glacial-path proxies). The standard alternative is “normal river/ocean transport,” so include a non-glacial control basin with comparable river input (or comparable input/area) (AR-41).
TEST-GLACIAL1 (source–sink): among comparable sedimentary basins (restricted to those with petroleum-system viability), test whether (a) upstream sources/paths show depletion and (b) downstream terminal sinks show enrichment.
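TEST-OIL1 is a nested-model comparison, which can be sketched as follows. This sketch assumes a continuous province-level size variable, ordinary least squares, and 5-fold cross-validated R² as the measure of predictive power. All three are illustrative choices, not the criteria registered in config/p22_oil_glacial_prereg.yml; for presence/absence, a logistic model would replace OLS. It also reports whether the transport coefficients keep one sign across folds, which mirrors the first P22 falsifier. Function names and the synthetic data are hypothetical.

```python
import numpy as np

def cv_r2(X, y, folds):
    """Mean out-of-fold R^2 of an OLS fit with intercept, plus per-fold slopes."""
    Xd = np.column_stack([np.ones(len(y)), X])
    scores, slopes = [], []
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        beta, *_ = np.linalg.lstsq(Xd[train], y[train], rcond=None)
        resid = y[test] - Xd[test] @ beta
        ss_tot = np.sum((y[test] - y[test].mean()) ** 2)
        scores.append(1.0 - np.sum(resid ** 2) / ss_tot)
        slopes.append(beta[1:])  # drop the intercept
    return float(np.mean(scores)), np.array(slopes)

def oil1_compare(X_base, X_transport, y, k=5, seed=0):
    """Baseline (petroleum system + exploration) vs baseline + transport proxies.

    X_base: (n, p) baseline covariates; X_transport: (n, q) transport proxies.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    r2_base, _ = cv_r2(X_base, y, folds)
    r2_full, slopes = cv_r2(np.column_stack([X_base, X_transport]), y, folds)
    t_slopes = slopes[:, X_base.shape[1]:]  # transport coefficients, one row per fold
    sign_stable = bool(np.all(np.sign(t_slopes) == np.sign(t_slopes[0])))
    return {"r2_base": r2_base, "r2_full": r2_full,
            "gain": r2_full - r2_base, "sign_stable": sign_stable}

# Synthetic illustration only: here the transport proxy truly adds signal.
rng = np.random.default_rng(1)
base = rng.normal(size=(200, 2))
transport = rng.normal(size=(200, 1))
y = base @ np.array([1.0, -0.5]) + 0.8 * transport[:, 0] + rng.normal(scale=0.5, size=200)
result = oil1_compare(base, transport, y)
print(result)
```

A near-zero or negative `gain`, or `sign_stable == False`, would correspond to the “no additional explanatory power” falsifier.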

Inputs (stub). data/petroleum/oil_provinces.csv, data/glacial/ice_extent_proxies.csv. For discovery-bias covariates, use P35's data/petroleum/oil_discovery_bias_cases.csv. (Field definitions/units: Appendix F and docs/codebook.md.)

FALSIFIER.

(no additional explanatory power) if transport proxies vanish or become sign-unstable after controlling petroleum-system + exploration covariates, P22 is FAIL/HOLD.
(no composition signal) if terrestrial biomarker patterns are explained by “normal river input” or show no sink-amplification signal, P22 is FAIL/HOLD.
(confounding) if results are explained by exploration maturity / shelf sediment cover (mega-delta vs Middle-East-type low-clastic shelves), keep P22 only as a candidate explanation; do not promote to evidence (P35 PASS is a prerequisite).

Linked AR/H. AR-28, AR-39, AR-40, AR-41; competing hypotheses H-OIL, H-DISC. Pre-registration. config/p22_oil_glacial_prereg.yml.

P23 (exploratory; V-HOLOX): Volcanic Refugia — correlation of ice-free refugia with heatflow/arc proximity

P23 explores whether “ice-free refugia” patterns are coupled to heatflow/volcanism (geothermal flux) rather than being mere climate artifacts.

Test (TEST-REF1). In data/glacial/refugia_catalog.csv, collect ice presence/absence, heatflow proxies, distance to volcanic arcs, and climate covariates (temperature/precipitation), and run a multivariate comparison.

FALSIFIER. If adding climate covariates removes the heatflow term or makes its direction unstable, P23 is FAIL/HOLD.
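The following is a minimal sketch of the TEST-REF1 falsifier check. It assumes that ice-free status (or a continuous refugium score) is modeled linearly on a single heatflow proxy plus climate covariates. Here the heatflow term “survives” if three conditions hold: it keeps its sign, it retains at least half its climate-free magnitude, and it has a stable sign in at least 95% of bootstrap refits. These thresholds and all names are hypothetical placeholders for pre-registered values.

```python
import numpy as np

def ols_slopes(X, y):
    """OLS coefficients with an intercept fitted but dropped from the output."""
    Xd = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    return beta[1:]

def heatflow_term_survives(heat, climate, y, n_boot=500, min_ratio=0.5,
                           min_sign_frac=0.95, seed=0):
    """Does the heatflow coefficient keep its sign and size once climate enters?

    min_ratio and min_sign_frac are hypothetical stand-ins for prereg thresholds.
    """
    rng = np.random.default_rng(seed)
    b_alone = ols_slopes(heat, y)[0]              # heatflow only
    X_full = np.column_stack([heat, climate])
    b_full = ols_slopes(X_full, y)[0]             # heatflow + climate covariates
    n = len(y)
    boot = np.empty(n_boot)
    for b in range(n_boot):                       # case-resampling bootstrap
        idx = rng.integers(0, n, n)
        boot[b] = ols_slopes(X_full[idx], y[idx])[0]
    sign_frac = float(np.mean(np.sign(boot) == np.sign(b_full)))
    survives = (np.sign(b_full) == np.sign(b_alone)
                and abs(b_full) >= min_ratio * abs(b_alone)
                and sign_frac >= min_sign_frac)
    return {"b_alone": float(b_alone), "b_full": float(b_full),
            "sign_frac": sign_frac, "survives": bool(survives)}

# Synthetic, confounded case: heatflow tracks climate, but only climate drives y.
rng = np.random.default_rng(2)
climate = rng.normal(size=(300, 2))                    # e.g. temperature, precipitation
heat = climate[:, 0] + 0.5 * rng.normal(size=300)
ice_free_score = climate[:, 0] + 0.5 * rng.normal(size=300)
print(heatflow_term_survives(heat, climate, ice_free_score))
```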

Linked AR/H. AR-29; competing hypothesis H-REF.

P24 (exploratory; V-HOLOX): Endorheic Mega-Lakes — “evaporation clock” clustering

Bundle verdict (2025-12-27): PASS. (results/p24_endorheic_lakes.json)

P24 tests whether, in endorheic lake basins, the onset of low-stand transitions (LOW-onset) synchronizes around the registered event window (median t=4.2 ka). Using Oxford LLDB 1-kyr time-slice classification, extract for each lake the first time it transitions HIGH/MID → LOW and examine the distribution.
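The LOW-onset extraction can be sketched as follows. The sketch assumes each lake's LLDB record has already been reduced to (age_ka, status) pairs, with status encoded as the strings "HIGH", "MID", "LOW". That encoding is hypothetical; the codebook governs the real one. Time runs from older to younger slices, so records are sorted by descending age before scanning for the first HIGH/MID to LOW step. Lakes without such a step return None and so do not count toward Nₒₙₛₑₜ.

```python
from typing import Optional, Sequence, Tuple

def low_onset_ka(series: Sequence[Tuple[float, str]]) -> Optional[float]:
    """Age (ka BP) of the first HIGH/MID -> LOW transition for one lake, or None.

    `series` holds (age_ka, status) pairs in any order; the status strings
    "HIGH", "MID", "LOW" are an assumed encoding.
    """
    ordered = sorted(series, key=lambda r: r[0], reverse=True)  # oldest first
    for (_, prev), (age, cur) in zip(ordered, ordered[1:]):
        if prev in ("HIGH", "MID") and cur == "LOW":
            return age  # the slice in which the low stand begins
    return None

# Toy record (not LLDB data): MID at 5 ka, LOW from 4 ka onward.
print(low_onset_ka([(3.0, "LOW"), (6.0, "HIGH"), (5.0, "MID"), (4.0, "LOW")]))  # 4.0
```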

Data. Oxford Lake-Level Data Bank (NCEI) data/hydro/endorheic_lakes.csv (original: data/external/ncei_lakelevel/oxford_lldb_levels-noaa.txt).

Test (TEST-LAKE1). Represent the event window in 1-kyr bins as 3–5 ka, and test whether LOW-onset is enriched in those bins under a permutation null. (Non-significance is treated as HOLD, not FAIL.)

Result summary. Of N_lake = 358 lakes, LOW-onset was detected for Nₒₙₛₑₜ = 146; 39 of these onsets (26.7%) fall in the 3–5 ka window. Relative to a uniform null over bins from 1 ka to the maximum age bin, the enrichment is ≈ 1.60, with permutation p_≥ ≈ 0.00185; the clustering is therefore confirmed and the verdict is locked as PASS.
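The enrichment statistic and its null can be sketched as follows. This is a Monte Carlo version of the uniform null: each onset lands independently and uniformly in one admissible 1-kyr bin, so the window count is binomial and can be simulated directly. The bin labels, the add-one p-value estimate, and the function name are assumptions. The synthetic onsets are not the LLDB data.

```python
import numpy as np

def window_enrichment(onset_bins, window_bins, all_bins, n_sim=20_000, seed=0):
    """Enrichment of LOW-onsets in the event-window bins vs a uniform-bin null.

    onset_bins  : onset ages already snapped to 1-kyr bin labels (ka BP)
    window_bins : bin labels forming the event window (hypothetical labels)
    all_bins    : every admissible bin label under the null (1 .. max age bin)
    """
    onsets = np.asarray(onset_bins)
    expected_frac = np.isin(all_bins, window_bins).mean()
    observed = int(np.isin(onsets, window_bins).sum())
    enrichment = (observed / len(onsets)) / expected_frac
    rng = np.random.default_rng(seed)
    # Uniform bin assignment => window count ~ Binomial(n_onsets, expected_frac).
    sims = rng.binomial(len(onsets), expected_frac, size=n_sim)
    p_ge = (1 + np.count_nonzero(sims >= observed)) / (1 + n_sim)  # add-one estimate
    return observed, float(enrichment), float(p_ge)

# Synthetic example: 40 of 100 onsets in a 2-bin window out of 12 bins.
onsets = [3] * 20 + [4] * 20 + [8] * 60
print(window_enrichment(onsets, window_bins=[3, 4], all_bins=np.arange(1, 13)))
```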

FALSIFIER. If LOW-onset is not enriched in the event window (lack of information), P24 is HOLD; if it is significantly depleted, P24 is FAIL.

Interpretation (caution). P24 is a downstream pattern showing that synchronized low-stand onsets are frequent near the “4.2 ka window.” Mechanistic proof (ARE → melt) is reserved for core modules such as P19/P20/P29 (P29 is currently graded HOLD pending its pre-registered proxy table); P24 is auxiliary coherence material only.

Linked AR/H. AR-30; competing hypothesis H-LAKE.

P25 (optional; V-HOLOX): Shelf Asymmetry — global asymmetry of shelf width/incision/high-energy deposits

Standard geology already expects a shelf-width difference between the Atlantic (passive margins) and the Pacific (active margins). However, the user model requires additional signatures of event-like rapid drainage/rapid deposition: even in regions that are very arid today (e.g., deserts), past large drainage traces (submarine canyons/high-energy deposits) should remain on the shelf–slope system.

Test (TEST-SHELF1). In data/geomorph/shelf_width_profiles.csv, collect shelf width, submarine canyon density/fan structures, basin area, and present discharge, and test whether the patterns are fully explained by present climate/discharge alone or whether “event residuals” remain.

FALSIFIER. If shelf width/canyon density are sufficiently explained by long-duration discharge/sedimentation models and event-like residuals are absent in arid/low-discharge regions, P25 is FAIL/HOLD.

Linked AR/H. AR-25 (geomorphic controls), AR-30 (hydrology); competing hypothesis H-SHELF.

P26 (exploratory; V-STRATA): Great Unconformity — synchronization/high-energy signatures of basement truncation surfaces

P26 explores whether broad truncation surfaces such as the “Great Unconformity” can be synchronized to a single event (or a narrow window). If global synchronization does not hold, P26 is immediately isolated as HOLD/FAIL.

Test (TEST-UNCON1). In data/strat/unconformity_sites.csv, collect standardized unconformity age ranges, weathering/soil-development indicators, and high-energy indicators of overlying deposits.

FALSIFIER. If unconformity ages/formations are globally dispersed, or long-duration weathering/soil development is common, P26 is FAIL.

Linked AR/H. AR-31; competing hypothesis H-UNCON.

P27 (exploratory; V-STRATA): Polystrate Fossils — prevalence/environment distribution of multi-strata-penetrating cases

P27 classifies whether polystrate fossils are ordinary products of local rapid burial or show patterns coupled to a broader event.

Test (TEST-POLY1). In data/strat/polystrate_cases.csv, record environment (delta/floodplain/pyroclastic/coastal, etc.), stratigraphy/structure, and age per case to assess distributions.

FALSIFIER. If cases are almost entirely restricted to local environments (e.g., floodplain/delta) and global synchronization is not observed, P27 is difficult to use as global evidence (FAIL/HOLD).

Linked AR/H. AR-31; competing hypothesis H-POLY.

P28 (exploratory; V-STRATA): Coal with Marine Fossils — mixed origin vs repeated transgression/reworking

P28 explores whether co-occurrence of coal beds with marine fossils/sediments indicates “large-scale transport/mixing” versus “repeated transgression/reworking.”

Test (TEST-COAL1). In data/strat/coal_marine_cases.csv, standardize and record rooting/soil indicators (in situ) vs reworking indicators, fossil assemblages, and sedimentary structures.

FALSIFIER. If rooting/soil indicators are common and marine fossils are explained by thin transgressive surfaces, P28 is hard to use as mixed-event evidence (FAIL/HOLD).

Linked AR/H. AR-31; competing hypothesis H-COAL.

P29 (optional; V-EVID): Joint Event Window Coherence — cross-proxy “event window” coherence

Bundle verdict (r18): HOLD — method-validated and reproducible (audited check C18), but the scientific verdict awaits the pre-registered proxy table. The r16 “PASS” was asserted without a reproducible engine, without a shipped data table, and with a randomization null that — as shown below — tests the wrong hypothesis. r18 supplies a correct, reproducible engine (atl_bundle/engine/c18_p29_coherence.py) and, in the absence of the pre-registered data/meta/event_window_estimates.csv, runs it on a clearly-labeled illustrative dataset to validate the machinery only. Until the pre-registered table is supplied and run, P29 is graded HOLD, not PASS, and the directional-coherence narrative leans only on its other legs (P1/P4 on the cause axis; P16/P19/P20/P24 downstream).

P29 quantifies the principle: “even if you collect many records, the event claim collapses if they point to different times.” For evidence-grade integration (V-EVID), event-time estimates must concentrate into a narrow window across at least 3 independent proxy classes (distinct proxy_class values).

Inputs (prereg required). From each module (P12/P15/P16/P19/P20/P21/P18, etc.), estimate a center time tᵢ and uncertainty σᵢ, and record them in data/meta/event_window_estimates.csv (recommended unit: ka BP).

Schema (recommended; DataPack v0.8). event_window_estimates.csv has at least the columns module, proxy_class, t_center_ka, sigma_ka, sign, weight, method, ref, include. Here sign ∈ {−1, 0, +1} denotes the direction predicted by the event (0 = unspecified/unused), and only rows with include=1 enter the coherence calculation.
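The following is a minimal loader for the recommended schema. It assumes plain comma-separated text with a header row and performs three checks: the nine listed columns are present, sign ∈ {−1, 0, +1}, and sigma_ka > 0. It then keeps only include=1 rows. The function name and the toy rows in the example are made up.

```python
import csv

REQUIRED = ("module", "proxy_class", "t_center_ka", "sigma_ka", "sign",
            "weight", "method", "ref", "include")

def load_event_windows(lines):
    """Parse event_window_estimates.csv content (any iterable of text lines)
    and return the include=1 rows that enter the coherence calculation."""
    reader = csv.DictReader(lines)
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    kept = []
    for row_no, r in enumerate(reader, start=2):  # row 1 is the header
        sign, sigma = int(r["sign"]), float(r["sigma_ka"])
        if sign not in (-1, 0, 1):
            raise ValueError(f"row {row_no}: sign must be -1, 0 or +1")
        if sigma <= 0:
            raise ValueError(f"row {row_no}: sigma_ka must be positive")
        if int(r["include"]) == 1:
            kept.append({"module": r["module"], "proxy_class": r["proxy_class"],
                         "t": float(r["t_center_ka"]), "sigma": sigma, "sign": sign})
    return kept

# Toy rows (made up, not real estimates); the third is excluded by include=0.
example = """module,proxy_class,t_center_ka,sigma_ka,sign,weight,method,ref,include
modA,class1,4.3,0.3,+1,1,toy,none,1
modB,class2,4.1,0.4,+1,1,toy,none,1
modC,class3,4.0,0.5,0,1,toy,none,0""".splitlines()
print(len(load_event_windows(example)))  # 2
```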

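Coherence metric. Let the weighted mean be $\bar{t} = \sum_i w_i t_i / \sum_i w_i$ with $w_i = 1/\sigma_i^2$, and define

$$K_{\mathrm{joint}} = \frac{\sqrt{\sum_i w_i \,(t_i - \bar{t})^2 \,/\, \sum_i w_i}}{\operatorname{median}(\sigma_i)}.$$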

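Sign-coherence metric (optional). If sign is provided, compute the agreement rate with the modal nonzero sign $\hat{s}$ among the included rows:

$$S_{\mathrm{joint}} = \frac{\#\{i : s_i = \hat{s}\}}{\#\{i : s_i \neq 0\}}.$$

A low S_joint means that even if the timing matches, the implied physical directions may contradict one another.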
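The two metrics translate directly into code. This sketch computes K_joint exactly as defined, with wᵢ = 1/σᵢ². It treats S_joint as an unweighted agreement rate over the nonzero signs, which is an assumption about the optional metric; the pre-registered definition governs. Function names are illustrative.

```python
from collections import Counter

import numpy as np

def k_joint(t, sigma):
    """Inverse-variance-weighted spread of the t_i, in units of median(sigma_i)."""
    t, sigma = np.asarray(t, float), np.asarray(sigma, float)
    w = 1.0 / sigma**2
    t_bar = np.sum(w * t) / np.sum(w)
    spread = np.sqrt(np.sum(w * (t - t_bar) ** 2) / np.sum(w))
    return float(spread / np.median(sigma))

def s_joint(signs):
    """Share of nonzero signs agreeing with the modal nonzero sign (None if none)."""
    nonzero = [s for s in signs if s != 0]
    if not nonzero:
        return None
    _, modal_count = Counter(nonzero).most_common(1)[0]
    return modal_count / len(nonzero)

print(k_joint([3.0, 5.0], [1.0, 1.0]))  # 1.0
print(s_joint([1, 1, -1, 0]))           # 2/3
```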

Decision rule. P29 addresses timing coherence only; controls/confounders are gated separately in P30. Pre-registered thresholds: UNLOCK requires all of (i) K_joint ≤ K_unlock, (ii) look-elsewhere-corrected p_LEE < 0.05, (iii) ≥ 3 distinct proxy_class values, (iv) jackknife-worst p_LEE < 0.05, and (v) sign coherence S_joint ≥ S_; if K_joint ≥ K_fail, the verdict is FAIL; otherwise it is HOLD.
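The three-way decision rule can be sketched as a single function. The thresholds K_unlock, K_fail, and the sign-coherence cut-off are passed in as arguments because their pre-registered values are not given here. A missing S_joint is assumed to fail gate (v), which is a conservative choice. The function name is hypothetical, and the sketch assumes K_unlock < K_fail.

```python
def p29_verdict(k_joint, p_lee, n_classes, p_lee_jack_worst, s_joint,
                k_unlock, k_fail, s_min, alpha=0.05):
    """Map P29 statistics to UNLOCK / FAIL / HOLD (threshold values are placeholders)."""
    if k_joint >= k_fail:
        return "FAIL"
    gates = (
        k_joint <= k_unlock,                       # (i) cluster tight enough
        p_lee < alpha,                             # (ii) look-elsewhere corrected
        n_classes >= 3,                            # (iii) distinct proxy_class count
        p_lee_jack_worst < alpha,                  # (iv) robust to dropping any proxy
        s_joint is not None and s_joint >= s_min,  # (v) sign coherence
    )
    return "UNLOCK" if all(gates) else "HOLD"

print(p29_verdict(0.5, 1e-3, 3, 7e-3, 0.9, k_unlock=1.0, k_fail=3.0, s_min=0.8))  # UNLOCK
```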

Randomization: the correct null (r18 correction). For fixed σᵢ, the metric K_joint depends only on the spread of the tᵢ. Permuting the tᵢ values among proxies leaves the set of times, and hence essentially their spread, unchanged (exactly so when the σᵢ are equal). A permutation-of-values null therefore cannot test whether the cluster is tighter than chance; it tests the wrong hypothesis. The correct null draws each tᵢ independently from the pre-registered admissible range (tᵢ ∈ [RANGE_LO, RANGE_HI], e.g., the Holocene, 0–11.7 ka) and computes the fraction of draws with K_sim ≤ K_obs (= p_raw). This answers the question: could random times in the allowed range cluster this tightly?
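The range null can be sketched with NumPy. The sketch assumes the admissible range is sampled uniformly and that the σᵢ are held at their observed values, so only the times are randomized. It uses an add-one estimator, which keeps p_raw away from exactly zero. Names and defaults (including 0–11.7 ka) are illustrative. K_joint is redefined here so the block runs on its own.

```python
import numpy as np

def k_joint(t, sigma):
    """Inverse-variance-weighted spread of the t_i over median(sigma_i)."""
    t, sigma = np.asarray(t, float), np.asarray(sigma, float)
    w = 1.0 / sigma**2
    t_bar = np.sum(w * t) / np.sum(w)
    return float(np.sqrt(np.sum(w * (t - t_bar) ** 2) / np.sum(w)) / np.median(sigma))

def range_null_p(t_obs, sigma, lo=0.0, hi=11.7, n_sim=20_000, seed=0):
    """p_raw = share of simulated K_sim <= K_obs, with every t_i drawn
    uniformly from [lo, hi] and the sigma_i held fixed."""
    rng = np.random.default_rng(seed)
    sigma = np.asarray(sigma, float)
    w = 1.0 / sigma**2
    k_obs = k_joint(t_obs, sigma)
    sims = rng.uniform(lo, hi, size=(n_sim, len(sigma)))  # one row per draw
    t_bar = sims @ w / w.sum()
    spread = np.sqrt(((sims - t_bar[:, None]) ** 2) @ w / w.sum())
    k_sim = spread / np.median(sigma)
    p_raw = (1 + np.count_nonzero(k_sim <= k_obs)) / (1 + n_sim)  # add-one estimate
    return k_obs, float(p_raw)

# Synthetic tight cluster near 4.2 ka versus a scattered set.
print(range_null_p([4.1, 4.2, 4.25, 4.3, 4.15], [0.2] * 5))
print(range_null_p([1.0, 4.0, 7.0, 10.0, 11.0], [0.2] * 5))
```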

Look-elsewhere, independence, and fragility (r18 hard gates). Three corrections are mandatory and pre-registered:

Look-elsewhere: if the window is scanned rather than fixed, the per-window p_raw is corrected for the number of independent windows N_LEE = (RANGE_HI − RANGE_LO)/WINDOW_W via $p_{\mathrm{LEE}} = 1 - (1 - p_{\mathrm{raw}})^{N_{\mathrm{LEE}}}$, which is ≈ N_LEE · p_raw for small p_raw. Only p_LEE is decision-eligible.
Independence (AR-32): proxies sharing a proxy_class/age-model are correlated, so the count of distinct proxy_class (must be ≥ 3) and an effective N_eff (same-class members down-weighted) are reported; “evidence redundancy” cannot be cashed as independent support.
Jackknife: leave-one-proxy-out must not flip the verdict — the worst-case p_LEE over all single-proxy deletions must still satisfy the gate, directly enforcing the FALSIFIER “coherence holds only by including a particular module set.”
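The three hard gates can be sketched as small helpers. The look-elsewhere step uses p_LEE = 1 − (1 − p_raw)^N_LEE, with N_LEE floored to an integer (the rounding rule is an assumption). The N_eff helper is a hypothetical stand-in, since the pre-registered down-weighting scheme is not spelled out here: it treats proxies within one class as equicorrelated with an assumed correlation ρ, which is a standard design-effect adjustment. The jackknife accepts any p_raw function, for example a range-null estimator.

```python
import math
from collections import Counter

def n_lee(range_lo, range_hi, window_w):
    """Number of independent windows scanned (floored; an assumption)."""
    return max(1, math.floor((range_hi - range_lo) / window_w))

def p_lee(p_raw, n_windows):
    """Look-elsewhere correction: 1 - (1 - p_raw)^N_LEE."""
    return 1.0 - (1.0 - p_raw) ** n_windows

def class_independence(proxy_classes, rho=0.5):
    """Distinct proxy_class count and a hypothetical design-effect N_eff.

    A class with m equicorrelated members (correlation rho, assumed)
    contributes m / (1 + (m - 1) * rho) effective proxies.
    """
    sizes = Counter(proxy_classes).values()
    n_eff = sum(m / (1 + (m - 1) * rho) for m in sizes)
    return len(sizes), n_eff

def jackknife_worst_p_lee(t, sigma, p_raw_fn, n_windows):
    """Worst (largest) p_LEE over all leave-one-proxy-out subsets."""
    worst = 0.0
    for i in range(len(t)):
        t_i = list(t[:i]) + list(t[i + 1:])
        s_i = list(sigma[:i]) + list(sigma[i + 1:])
        worst = max(worst, p_lee(p_raw_fn(t_i, s_i), n_windows))
    return worst

print(n_lee(0.0, 11.7, 0.5))          # 23 for a hypothetical 0.5 ka window
print(round(p_lee(3.5e-5, 23), 6))    # 0.000805
```

With a hypothetical 0.5 ka window over 0–11.7 ka, N_LEE = 23. In that case a p_raw of 3.5×10⁻⁵ maps to p_LEE ≈ 8.05×10⁻⁴, in line with the illustrative figures reported for the r18 engine.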

On the illustrative set the corrections bite as intended (p_raw ≈ 3.5×10⁻⁵ → p_LEE ≈ 8×10⁻⁴ after a 23× look-elsewhere penalty; jackknife-worst p_LEE ≈ 7×10⁻³): a genuine cluster survives them, whereas an over-fit one would not.

FALSIFIER. If coherence holds only when a particular module set is included, or only when chronological uncertainty is ignored, the verdict is FAIL/HOLD. (In particular, post hoc changes to selection criteria are treated as a strong HOLD, close to STOP.)

Code. atl_bundle/engine/c18_p29_coherence.py computes K_joint, S_joint, the range-null p_raw, the look-elsewhere p_LEE, N_eff, and the jackknife, saving results/c18_p29_coherence_results.npz (figure figures/fig_c18_p29_coherence.png); audited checks C18a–c verify the statistical machinery (not a science verdict).

Linked AR/H. AR-32, AR-33; competing hypothesis H-SYNC.

P30 (optional; V-EVID): Negative Controls & Confounder Isolation — hard gate for controls/confounders

P30 structurally blocks the critique that “with enough cases you can build any story.” For V-EVID, therefore, both (i) timing coherence (P29) and (ii) each module's controls/confounder isolation must be pre-registered and fixed.

Inputs (prereg required).

data/meta/controls_registry.csv: control definitions per module (regions/datasets/randomization rules, etc.).
per-module result summaries: e.g., results/p19_sea_level_budget.json, results/p20_misfit_rivers.json, etc.

Decision rule (example). Suppose each module j outputs (a) a target-vs-control effect size E_j and (b) a p-value p_j from a null-hypothesis test. In V-EVID, at least N_pass modules must satisfy the module-level criteria on E_j and p_j, with detailed thresholds fixed in a prereg YAML.
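The module-level criterion is fixed only in the prereg YAML, so the sketch below uses a hypothetical one. A module counts toward N_pass if two conditions hold: its effect size E_j is at least e_min, and p_j < alpha. E_j is assumed to be oriented so that positive means the target exceeds the control. Any module lacking a control definition stops the gate outright, mirroring the first P30 falsifier. The class, function, and module names are illustrative.

```python
from dataclasses import dataclass

@dataclass
class ModuleResult:
    module: str
    effect: float      # E_j, oriented so positive = target exceeds control (assumed)
    p_value: float     # p_j from the module's null-hypothesis test
    has_control: bool  # is a control defined in controls_registry.csv?

def p30_gate(results, n_pass, e_min, alpha=0.05):
    """Hypothetical P30 gate; the real thresholds belong in the prereg YAML."""
    if not all(r.has_control for r in results):
        return "STOP", []  # missing control definitions block V-EVID
    passing = [r.module for r in results
               if r.effect >= e_min and r.p_value < alpha]
    return ("PASS" if len(passing) >= n_pass else "HOLD"), passing

toy = [ModuleResult("modA", 0.6, 0.01, True),
       ModuleResult("modB", 0.4, 0.03, True),
       ModuleResult("modC", 0.1, 0.20, True)]
print(p30_gate(toy, n_pass=2, e_min=0.3))  # ('PASS', ['modA', 'modB'])
```

A HOLD here corresponds to the non-specificity falsifier (evidence-grade downgrade), while STOP reflects missing controls.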

FALSIFIER.

(missing controls) if controls_registry.csv lacks a definition, or rules are changed post hoc, P30 is HOLD/FAIL (effectively STOP).
(non-specificity) if target/control differences are unclear (non-significant p_j), downgrade the evidence grade (ERL downgrade).

Implementation stub. code/p30_negative_controls.py (v1.23). Linked AR/H. AR-34; competing hypothesis H-CONF.