Pre-registered predictions P22 to P30
P22 cross-tests whether petroleum (or bitumen/oil sands) is explainable by purely in situ generation, or whether glacial/large-scale fluid transport repositioned material into sinks. However, petroleum presence/absence is highly sensitive to (i) basin existence, (ii) source rock/maturity, (iii) seal/trap, and (iv) exploration maturity (H-OIL/H-DISC). Therefore P22 must first fit a null model that includes basin mask + petroleum-system covariates, and only then test the additional explanatory power of transport proxies. (Do not use trivial comparisons like “no oil on an alluvial plain”; AR-40.)
Subtests (pre-registered; TEST-OIL/GLACIAL family).
- TEST-OIL1 (spatial): for a province/basin objective variable (presence/size), build a baseline model including petroleum system (source/maturity/trap) + exploration indicators (wells/seismic/exploration years, etc.), then test whether transport-to-sink features (distance to ice margin/meltwater drainage path/mega-delta terminus) improve prediction power (prereg criteria).
- TEST-OIL2 (composition): test whether terrestrial biomarker ratios (e.g., oleanane, lignin-derived indicators, pollen/spores; specific markers pre-registered) covary with transport-to-sink proxies (delta-terminus distance, glacial-path proxies). The standard alternative is “normal river/ocean transport,” so include a non-glacial control basin with comparable river input (or comparable input/area) (AR-41).
- TEST-GLACIAL1 (source–sink): among comparable sedimentary basins (restricted to those with petroleum-system viability), test whether (a) upstream sources/paths show depletion and (b) downstream terminal sinks show enrichment.
Inputs (stub). data/petroleum/oil_provinces.csv, data/glacial/ice_extent_proxies.csv. For discovery-bias covariates, use P35's data/petroleum/oil_discovery_bias_cases.csv. (Field definitions/units: Appendix F and docs/codebook.md.)
FALSIFIER.
- (no additional explanatory power) if transport proxies vanish or become sign-unstable after controlling petroleum-system + exploration covariates, P22 is FAIL/HOLD.
- (no composition signal) if terrestrial biomarker patterns are explained by “normal river input” or show no sink-amplification signal, P22 is FAIL/HOLD.
- (confounding) if results are explained by exploration maturity / shelf sediment cover (mega-delta vs Middle-East-type low-clastic shelves), keep P22 only as a candidate explanation; do not promote to evidence (P35 PASS is a prerequisite).
Linked AR/H. AR-28, AR-39, AR-40, AR-41; competing hypotheses H-OIL, H-DISC.
Pre-registration. config/p22_oil_glacial_prereg.yml.
P23 (exploratory; V-HOLOX): Volcanic Refugia — correlation of ice-free refugia with heatflow/arc proximity
P23 explores whether “ice-free refugia” patterns are not mere climate artifacts, but coupled to heatflow/volcanism (geothermal flux).
Test (TEST-REF1). In data/glacial/refugia_catalog.csv, collect ice presence/absence, heatflow proxies, distance to volcanic arcs, and climate covariates (temperature/precipitation), and run a multivariate comparison.
FALSIFIER. If adding climate covariates removes the heatflow term or makes its direction unstable, P23 is FAIL/HOLD.
Linked AR/H. AR-29; competing hypothesis H-REF.
P24 (exploratory; V-HOLOX): Endorheic Mega-Lakes — “evaporation clock” clustering
Bundle verdict (2025-12-27): PASS. (results/p24_endorheic_lakes.json)
P24 tests whether, in endorheic lake basins, the onset of low-stand transitions (LOW-onset) synchronizes around the registered event window (median t=4.2 ka). Using Oxford LLDB 1-kyr time-slice classification, extract for each lake the first time it transitions HIGH/MID → LOW and examine the distribution.
Data. Oxford Lake-Level Data Bank (NCEI) data/hydro/endorheic_lakes.csv (original: data/external/ncei_lakelevel/oxford_lldb_levels-noaa.txt).
Test (TEST-LAKE1). Represent the event window as the 3–5 ka span in 1-kyr bins, and test whether LOW-onset is enriched in that window under a permutation null. (Non-significance is treated as HOLD, not FAIL.)
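The enrichment test can be sketched as follows. This is a minimal illustration, not the registered engine: it swaps in a simple Monte Carlo uniform-bin null in place of the permutation scheme (whose details are not given here), and the function name `window_enrichment`, the 1-based bin indexing, the choice of 12 bins, and the toy onset list are all assumptions made for the example.

```python
import random

def window_enrichment(onset_bins, window_bins, n_bins, n_draws=2000, seed=0):
    """Return (observed_count, enrichment, p_ge) under a uniform-bin null.

    onset_bins  : 1-based bin index of the first LOW transition, one per lake
    window_bins : set of bin indices that make up the event window
    n_bins      : number of admissible 1-kyr bins (an assumed value here)
    """
    rng = random.Random(seed)
    n = len(onset_bins)
    observed = sum(1 for b in onset_bins if b in window_bins)
    # Expected count if onsets were spread uniformly over all bins.
    expected = n * len(window_bins) / n_bins
    enrichment = observed / expected
    hits = 0
    for _draw in range(n_draws):
        # Null draw: each onset lands in a uniformly random bin.
        sim = sum(1 for _lake in range(n)
                  if rng.randint(1, n_bins) in window_bins)
        if sim >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    p_ge = (hits + 1) / (n_draws + 1)
    return observed, enrichment, p_ge

# Toy input (NOT the Oxford LLDB data): 10 onsets, window = bins 4 and 5 of 12.
toy_onsets = [4, 4, 5, 5, 4, 5, 1, 8, 10, 12]
observed, enrichment, p_ge = window_enrichment(toy_onsets, {4, 5}, n_bins=12)
print(observed, round(enrichment, 2), p_ge)
```

Under the HOLD-not-FAIL convention above, a large `p_ge` from such a run would leave the verdict at HOLD rather than count against the prediction.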
Result summary. Of N_lake = 358 lakes, a LOW-onset was detected for N_onset = 146; of those, 39 onsets (26.7%) fall in the 3–5 ka window. Relative to a uniform null (1–max age bin), enrichment ≈ 1.60, with permutation p_≥ ≈ 0.00185; clustering is confirmed and the verdict is locked as PASS.
FALSIFIER. If LOW-onset is not enriched in the event window (lack of information) or is significantly depleted, P24 is HOLD/FAIL.
Interpretation (caution). P24 is a downstream-pattern showing that synchronized low-stand onsets are frequent near the “4.2 ka window.” Mechanistic proof (ARE → melt) is restricted to core PASS modules such as P19/P20/P29; P24 is only auxiliary coherence material.
Linked AR/H. AR-30; competing hypothesis H-LAKE.
P25 (optional; V-HOLOX): Shelf Asymmetry — global asymmetry of shelf width/incision/high-energy deposits
Standard geology already expects a shelf-width difference between the Atlantic (passive margins) and the Pacific (active margins). However, the user model requires additional signatures of event-like rapid drainage/rapid deposition: even in regions that are very arid today (e.g., deserts), past large drainage traces (submarine canyons/high-energy deposits) should remain on the shelf–slope system.
Test (TEST-SHELF1). In data/geomorph/shelf_width_profiles.csv, collect shelf width, submarine canyon density/fan structures, basin area (and present discharge), and compare whether patterns are explainable by “present climate/discharge only” versus retaining “event residuals.”
FALSIFIER. If shelf width/canyon density are sufficiently explained by long-duration discharge/sedimentation models and event-like residuals are absent in arid/low-discharge regions, P25 is FAIL/HOLD.
Linked AR/H. AR-25 (geomorphic controls), AR-30 (hydrology); competing hypothesis H-SHELF.
P26 (exploratory; V-STRATA): Great Unconformity — synchronization/high-energy signatures of basement truncation surfaces
P26 explores whether broad truncation surfaces like the “Great Unconformity” can synchronize to a single event (or a narrow window). If global synchronization does not hold, isolate as immediate HOLD/FAIL.
Test (TEST-UNCON1). In data/strat/unconformity_sites.csv, collect standardized unconformity age ranges, weathering/soil-development indicators, and high-energy indicators of overlying deposits.
FALSIFIER. If unconformity ages/formations are globally dispersed, or long-duration weathering/soil development is common, P26 is FAIL.
Linked AR/H. AR-31; competing hypothesis H-UNCON.
P27 (exploratory; V-STRATA): Polystrate Fossils — prevalence/environment distribution of multi-strata-penetrating cases
P27 classifies whether polystrate fossils are common products of “local rapid burial” or have patterns coupled to a broader event.
Test (TEST-POLY1). In data/strat/polystrate_cases.csv, record environment (delta/floodplain/pyroclastic/coastal, etc.), stratigraphy/structure, and age per case to assess distributions.
FALSIFIER. If cases are almost entirely restricted to local environments (e.g., floodplain/delta) and global synchronization is not observed, P27 is difficult to use as global evidence (FAIL/HOLD).
Linked AR/H. AR-31; competing hypothesis H-POLY.
P28 (exploratory; V-STRATA): Coal with Marine Fossils — mixed origin vs repeated transgression/reworking
P28 explores whether co-occurrence of coal beds with marine fossils/sediments indicates “large-scale transport/mixing” versus “repeated transgression/reworking.”
Test (TEST-COAL1). In data/strat/coal_marine_cases.csv, standardize and record rooting/soil indicators (in situ) vs reworking indicators, fossil assemblages, and sedimentary structures.
FALSIFIER. If rooting/soil indicators are common and marine fossils are explained by thin transgressive surfaces, P28 is hard to use as mixed-event evidence (FAIL/HOLD).
Linked AR/H. AR-31; competing hypothesis H-COAL.
P29 (optional; V-EVID): Joint Event Window Coherence — cross-proxy “event window” coherence
Bundle verdict (r18): HOLD — method-validated and reproducible (audited check C18), but the scientific verdict awaits the pre-registered proxy table. The r16 “PASS” was asserted without a reproducible engine, without a shipped data table, and with a randomization null that — as shown below — tests the wrong hypothesis. r18 supplies a correct, reproducible engine (atl_bundle/engine/c18_p29_coherence.py) and, in the absence of the pre-registered data/meta/event_window_estimates.csv, runs it on a clearly-labeled illustrative dataset to validate the machinery only. Until the pre-registered table is supplied and run, P29 is graded HOLD, not PASS, and the directional-coherence narrative leans only on its other legs (P1/P4 on the cause axis; P16/P19/P20/P24 downstream).
P29 quantifies the principle: “even if you collect many records, if they point to different times, eventness claims collapse.” For evidence-grade integration (V-EVID), event-time estimates must concentrate into a narrow window across at least 3 independent proxy_class.
Inputs (prereg required). From each module (P12/P15/P16/P19/P20/P21/P18, etc.), estimate a center time tᵢ and uncertainty σᵢ, and record them in data/meta/event_window_estimates.csv (recommended unit: ka BP).
Schema (recommended; DataPack v0.8). event_window_estimates.csv has at least: module, proxy_class, t_center_ka, sigma_ka, sign, weight, method, ref, include. Here sign ∈ {-1, 0, +1} denotes the directionality predicted by the event (0 = unspecified/unused), and only include=1 rows enter the coherence calculation.
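A loader for this schema might look like the sketch below. It assumes plain comma-separated text with a header row; the function name `load_estimates`, the validation rules (positive sigma_ka, sign restricted to -1/0/+1), and the sample rows are illustrative assumptions, not the contents of the pre-registered table.

```python
import csv
import io

REQUIRED = ["module", "proxy_class", "t_center_ka", "sigma_ka",
            "sign", "weight", "method", "ref", "include"]

def load_estimates(handle):
    """Parse the table and return only the include=1 rows, validated."""
    reader = csv.DictReader(handle)
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = []
    for r in reader:
        if int(r["include"]) != 1:
            continue  # excluded rows never enter the coherence calculation
        sign = int(r["sign"])
        sigma = float(r["sigma_ka"])
        if sign not in (-1, 0, 1):
            raise ValueError(f"invalid sign for {r['module']}: {sign}")
        if sigma <= 0:
            raise ValueError(f"non-positive sigma_ka for {r['module']}")
        rows.append({"module": r["module"],
                     "proxy_class": r["proxy_class"],
                     "t": float(r["t_center_ka"]),
                     "sigma": sigma,
                     "sign": sign})
    return rows

# Placeholder rows for illustration only (hypothetical modules and values).
SAMPLE = """module,proxy_class,t_center_ka,sigma_ka,sign,weight,method,ref,include
MOD_A,class_a,4.1,0.2,1,1.0,demo,none,1
MOD_B,class_b,4.3,0.3,0,1.0,demo,none,1
MOD_C,class_c,7.9,0.5,-1,1.0,demo,none,0
"""
rows = load_estimates(io.StringIO(SAMPLE))
print(len(rows), sorted({r["proxy_class"] for r in rows}))
```

Failing loudly on a missing column or an invalid sign keeps schema problems from being silently absorbed into the coherence statistic.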
Coherence metric. Let the weighted mean be t = Σ wᵢ tᵢ/Σ wᵢ with wᵢ=1/σᵢ², and define
Sign-coherence metric (optional). If sign is provided, compute the agreement rate with the modal nonzero sign:
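One way to read "agreement rate with the modal nonzero sign" is the unweighted fraction sketched below. The unweighted form, the handling of the all-zero case (returning `None`), and the function name `sign_coherence` are assumptions; a weighted variant would be equally compatible with the description.

```python
from collections import Counter

def sign_coherence(signs):
    """Fraction of nonzero signs that agree with the most common nonzero sign.

    Rows with sign 0 (unspecified/unused) are ignored. Returns None when no
    row carries a nonzero sign, since the rate is then undefined.
    """
    nonzero = [s for s in signs if s != 0]
    if not nonzero:
        return None
    _modal_sign, modal_count = Counter(nonzero).most_common(1)[0]
    return modal_count / len(nonzero)

print(sign_coherence([1, 1, -1, 0, 1]))  # three of four nonzero signs agree
```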
Decision rule. P29 addresses timing coherence only; controls/confounders are separately gated in P30. Prereg thresholds: UNLOCK requires all of (i) K_joint ≤ K_unlock, (ii) look-elsewhere-corrected p_LEE < 0.05, (iii) ≥ 3 distinct proxy_class, (iv) jackknife-worst p_LEE < 0.05, and (v) sign-coherence S_joint ≥ S_; if K_joint ≥ K_fail then FAIL; otherwise HOLD.
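The gate logic above can be written as a small pure function. This is a sketch under stated assumptions: the threshold arguments `k_unlock`, `k_fail`, and `s_min` are hypothetical placeholders for the pre-registered values (which are not listed here), and the numbers in the usage lines are arbitrary.

```python
def p29_verdict(k_joint, p_lee, n_classes, p_lee_jackknife_worst, s_joint,
                k_unlock, k_fail, s_min, alpha=0.05, min_classes=3):
    """Return 'UNLOCK', 'FAIL', or 'HOLD' following the five-part gate."""
    unlock = (
        k_joint <= k_unlock                   # (i) tight enough cluster
        and p_lee < alpha                     # (ii) look-elsewhere-corrected p
        and n_classes >= min_classes          # (iii) distinct proxy classes
        and p_lee_jackknife_worst < alpha     # (iv) robust to any one deletion
        and s_joint >= s_min                  # (v) sign coherence
    )
    if unlock:
        return "UNLOCK"
    if k_joint >= k_fail:
        return "FAIL"
    return "HOLD"

# Arbitrary illustrative thresholds, not the registered ones.
print(p29_verdict(0.1, 0.001, 3, 0.01, 0.9, k_unlock=0.5, k_fail=2.0, s_min=0.8))
print(p29_verdict(0.1, 0.001, 2, 0.01, 0.9, k_unlock=0.5, k_fail=2.0, s_min=0.8))
```

Evaluating UNLOCK first and FAIL second mirrors the order of the rule, so a result that misses any single gate without reaching K_fail falls through to HOLD.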
Randomization: the correct null (r18 correction). The metric K_joint depends only on the spread of the tᵢ. Permuting the tᵢ values among proxies preserves that spread, so a permutation-of-values null cannot test whether the cluster is tighter than chance; it tests the wrong hypothesis. The correct null draws each tᵢ independently from the pre-registered admissible range (tᵢ ∈ [RANGE_LO, RANGE_HI], e.g. the Holocene 0–11.7 ka) and computes the fraction of draws with K_sim ≤ K_obs (= p_raw). This answers “could random times in the allowed range cluster this tightly?”
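A range null of this kind can be sketched as below. Two points are assumptions rather than the engine's definitions: the spread statistic used here (`weighted_spread`, a 1/σ²-weighted RMS deviation about the weighted mean) is only a stand-in for K_joint, and the draws are taken as uniform over the admissible range. The input times and uncertainties are made up for illustration.

```python
import math
import random

def weighted_spread(times, sigmas):
    """Stand-in spread statistic: weighted RMS deviation, weights 1/sigma^2."""
    weights = [1.0 / s ** 2 for s in sigmas]
    wsum = sum(weights)
    t_bar = sum(w * t for w, t in zip(weights, times)) / wsum
    var = sum(w * (t - t_bar) ** 2 for w, t in zip(weights, times)) / wsum
    return math.sqrt(var)

def range_null_p(times, sigmas, lo, hi, n_draws=5000, seed=0):
    """Fraction of uniform-range draws at least as tight as the observed set."""
    rng = random.Random(seed)
    k_obs = weighted_spread(times, sigmas)
    hits = 0
    for _draw in range(n_draws):
        sim = [rng.uniform(lo, hi) for _t in times]
        if weighted_spread(sim, sigmas) <= k_obs:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return k_obs, (hits + 1) / (n_draws + 1)

sig = [0.2, 0.2, 0.2, 0.2]
k_tight, p_tight = range_null_p([4.1, 4.2, 4.3, 4.15], sig, 0.0, 11.7)
k_wide, p_wide = range_null_p([1.0, 4.0, 8.0, 11.0], sig, 0.0, 11.7)
print(round(k_tight, 3), p_tight, round(k_wide, 3), p_wide)
```

With a finite number of draws the smallest reportable p_raw is 1/(n_draws + 1), so resolving values near 10⁻⁵ would need correspondingly more draws than this sketch uses.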
Look-elsewhere, independence, and fragility (r18 hard gates). Three corrections are mandatory and pre-registered:
- Look-elsewhere: if the window is scanned rather than fixed, the per-window p_raw is corrected for the number of independent windows N_LEE = (RANGE_HI − RANGE_LO)/WINDOW_W: p_LEE = 1 − (1 − p_raw)^N_LEE (≈ N_LEE · p_raw for small p_raw). Only p_LEE is decision-eligible.
- Independence (AR-32): proxies sharing a proxy_class/age-model are correlated, so the count of distinct proxy_class (must be ≥ 3) and an effective N_eff (same-class members down-weighted) are reported; “evidence redundancy” cannot be cashed as independent support.
- Jackknife: leave-one-proxy-out must not flip the verdict; the worst-case p_LEE over all single-proxy deletions must still satisfy the gate, directly enforcing the FALSIFIER “coherence holds only by including a particular module set.”
On the illustrative set the corrections bite as intended (p_raw≈3.5×10⁻⁵→ p_LEE≈8×10⁻⁴ after a 23× look-elsewhere penalty; jackknife-worst p_LEE≈7×10⁻³); a genuine cluster survives, an over-fit one would not.
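The look-elsewhere arithmetic quoted above can be checked directly. The sketch below evaluates p_LEE = 1 − (1 − p_raw)^N_LEE with p_raw = 3.5×10⁻⁵ and N_LEE = 23, and adds a generic leave-one-out wrapper; the function names `p_lee_from_raw` and `jackknife_worst` are hypothetical, and `p_fn` stands for whatever routine maps a proxy list to a p-value.

```python
import math

def p_lee_from_raw(p_raw, n_lee):
    """Look-elsewhere-corrected p-value: 1 - (1 - p_raw) ** n_lee.

    Written with log1p/expm1 so that very small p_raw values do not lose
    precision in the subtraction from 1.
    """
    return -math.expm1(n_lee * math.log1p(-p_raw))

def jackknife_worst(items, p_fn):
    """Largest p-value obtained when any single item is left out."""
    return max(p_fn(items[:i] + items[i + 1:]) for i in range(len(items)))

p_lee = p_lee_from_raw(3.5e-5, 23)
print(f"{p_lee:.3e}")  # close to the quoted 8e-4
```

Because p_raw is small, the result is within a fraction of a percent of the simple product 23 × p_raw, which is why the correction reads as a "23× penalty".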
FALSIFIER. If coherence holds only by including a particular module set, or only by ignoring chronology uncertainty, FAIL/HOLD. (Especially, changing selection criteria post hoc is treated as strong HOLD close to STOP.)
Code. atl_bundle/engine/c18_p29_coherence.py computes K_joint, S_joint, the range-null p_raw, the look-elsewhere p_LEE, N_eff, and the jackknife, saving results/c18_p29_coherence_results.npz (figure figures/fig_c18_p29_coherence.png); audited checks C18a–c verify the statistical machinery (not a science verdict).
Linked AR/H. AR-32, AR-33; competing hypothesis H-SYNC.
P30 (optional; V-EVID): Negative Controls & Confounder Isolation — hard gate for controls/confounders
P30 structurally blocks the critique: “with enough cases you can build any story.” Therefore for V-EVID, not only (i) timing coherence (P29), but also (ii) each module's controls/confounder isolation must be pre-registered and fixed.
Inputs (prereg required).
- data/meta/controls_registry.csv: control definitions per module (regions/datasets/randomization rules, etc.).
- per-module result summaries: e.g., results/p19_sea_level_budget.json, results/p20_misfit_rivers.json, etc.
Decision rule (example). Suppose each module j outputs (a) a target-vs-control effect size E_j and (b) a null-hypothesis test p_j. In V-EVID, at least N_pass modules must satisfy
FALSIFIER.
- (missing controls) if controls_registry.csv lacks a definition, or rules are changed post hoc, P30 is HOLD/FAIL (effectively STOP).
- (non-specificity) if target/control differences are unclear (non-significant p_j), downgrade the evidence grade (ERL downgrade).
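The two falsifier branches, together with an N_pass count, could be combined as in the sketch below. The per-module pass criterion used here (E_j ≥ `e_min` and p_j < `alpha`) is an assumed placeholder, since the registered condition is not stated in this section; the function name `p30_gate`, the return labels, and the module names and numbers are likewise hypothetical.

```python
def p30_gate(modules, registry, n_pass, e_min, alpha=0.05):
    """Apply the controls gate to per-module (effect_size, p_value) results.

    modules  : dict mapping module name -> (E_j, p_j)
    registry : set of module names that have a pre-registered control
    Returns (label, names): the verdict and the modules that triggered it.
    """
    # Missing-controls branch: any module without a registered control stops
    # the evaluation before effect sizes are even considered.
    missing = sorted(m for m in modules if m not in registry)
    if missing:
        return "HOLD/FAIL", missing
    # Assumed pass criterion (placeholder for the registered condition).
    passing = sorted(m for m, (e, p) in modules.items()
                     if e >= e_min and p < alpha)
    if len(passing) >= n_pass:
        return "PASS", passing
    # Non-specificity branch: too few modules separate target from control.
    return "DOWNGRADE", passing

# Hypothetical modules and values, for illustration only.
demo = {"module_a": (0.8, 0.01), "module_b": (0.5, 0.20)}
print(p30_gate(demo, {"module_a", "module_b"}, n_pass=1, e_min=0.3))
print(p30_gate(demo, {"module_a"}, n_pass=1, e_min=0.3))
```

Checking the registry before looking at any effect size reflects the intent that a missing or altered control definition blocks evaluation regardless of how strong the results appear.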
Implementation stub. code/p30_negative_controls.py (v1.23).
Linked AR/H. AR-34; competing hypothesis H-CONF.