Lab Testing — Machine Learning Opportunities Catalog

Status: Research / brainstorming
Last updated: 2026-06-12
Scope: Every material test currently scaffolded in be-platform — soil, asphalt, concrete, aggregate. Identifies where ML can speed up testing, replace tests, or run as always-on QC.

Topic-specific ML deep-dives already in this repo:
- asphalt-design-ml-binder-content.md — predict optimum AC% from gradation + aggregate properties
- asphalt-design-ml-aggregate-blend.md — aggregate blending optimization
- asphalt-design-ml-compaction-performance.md — gyratory compaction performance
- asphalt-design-ml-vma-prediction.md — VMA prediction for mix design

This catalog is the umbrella that situates those plans alongside the rest of the lab's test surface.

TL;DR — what to build first

1. CONC-COMP 28-day strength predictor (from 7-day break + mix design + cure tank temperature). Concrete is the biggest test volume and the biggest cost; every cylinder break in the DB is already a labeled training row, and the cure-tank IoT sensors supply the temperature history the model needs. Highest immediate ROI.
2. AGG-SIEVE camera-based gradation — tray photo → ML segments + sizes particles → full particle size distribution (PSD) in seconds. Replaces a 75-minute lab procedure. Mature ML space.
3. AGG-FE / AGG-FRAC camera-based particle classification — eliminates tedious manual particle-by-particle counting using off-the-shelf vision models.
4. CONC-SLUMP batch-ticket consistency monitor — always-on QC layer that predicts slump from the batch ticket and flags deviations >1.5". Cheap to build, high catch rate.

Everything else is in a tier-ordered backlog below.

Tier matrix

Tier	Definition	Build now?
1 — Quick wins	Mature ML, data already in DB, high $/test impact	Yes
2 — High-value mid-term	Strong correlations exist, needs modest data collection	Plan for Q1-Q2 next year
3 — Always-on QC	Anomaly detection over the result stream; low cost, high vigilance value	Build alongside Tier 1
4 — Image-classifier one-shots	Single-photo classification; off-the-shelf vision models	Opportunistic — bundle with kiosk camera work
5 — Skip-the-test via history lookup	Predict result before testing for repeat projects on known materials	Long-term; needs project-history fingerprinting first
6 — Not ML candidates	Pure physical procedures with no useful prediction target	Skip

Tier 1 — Quick wins, build first

1.1 — Concrete 28-day compressive strength from 7-day break + mix + cure history

Test: CONC-COMP

Inputs we already collect:
- 7-day cylinder break strength
- Mix design: design strength, w/c ratio, cement type, slump spec
- Field measurements: slump, air content, fresh concrete temperature
- IoT cure-tank temperature history during cure window

Predicts: 28-day strength + confidence interval; downstream "skip the 28-day break" recommendation for routine pours

Why it pays off:
- Avoid the 28-day break entirely for projects where the 7-day → 28-day ratio is stable
- Alarm early on weak pours instead of waiting 21 more days to know they failed
- Surface mix-design drift before it becomes a non-conformance
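
As a baseline before any trained model, the "stable 7-day → 28-day ratio" idea can be sketched with plain statistics. The function name, the 5% coefficient-of-variation cutoff, and the normal-approximation band below are illustrative assumptions, not calibrated values; a real predictor would also use the mix and cure-history inputs.

```python
from statistics import mean, stdev

def predict_28_day(f7_psi, history, z=1.96, max_cv=0.05):
    """Predict 28-day strength from a 7-day break using per-mix history.

    history: list of (f7_psi, f28_psi) pairs from earlier breaks of the same mix.
    z and max_cv are placeholder settings (assumptions), not calibrated values.
    """
    ratios = [f28 / f7 for f7, f28 in history if f7 > 0]
    if len(ratios) < 2:
        raise ValueError("need at least two historical break pairs")
    r_mean = mean(ratios)
    r_sd = stdev(ratios)
    predicted = f7_psi * r_mean
    # Rough band from the spread of historical ratios (not a formal prediction interval)
    half_width = z * r_sd * f7_psi
    cv = r_sd / r_mean
    return {
        "predicted_psi": predicted,
        "low_psi": predicted - half_width,
        "high_psi": predicted + half_width,
        "ratio_cv": cv,
        # Only a stable ratio would justify a skip-the-break recommendation
        "ratio_stable": cv <= max_cv,
    }
```

Comparing `low_psi` against the design strength gives the early-alarm case; `ratio_stable` gates any "skip the 28-day break" recommendation.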

Data status: Every existing cylinder break is a labeled training row. The bottleneck is joining cure-tank IoT data to the cylinder rows by tank + time-of-cure.
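
A minimal sketch of the tank + time-of-cure join, using only the standard library. The tuple layout, the function names, and the 70–77 °F band are assumptions for illustration; the real schema in be-platform may differ.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict
from datetime import datetime, timedelta

def index_readings(readings):
    """Group (tank_id, timestamp, temp_f) readings per tank, sorted by time."""
    by_tank = defaultdict(list)
    for tank_id, ts, temp_f in readings:
        by_tank[tank_id].append((ts, temp_f))
    index = {}
    for tank_id, rows in by_tank.items():
        rows.sort()
        index[tank_id] = ([ts for ts, _ in rows], [t for _, t in rows])
    return index

def cure_features(index, tank_id, cure_start, cure_end, band=(70.0, 77.0)):
    """Summarize tank temperature over one cylinder's cure window.

    band is a hypothetical acceptable range in °F. Returns None when the
    tank has no readings inside the window.
    """
    times, temps = index.get(tank_id, ([], []))
    lo = bisect_left(times, cure_start)
    hi = bisect_right(times, cure_end)
    window = temps[lo:hi]
    if not window:
        return None
    out_of_band = sum(1 for t in window if not band[0] <= t <= band[1])
    return {
        "n_readings": len(window),
        "mean_temp_f": sum(window) / len(window),
        "min_temp_f": min(window),
        "max_temp_f": max(window),
        "frac_out_of_band": out_of_band / len(window),
    }

# Example with made-up readings: one tank, hourly, constant 73 °F
start = datetime(2026, 1, 1)
readings = [("TANK-1", start + timedelta(hours=h), 73.0) for h in range(48)]
index = index_readings(readings)
features = cure_features(index, "TANK-1", start, start + timedelta(hours=23))
```

Each cylinder row would carry its tank ID and cure window, and the returned features become extra columns on the training row.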

1.2 — Camera-based aggregate gradation (AGG-SIEVE)

Test: AGG-SIEVE (and ASPH-related sieve work via reduce → sieve)

Inputs: single high-resolution tray photo of a representative sample

Predicts: full % passing curve for all standard sieves

Why it pays off:
- Replaces a 75-minute wet-sieve procedure with a 5-second photo + 10-second inference
- Mature ML space; commercial systems (e.g. Camsizer, Retsch IPro analogs) prove the approach
- Off-the-shelf segmentation models (Mask R-CNN, SAM) handle the heavy lift

Data status: Need to start photographing sieved samples alongside the canonical sieve result for ~3-6 months to build the ground-truth set. Worth seeding NOW even if model build is later.
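
Once a segmentation model yields one size estimate per particle, turning those into a % passing curve is simple bookkeeping. The sketch below assumes each particle arrives as an equivalent diameter in mm and weights it by diameter cubed as a stand-in for mass (uniform specific gravity, roughly equant particles). Both are simplifying assumptions, and a tray photo is unlikely to resolve the finest sieves.

```python
# Common sieve openings in mm; adjust to the stack the lab actually reports
SIEVES_MM = {
    "1 in": 25.0, "3/4 in": 19.0, "1/2 in": 12.5, "3/8 in": 9.5,
    "#4": 4.75, "#8": 2.36, "#16": 1.18, "#30": 0.60,
    "#50": 0.30, "#100": 0.15, "#200": 0.075,
}

def percent_passing(diameters_mm, sieves=SIEVES_MM):
    """Return {sieve name: % passing by estimated mass}.

    Mass is approximated as d**3 per particle (assumption: same specific
    gravity for every particle, so the constant factor cancels out).
    """
    weights = [d ** 3 for d in diameters_mm]
    total = sum(weights)
    if total <= 0:
        raise ValueError("no particles with positive size")
    curve = {}
    for name, opening in sieves.items():
        passing = sum(w for d, w in zip(diameters_mm, weights) if d < opening)
        curve[name] = 100.0 * passing / total
    return curve
```

The paired photo + canonical sieve results collected during the ground-truth period are what would correct the gap between image-derived size and true sieve size.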

1.3 — Particle shape classifiers (AGG-FE, AGG-FRAC)

Tests: AGG-FE (flat & elongated), AGG-FRAC (fractured face count), AGG-FAA (angularity)

Inputs: photo of a representative particle sample

Predicts: % flat, % elongated, % fractured, angularity

Why it pays off:
- These tests currently require tedious particle-by-particle manual counting (slow + error-prone)
- Image classification is a solved problem for particle shape
- Same data collection workflow as 1.2, so one imagery collection effort serves both
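
For flat & elongated, the counting step reduces to dimensional-ratio checks once the vision step has measured each particle. This sketch assumes the upstream model supplies three dimensions per particle (which a single top-down photo may not provide without a second view) and uses a 5:1 ratio as a placeholder; the governing spec sets the actual ratio and whether results are reported by count or by mass.

```python
def shape_percentages(particles, ratio=5.0):
    """Percent of particles (by count) that are flat, elongated, or both.

    particles: list of three dimensions per particle in consistent units.
    ratio: dimensional ratio threshold (assumed default, spec dependent).
    """
    if not particles:
        raise ValueError("no particles supplied")
    flat = elongated = flat_and_elongated = 0
    for dims in particles:
        # Sort so that length >= width >= thickness regardless of input order
        length, width, thickness = sorted(dims, reverse=True)
        if thickness <= 0:
            raise ValueError("dimensions must be positive")
        if width / thickness > ratio:
            flat += 1
        if length / width > ratio:
            elongated += 1
        if length / thickness > ratio:
            flat_and_elongated += 1
    n = len(particles)
    return {
        "flat_pct": 100.0 * flat / n,
        "elongated_pct": 100.0 * elongated / n,
        "flat_and_elongated_pct": 100.0 * flat_and_elongated / n,
    }
```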

1.4 — Slump prediction from batch ticket (CONC-SLUMP)

Test: CONC-SLUMP

Inputs: batch ticket (cement content, water-cement ratio, admixture type + dose, cubic yards), ambient temperature, truck rotation history (if available), time-since-batch

Predicts: expected slump at point-of-pour + confidence interval

Why it pays off:
- Quality gate at truck arrival: flag tickets whose predicted slump deviates from measured by >1.5"
- Catches bad batches AND bad samples (both directions of error)
- Builds on data you already capture in the batch ticket form
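
The gate itself is a small comparison once a predicted slump exists. A minimal sketch, with the 1.5" tolerance taken from the text above and everything else (names, the direction labels) assumed for illustration:

```python
from dataclasses import dataclass

@dataclass
class SlumpCheck:
    ticket_id: str
    predicted_in: float
    measured_in: float
    deviation_in: float
    flagged: bool
    direction: str

def check_slump(ticket_id, predicted_in, measured_in, tolerance_in=1.5):
    """Flag a ticket whose measured slump is more than tolerance_in from predicted."""
    deviation = measured_in - predicted_in
    flagged = abs(deviation) > tolerance_in
    if not flagged:
        direction = "ok"
    elif deviation > 0:
        direction = "wetter_than_expected"
    else:
        direction = "stiffer_than_expected"
    return SlumpCheck(ticket_id, predicted_in, measured_in, deviation, flagged, direction)
```

A flag alone does not say whether the batch or the sample is at fault; that call stays with the reviewer.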

Tier 2 — High-value mid-term

2.1 — Proctor MDD/OMC from soil index properties

Test: SOIL-PROC

Inputs: sieve gradation (% passing #200), Atterberg limits (LL/PL), soil description, project geographic location

Predicts: maximum dry density + optimum moisture content + curve shape

Why it pays off:
- Strong academic correlations published (Wang & Huang 2017; Sivrikaya 2008; many others)
- For routine soils on known projects, run 2 confirmation points instead of 5 — cuts Proctor time in half
- Could replace the test entirely for repeat projects

Data status: Need to back-fill grain size + LL/PL on existing Proctor records that don't have it. Going forward, capture it as part of test setup.

2.2 — Atterberg limits (SOIL-LL / SOIL-PL) from sieve + visual classification

Tests: SOIL-LL, SOIL-PL

Inputs: sieve gradation, soil description (USCS classification), project location

Predicts: LL, PL, plasticity index, classification confidence

Why it pays off:
- Many soils have LL/PL highly predictable from grain size + clay fraction
- Avoid the test entirely for routine projects on previously-characterized soils
- Frees the tech for higher-value lab work

2.3 — Flexural strength from compressive strength (CONC-FLEX from CONC-COMP)

Test: CONC-FLEX

Predicts: modulus of rupture from compressive strength

Why it pays off:
- ACI relationship MOR ≈ k·√f'c is well-established; ML refines k per mix family
- Flex beams take ~2× the prep effort of cylinders + need bigger molds
- Could replace flex breaks on most projects once correlation is calibrated
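
Before reaching for ML, "refine k per mix family" can be sketched as a least-squares fit through the origin: with x = √f'c, the k that minimizes Σ(MOR − k·x)² is k = Σ(MOR·√f'c) / Σ f'c. The row layout and function names below are assumptions for illustration.

```python
import math
from collections import defaultdict

def fit_k_by_family(rows):
    """Fit MOR ≈ k·sqrt(f'c) per mix family by least squares through the origin.

    rows: iterable of (mix_family, fc_psi, mor_psi) from paired breaks.
    """
    num = defaultdict(float)
    den = defaultdict(float)
    for family, fc_psi, mor_psi in rows:
        num[family] += mor_psi * math.sqrt(fc_psi)
        den[family] += fc_psi
    return {family: num[family] / den[family] for family in num if den[family] > 0}

def predict_mor(k, fc_psi):
    """Predicted modulus of rupture (psi) for a compressive strength in psi."""
    return k * math.sqrt(fc_psi)
```

The residuals from this fit are a natural target for a later model that adds aggregate type, age, or curing as features.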

2.4 — Lottman TSR prediction (ASPH-LOTT)

Test: ASPH-LOTT

Inputs: aggregate source, binder type/grade, anti-strip type + dose, mix gradation

Predicts: tensile strength ratio (passes / fails moisture-induced damage spec)

Why it pays off:
- Test takes 7+ days (saturation + freeze-thaw + break)
- Mix design optimization is currently a sequential dance because each variant takes a week
- ML predictor lets the PE narrow candidate anti-strip doses before running the long test

2.5 — Hveem stability optimum binder (ASPH-HVEEM)

Test: ASPH-HVEEM

Inputs: gradation, aggregate properties, binder grade

Predicts: narrowed optimum binder content range

Why it pays off:
- Cuts trial briquettes from 5 → 2-3
- Parallels the existing Superpave AC% predictor in asphalt-design-ml-binder-content.md

2.6 — ASR reactivity from petrography + chemistry (AGG-ASR)

Tests: AGG-ASR, AGG-ASR-PREP, AGG-ASR-MIX

Inputs: petrographic analysis, chemistry (silica content, mineral assemblage), source quarry

Predicts: AMBT / CPT expansion class (innocuous / slowly reactive / reactive)

Why it pays off:
- AMBT runs 16 days; CPT runs 12+ months
- Screen aggregate sources in days instead of months
- High strategic value — being able to qualify or disqualify a new quarry source quickly is a competitive moat

2.7 — LA Abrasion / Micro-Deval from petrography (AGG-LA, AGG-MDEVAL)

Tests: AGG-LA, AGG-MDEVAL

Inputs: petrographic analysis, source quarry, rock type, mineral composition

Predicts: abrasion loss % vs spec threshold

Why it pays off:
- Wear resistance is intrinsic to mineral composition — highly predictable from rock type
- Eliminates a 24-hour test for routine source verification
- Source-quarry database becomes a strategic asset

2.8 — Specific gravity & absorption from source (AGG-SG-COARSE/FINE)

Tests: AGG-SG-COARSE, AGG-SG-FINE

Inputs: source quarry + petrography

Predicts: Gsb, Gsa, absorption %

Why it pays off:
- These are intrinsic to the source aggregate — measurement is verification, not discovery
- One-time characterization per source replaces repeated testing

2.9 — Rice gravity from gradation + AC + Gsb (ASPH-RICE)

Test: ASPH-RICE

Predicts: maximum theoretical specific gravity (Gmm)

Why it pays off:
- Pure mass-balance physics; ML mostly for QC anomaly detection
- Lets you replace a 45-min test with calculation for QA-tier samples
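
The mass balance can be written down directly using the standard volumetric relation Gmm = 100 / (Ps/Gse + Pb/Gb), where Pb is binder content in % of total mix mass, Ps = 100 − Pb, Gse is the effective specific gravity of the aggregate, and Gb is the binder specific gravity. Note the relation uses Gse rather than Gsb; how Gse is obtained from Gsb and absorption is left open here. The default Gb and the anomaly tolerance below are placeholders, not spec values.

```python
def gmm_from_mass_balance(pb_percent, gse, gb=1.03):
    """Theoretical maximum specific gravity from binder content and Gse.

    pb_percent: binder content, % by total mix mass.
    gb: binder specific gravity (1.03 is an assumed typical value).
    """
    if not 0 < pb_percent < 100:
        raise ValueError("binder content must be between 0 and 100 percent")
    ps_percent = 100.0 - pb_percent
    return 100.0 / (ps_percent / gse + pb_percent / gb)

def rice_anomaly(measured_gmm, pb_percent, gse, gb=1.03, tolerance=0.02):
    """Return (flagged, expected_gmm); tolerance is a hypothetical QC threshold."""
    expected = gmm_from_mass_balance(pb_percent, gse, gb)
    return abs(measured_gmm - expected) > tolerance, expected
```

Used this way, the calculation serves the QC anomaly role described above: a measured Rice value far from the mass-balance value points at a sampling, binder-content, or data-entry problem.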

2.10 — Gyratory compaction prediction (ASPH-GYRA)

Test: ASPH-GYRA

Predicts: gyrations to target density / compaction curve from mix design

Why it pays off:
- Parallels the work already scoped in asphalt-design-ml-compaction-performance.md
- Cuts trial gyrations significantly

2.11 — Shrinkage prediction (CONC-SHRINK)

Test: CONC-SHRINK

Inputs: w/c ratio, cement type, paste content, aggregate type, ambient humidity history

Predicts: drying shrinkage at standard age

Why it pays off:
- Test runs 28+ days minimum, sometimes 90+
- High ML payoff per test

Tier 3 — Always-on QC / anomaly detection

These monitors run continuously over the result stream, need no engineer effort per check, and surface anomalies for PE review.

ML monitor	Catches
SOIL-NUC anomaly detector	Gauge-out-of-contact errors, gauge calibration drift, gaming by operators (readings inconsistent with neighbors / pass-history / depth)
CONC-SLUMP vs batch ticket	Bad batches (ticket says high cement but slump is low) AND bad samples (ticket consistent, sample isn't)
CONC-AIR-P/V vs AEA dose response	AEA degradation, batch plant scale errors
Cure tank temperature deviation	Pours where temperature went out of band; predict the strength penalty
CONC-TEMP plant predictor	Real-time plant QC on ambient + truck + ice
AGG-MC stockpile drift	Stockpile moisture drifting from historical average — adjust batch water before batching
ASPH-COMP-CORR auto-derivation	Predict the gauge correction factor from mix + temp + roller pattern — eliminates the formal correlation drilling on routine jobs
Project-level result-vs-spec trend monitor	Predicts probability the next test fails spec; surfaces for PE review before failure
Test-result drift across days	Same project, same mix, drifting result — surfaces equipment or technique drift
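
Most of these monitors share one pattern: compare each new result to recent history for the same project and mix, and flag what does not fit. A minimal sketch of that pattern as a trailing-window z-score check; the window size, threshold, and class name are assumptions, and slow drift would likely need a cumulative method (e.g. CUSUM) on top of this.

```python
from collections import deque
from statistics import mean, stdev

class DriftMonitor:
    """Flag results that sit far from the trailing window of prior results."""

    def __init__(self, window=20, z_threshold=3.0, min_history=5):
        self.history = deque(maxlen=window)
        self.z_threshold = z_threshold
        self.min_history = min_history

    def observe(self, value):
        """Return (flagged, z) for the new value, then add it to the window."""
        flagged, z = False, None
        if len(self.history) >= self.min_history:
            mu = mean(self.history)
            sd = stdev(self.history)
            if sd > 0:
                z = (value - mu) / sd
                flagged = abs(z) > self.z_threshold
        # Flagged values are still appended; excluding them is a design choice
        self.history.append(value)
        return flagged, z
```

One monitor instance per (project, mix, test) key keeps unrelated result streams from masking each other.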

Tier 4 — Image-classifier one-shots

Quick wins with off-the-shelf or lightly-fine-tuned vision models. Bundle these with any camera/kiosk work that puts a phone in tech hands.

Test	What the camera classifies
AGG-OI	Gardner color plate match → pass/fail
AGG-FAA	Fine aggregate angularity from particle shape
AGG-CLAY	Clay lumps & friable particles count
AGG-LWP	Lightweight particle count on floating fraction
Concrete cylinder break failure mode	Auto-classify cone / columnar / shear / splitting per ASTM C39 from break photo. Useful for QC trend analysis — failure mode correlates with mix quality issues
Concrete slump visual	Predict slump from cone photo (rough, but useful as a sanity check on the measured slump)

Tier 5 — "Skip the test entirely" via project-history lookup

The highest-ROI move, but the hardest to build because it needs cross-project fingerprinting.

Concept: for repeat customers with established mix designs, known source aggregates, and known soil profiles, predict the test result before running it, surface a confidence band, and let the engineer confirm with a single sanity-check sample instead of a full battery.

Requires:
- Cross-project soil/aggregate/mix fingerprinting — geographic location + source quarry + mix design as the keys
- A "predicted vs. measured" feedback loop that refines accuracy over time
- Engineer override + audit trail (a PE can always demand the full test)

Candidate tests for this treatment (intrinsic-property tests that don't change much per sample):
- AGG-SG-COARSE / AGG-SG-FINE — once a source is characterized
- AGG-LA / AGG-MDEVAL — same
- AGG-ASR class — by source-quarry petrography
- SOIL-LL / SOIL-PL — by soil-type + region
- SOIL-SULFATES — by geographic location + groundwater chemistry

Tier 6 — Not ML candidates

Pure physical sample-prep procedures with no useful prediction target:

AGG-BATCH (batching procedure)
AGG-SAMP (sampling)
AGG-SPLIT (sample splitting)
ASPH-SAMP (sampling)
ASPH-REDUCE (sample reduction)
ASPH-CORING (coring procedure)
CONC-MAKE (specimen molding)
CONC-SAMP (sampling)
CONC-CAP (specimen capping)
CONC-CORE (core preparation)
CONC-CORING (coring procedure)
CONC-BEAM (beam preparation)

The outputs of some of these may feed Tier 1-5 predictors (e.g. coring → CONC-COMP which is Tier 1), but the procedure itself isn't ML-amenable.

Per-test full table

Quick scan-reference. See tiers above for detail.

Test	ML opportunity	Tier	Data status
SOIL
SOIL-PROC	MDD/OMC from grain size + Atterberg	2	Need backfill of sieve + LL/PL on existing rows
SOIL-NUC	Field reading anomaly detection	3	Have it
SOIL-LL / SOIL-PL	Predict from gradation + visual class	2	Have it
SOIL-HYDRO	Image-based PSD from #200 fraction	2	Need particle imagery
SOIL-SULFATES-A/B	Predict from groundwater + geography	5	Needs GIS integration
ASPHALT
ASPH-IGN	AC% prediction within first 10 min of burn	1	Have burn data + final AC%
ASPH-EXTRACT	AC% from gradation + design	2	Have it
ASPH-AC-NUC	Anomaly + cross-cal with ignition	3	Have it
ASPH-COMP-CORR	Predict gauge correction from mix + temp	3	Have historical correlations
ASPH-COMP (field density)	Predict mat density from roller pattern + mat temp	3	Needs roller telemetry
ASPH-MOIST	Predict from weather + stockpile age	3	Need weather integration
ASPH-CORE-SG / BSG	Predict from gradation + AC + design	2	Have historical cores
ASPH-HVEEM	Narrow optimum binder search	2	Have it
ASPH-LOTT	TSR from aggregate/binder/anti-strip	2	Have it
ASPH-RICE	Predict Gmm from gradation + AC + Gsb	2	Have it
ASPH-GYRA	Gyrations to target from mix	2	Have it; see asphalt-design-ml-compaction-performance.md
CONCRETE
CONC-COMP	28-day from 7-day + cure temp	1	Have it — top priority
CONC-SLUMP	Predict from batch ticket + temp + admixture	1	Have it
CONC-AIR-P / V	Anomaly on AEA dose response	3	Have it
CONC-UW	Yield consistency check	3	Have it
CONC-TEMP	Predict from ambient + load + ice	3	Need plant data integration
CONC-FLEX	Predict from CONC-COMP — skip flex breaks	2	Have it
CONC-SHRINK	Predict from w/c + paste + aggregate	2	Long-term test history needed
AGGREGATE
AGG-SIEVE	Camera-based PSD	1	Start collecting imagery
AGG-FE	Image classifier for flat/elongated	1	Start collecting imagery
AGG-FRAC	Image classifier for fractured faces	1	Start collecting imagery
AGG-FAA	Shape analysis from imagery	4	Imagery
AGG-CLAY	Image classifier on washed sample	4	Imagery
AGG-LWP	Image classifier on floating fraction	4	Imagery
AGG-OI	Gardner color image classifier	4	Imagery
AGG-MC	Stockpile drift from weather + age	3	Need weather integration
AGG-LA	Predict from petrography + source	2	Source database
AGG-MDEVAL	Same as LA	2	Source database
AGG-SE	Predict from FM + clay	2	Have it
AGG-SG-COARSE / FINE	Predict from source + petrography	2 / 5	Source database
AGG-ASR family	Predict reactivity from petrography	2	Source + petrography database
AGG-ZNBR2 / ZNCL2	Predict from source	5	Source database
AGG-UW	Predict from PSD + Gsb	2	Have it
AGG-BLEND	Blending optimization	2	See asphalt-design-ml-aggregate-blend.md
AGG-BATCH / SAMP / SPLIT	Procedures — not ML	6	—

Data infrastructure needed

Things to build alongside (or before) the ML work:

Source-quarry database — every aggregate has a source. Tag every AGG-* test with its source quarry so we can train per-source predictors. Currently the data exists in mix designs but isn't first-class.
Particle imagery pipeline — phones + tablets can take photos; we need a way to associate images with test result rows. Build the upload/storage path before the ML — start collecting imagery in parallel with normal testing.
Cure tank → cylinder join — match cylinder break records to the tank IDs and cure-window temperature history. Critical for CONC-COMP predictor (#1.1).
Mix design fingerprinting — a deterministic ID per (cement + aggregate sources + admixtures + ratios) so repeat mixes can pool training data.
Project-history fingerprinting (Tier 5 prerequisite) — geographic + soil-type + customer + age keys so a "have we tested this material before?" lookup is possible.
Predicted-vs-measured feedback loop — every prediction surfaced to an engineer should record their override + the actual test result, so models improve from real-world use.
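
The mix design fingerprint above can be sketched as a hash over a canonicalized description of the mix. The field names, the normalization rules (case-folding, sorting, rounding ratios to two decimals), and the 16-character truncation are all assumptions; the real ID scheme would need agreement on what counts as "the same mix".

```python
import hashlib
import json

def mix_fingerprint(cement, aggregate_sources, admixtures, ratios, ratio_decimals=2):
    """Deterministic ID for a mix: same inputs in any order or case give the same ID.

    ratios: mapping such as {"w_c": 0.45}; the keys are hypothetical.
    """
    canonical = {
        "cement": cement.strip().lower(),
        "aggregate_sources": sorted(s.strip().lower() for s in aggregate_sources),
        "admixtures": sorted(a.strip().lower() for a in admixtures),
        "ratios": {k: round(float(v), ratio_decimals) for k, v in ratios.items()},
    }
    # sort_keys plus fixed separators gives a stable byte string to hash
    payload = json.dumps(canonical, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]
```

Rounding makes near-identical mixes pool together, but values that straddle a rounding boundary will still split into two IDs, so the precision is worth choosing deliberately.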

Open questions

Regulatory acceptance — for which tests will the state DOT / private clients accept an ML prediction in lieu of the canonical test? Likely answer: never for project-of-record results, but YES for screening / mix-development / decision support. Pursue ML as "narrows the test list" not "replaces the test."
Liability — engineer-of-record stamps don't go away. ML output supports the engineer's judgment, never substitutes for it.
Vendor partnerships vs. build — for camera-based gradation, commercial systems exist (Camsizer, etc.) and may be cheaper than internal ML build. Worth a make-vs-buy evaluation per opportunity.
Data quality bar — historical test results have rounding inconsistencies and human-entry errors; some cleanup before training is required.

Suggested sequencing

Sprint	Build
0 — Prep (data infra)	Source-quarry database; particle imagery upload path; cure-tank → cylinder join
1 — CONC-COMP 28-day predictor	Tier 1 top priority
2 — AGG-SIEVE camera-based PSD pilot	Tier 1, start with image collection during sprint 0
3 — CONC-SLUMP batch-ticket QC monitor	Tier 1
4 — AGG-FE / AGG-FRAC particle classifier	Tier 1
5 — ASR reactivity screener	Tier 2, but high strategic value
6 — Tier 3 anomaly detection suite	All in parallel; each is cheap to add
7+	Remaining Tier 2 work in priority order

Filing tickets

Once a Tier 1 item is ready to commit, file as type:tracker + component:be-platform (or component:ml if we add it) with sub-tickets for the data infrastructure prerequisites. Reference this catalog from the tracker so future Claude / meridian-worker runs have the full context.