
Lab Testing — Machine Learning Opportunities Catalog

Status: Research / brainstorming
Last updated: 2026-06-12
Scope: Every material test currently scaffolded in be-platform (soil, asphalt, concrete, aggregate). Identifies where ML can speed up testing, replace tests, or run as always-on QC.

Topic-specific ML deep-dives already in this repo:

- asphalt-design-ml-binder-content.md: predict optimum AC% from gradation + aggregate properties
- asphalt-design-ml-aggregate-blend.md: aggregate blending optimization
- asphalt-design-ml-compaction-performance.md: gyratory compaction performance
- asphalt-design-ml-vma-prediction.md: VMA prediction for mix design

This catalog is the umbrella that situates those plans alongside the rest of the lab's test surface.


TL;DR — what to build first

  1. CONC-COMP 28-day strength predictor (from 7-day break + mix design + cure tank temperature). Concrete is the biggest test volume and the biggest cost; every cylinder break in the DB is already a labeled training row, and the cure-tank IoT sensors give you the temperature history the model needs. Highest immediate ROI.
  2. AGG-SIEVE camera-based gradation — tray photo → ML segments + sizes particles → full PSD in seconds. Replaces a 75-minute lab procedure. Mature ML space.
  3. AGG-FE / AGG-FRAC camera-based particle classification — eliminates tedious manual particle-by-particle counting using off-the-shelf vision models.
  4. CONC-SLUMP batch-ticket consistency monitor — always-on QC layer that predicts slump from the batch ticket and flags deviations >1.5". Cheap to build, high catch rate.

Everything else is in a tier-ordered backlog below.


Tier matrix

| Tier | Definition | Build now? |
|---|---|---|
| 1 — Quick wins | Mature ML, data already in DB, high $/test impact | Yes |
| 2 — High-value mid-term | Strong correlations exist, needs modest data collection | Plan for Q1-Q2 next year |
| 3 — Always-on QC | Anomaly detection over the result stream; low cost, high vigilance value | Build alongside Tier 1 |
| 4 — Image-classifier one-shots | Single-photo classification; off-the-shelf vision models | Opportunistic; bundle with kiosk camera work |
| 5 — Skip-the-test via history lookup | Predict result before testing for repeat projects on known materials | Long-term; needs project-history fingerprinting first |
| 6 — Not ML candidates | Pure physical procedures with no useful prediction target | Skip |

Tier 1 — Quick wins, build first

1.1 — Concrete 28-day compressive strength from 7-day break + mix + cure history

Test: CONC-COMP

Inputs we already collect:

- 7-day cylinder break strength
- Mix design: design strength, w/c ratio, cement type, slump spec
- Field measurements: slump, air content, fresh concrete temperature
- IoT cure-tank temperature history during cure window

Predicts: 28-day strength + confidence interval; downstream "skip the 28-day break" recommendation for routine pours

Why it pays off:

- Avoid the 28-day break entirely for projects where the 7-day → 28-day ratio is stable
- Alarm early on weak pours instead of waiting 21 more days to know they failed
- Surface mix-design drift before it becomes a non-conformance
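The "stable ratio" idea can be sketched without any ML at all, which also gives a baseline the real model has to beat. The snippet below is a minimal illustration, not the planned predictor: it uses only the historical 28-day / 7-day ratio for one mix family, a normal-approximation band, and hypothetical thresholds (`max_cv`, `min_pairs`, `z`) that would need calibration against real break data.

```python
from statistics import mean, stdev


def ratio_model(history):
    """history: list of (f7_psi, f28_psi) paired breaks for one mix family."""
    ratios = [f28 / f7 for f7, f28 in history if f7 > 0]
    if len(ratios) < 2:
        raise ValueError("need at least two paired breaks")
    return mean(ratios), stdev(ratios), len(ratios)


def predict_28_day(f7_psi, history, design_psi, z=1.96, max_cv=0.05, min_pairs=10):
    """Predict 28-day strength from a 7-day break using the historical ratio.

    z, max_cv and min_pairs are illustrative assumptions, not calibrated values.
    """
    r_mean, r_sd, n = ratio_model(history)
    predicted = f7_psi * r_mean
    low = f7_psi * (r_mean - z * r_sd)
    high = f7_psi * (r_mean + z * r_sd)
    # "Stable" here means enough paired breaks and a low coefficient of variation.
    stable = n >= min_pairs and r_mean > 0 and (r_sd / r_mean) <= max_cv
    return {
        "predicted_psi": predicted,
        "interval_psi": (low, high),
        "early_alarm": predicted < design_psi,            # weak pour, flag at day 7
        "skip_candidate": stable and low >= design_psi,   # PE still decides
    }
```

A `skip_candidate` flag is only a recommendation for the engineer; the full model would add mix design, field measurements, and cure history as features on top of this ratio.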

Data status: Every existing cylinder break is a labeled training row. The bottleneck is joining cure-tank IoT data to the cylinder rows by tank + time-of-cure.
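A rough sketch of that join, under assumed record shapes: the `Cylinder` and `TankReading` classes and their field names are hypothetical, not the be-platform schema. For each cylinder it selects the readings from its tank inside the cure window and summarizes them as a mean temperature and a Nurse-Saul temperature-time factor. The datum temperature of -10 °C is a commonly used default and should be treated as an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Cylinder:          # hypothetical shape of a cylinder break row
    cylinder_id: str
    tank_id: str
    cure_start: datetime
    break_time: datetime


@dataclass
class TankReading:       # hypothetical shape of an IoT sensor reading
    tank_id: str
    ts: datetime
    temp_c: float


def cure_features(cyl, readings, datum_c=-10.0):
    """Mean cure temperature and Nurse-Saul maturity (degC-hours) for one cylinder."""
    window = sorted(
        (r for r in readings
         if r.tank_id == cyl.tank_id and cyl.cure_start <= r.ts <= cyl.break_time),
        key=lambda r: r.ts,
    )
    if len(window) < 2:
        return None  # not enough sensor coverage for this cylinder
    temp_hours = maturity = total_hours = 0.0
    for a, b in zip(window, window[1:]):
        hours = (b.ts - a.ts).total_seconds() / 3600.0
        avg_temp = (a.temp_c + b.temp_c) / 2.0   # trapezoid between readings
        temp_hours += avg_temp * hours
        maturity += max(avg_temp - datum_c, 0.0) * hours
        total_hours += hours
    if total_hours == 0.0:
        return None
    return {
        "cylinder_id": cyl.cylinder_id,
        "mean_temp_c": temp_hours / total_hours,
        "maturity_c_hours": maturity,
        "hours_covered": total_hours,
    }
```

Comparing `hours_covered` to the nominal cure duration gives a quick check on sensor gaps before a row is admitted to the training set.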

1.2 — Camera-based aggregate gradation (AGG-SIEVE)

Test: AGG-SIEVE (and ASPH-related sieve work via reduce → sieve)

Inputs: single high-resolution tray photo of a representative sample

Predicts: full % passing curve for all standard sieves

Why it pays off:

- Replaces a 75-minute wet-sieve procedure with a 5-second photo + 10-second inference
- Mature ML space; commercial systems (e.g. Camsizer, Retsch IPro analogs) prove the approach
- Off-the-shelf segmentation models (Mask R-CNN, SAM) handle the heavy lift
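The step after segmentation, turning per-particle sizes into a percent-passing curve, might look like the sketch below. It assumes the vision model has already produced one equivalent diameter per detected particle, and it approximates mass as proportional to diameter cubed (uniform density, roughly equidimensional particles). Both are simplifying assumptions; a single tray photo also cannot see hidden or very fine particles, so the ground-truth sieve results are still needed to calibrate the curve.

```python
def percent_passing(diameters_mm, sieve_openings_mm):
    """Cumulative % passing (by estimated mass) for each sieve opening.

    diameters_mm: equivalent diameter per segmented particle (assumed input).
    sieve_openings_mm: sieve sizes to report, e.g. 4.75 for the No. 4 sieve.
    """
    if not diameters_mm:
        raise ValueError("no particles detected")
    # Assumption: mass is proportional to d^3 (same density, similar shape).
    weights = [d ** 3 for d in diameters_mm]
    total = sum(weights)
    curve = {}
    for opening in sorted(sieve_openings_mm, reverse=True):
        passing = sum(w for d, w in zip(diameters_mm, weights) if d < opening)
        curve[opening] = 100.0 * passing / total
    return curve


# Usage with made-up particle sizes:
example_curve = percent_passing([1.0, 2.0, 6.0, 0.4], [9.5, 4.75, 2.36, 1.18, 0.6])
```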

Data status: Need to start photographing sieved samples alongside the canonical sieve result for ~3-6 months to build the ground-truth set. Worth seeding NOW even if model build is later.

1.3 — Particle shape classifiers (AGG-FE, AGG-FRAC)

Tests: AGG-FE (flat & elongated), AGG-FRAC (fractured face count), AGG-FAA (angularity; tiered separately as Tier 4 in the tables below)

Inputs: photo of a representative particle sample

Predicts: % flat, % elongated, % fractured, angularity

Why it pays off:

- These tests currently require tedious particle-by-particle manual counting (slow + error-prone)
- Particle-shape classification from images is a well-studied problem
- Same data collection workflow as 1.2, so one imaging effort serves both
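Once a vision model has estimated particle dimensions, the counting itself reduces to ratio checks. The sketch below uses the dimensional-ratio conventions commonly applied to this test (width to thickness for flat, length to width for elongated, length to thickness for flat-and-elongated) and reports percentages by particle count. The 5:1 default ratio is an assumption; the governing spec sets the actual value. A single top-down photo yields only two dimensions, so thickness would need a second view or an estimate.

```python
def shape_percentages(particles, ratio=5.0):
    """Percent flat, elongated, and flat-and-elongated, by particle count.

    particles: list of (length, width, thickness) in consistent units,
               with length >= width >= thickness > 0 (assumed model output).
    ratio: dimensional ratio threshold (assumption; set from the project spec).
    """
    n = len(particles)
    if n == 0:
        raise ValueError("no particles to classify")
    flat = sum(1 for l, w, t in particles if w / t > ratio)
    elongated = sum(1 for l, w, t in particles if l / w > ratio)
    flat_and_elongated = sum(1 for l, w, t in particles if l / t > ratio)
    return {
        "pct_flat": 100.0 * flat / n,
        "pct_elongated": 100.0 * elongated / n,
        "pct_flat_and_elongated": 100.0 * flat_and_elongated / n,
    }
```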

1.4 — Slump prediction from batch ticket (CONC-SLUMP)

Test: CONC-SLUMP

Inputs: batch ticket (cement content, water-cement ratio, admixture type + dose, cubic yards), ambient temperature, truck rotation history (if available), time-since-batch

Predicts: expected slump at point-of-pour + confidence interval

Why it pays off:

- Quality gate at truck arrival: flag tickets whose predicted slump deviates from measured by >1.5"
- Catches bad batches AND bad samples (both directions of error)
- Builds on data you already capture in the batch ticket form
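The gate itself is a small piece of logic on top of whatever model produces the prediction. A minimal sketch follows; the status labels are hypothetical names, and the 1.5" tolerance is taken from the text above.

```python
def slump_gate(predicted_in, measured_in, tolerance_in=1.5):
    """Compare measured slump to the ticket-based prediction (inches)."""
    deviation = measured_in - predicted_in
    if abs(deviation) <= tolerance_in:
        status = "ok"
    elif deviation > 0:
        status = "wetter_than_ticket"    # measured slump higher than expected
    else:
        status = "stiffer_than_ticket"   # measured slump lower than expected
    return {"deviation_in": deviation, "status": status}
```

Keeping the sign of the deviation matters: the two directions point at different causes (added water or a dosing error versus slump loss or a bad sample), so the PE review queue can sort them separately.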


Tier 2 — High-value mid-term

2.1 — Proctor MDD/OMC from soil index properties

Test: SOIL-PROC

Inputs: sieve gradation (% passing #200), Atterberg limits (LL/PL), soil description, project geographic location

Predicts: maximum dry density + optimum moisture content + curve shape

Why it pays off:

- Strong academic correlations published (Wang & Huang 2017; Sivrikaya 2008; many others)
- For routine soils on known projects, run 2 confirmation points instead of 5, which cuts Proctor time roughly in half
- Could replace the test entirely for repeat projects

Data status: Need to back-fill grain size + LL/PL on existing Proctor records that don't have it. Going forward, capture it as part of test setup.

2.2 — Atterberg limits (SOIL-LL / SOIL-PL) from sieve + visual classification

Tests: SOIL-LL, SOIL-PL

Inputs: sieve gradation, soil description (USCS classification), project location

Predicts: LL, PL, plasticity index, classification confidence

Why it pays off:

- Many soils have LL/PL highly predictable from grain size + clay fraction
- Avoid the test entirely for routine projects on previously-characterized soils
- Frees the tech for higher-value lab work

2.3 — Flexural strength from compressive strength (CONC-FLEX from CONC-COMP)

Test: CONC-FLEX

Predicts: modulus of rupture from compressive strength

Why it pays off:

- ACI relationship MOR ≈ k·√f'c is well-established; ML refines k per mix family
- Flex beams take ~2× the prep effort of cylinders + need bigger molds
- Could replace flex breaks on most projects once correlation is calibrated
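Refining k per mix family can start as a one-parameter least-squares fit through the origin, before any heavier model is justified. The sketch below assumes paired records of compressive strength and measured modulus of rupture in psi, grouped by a hypothetical `mix_family` label. For reference, ACI 318 uses 7.5·√f'c (psi) for normal-weight concrete, so fitted values far from that range would be worth a second look.

```python
from collections import defaultdict
from math import sqrt


def fit_k_by_family(records):
    """Fit MOR = k * sqrt(f'c) per mix family by least squares through the origin.

    records: iterable of (mix_family, fc_psi, mor_psi) paired results.
    Returns {mix_family: k}. k depends on the unit system (psi assumed here).
    """
    num = defaultdict(float)
    den = defaultdict(float)
    for family, fc_psi, mor_psi in records:
        x = sqrt(fc_psi)
        num[family] += x * mor_psi
        den[family] += x * x
    return {fam: num[fam] / den[fam] for fam in num if den[fam] > 0}


def predict_mor(k, fc_psi):
    """Predicted modulus of rupture (psi) for a given compressive strength."""
    return k * sqrt(fc_psi)
```

The residuals from this fit are also a useful QC signal on their own: a flex break far from its family's curve is a candidate for review.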

2.4 — Lottman TSR prediction (ASPH-LOTT)

Test: ASPH-LOTT

Inputs: aggregate source, binder type/grade, anti-strip type + dose, mix gradation

Predicts: tensile strength ratio (passes / fails moisture-induced damage spec)

Why it pays off:

- Test takes 7+ days (saturation + freeze-thaw + break)
- Mix design optimization is currently a sequential dance because each variant takes a week
- ML predictor lets the PE narrow candidate anti-strip doses before running the long test

2.5 — Hveem stability optimum binder (ASPH-HVEEM)

Test: ASPH-HVEEM

Inputs: gradation, aggregate properties, binder grade

Predicts: narrowed optimum binder content range

Why it pays off:

- Cuts trial briquettes from 5 → 2-3
- Parallels the existing Superpave AC% predictor in asphalt-design-ml-binder-content.md

2.6 — ASR reactivity from petrography + chemistry (AGG-ASR)

Tests: AGG-ASR, AGG-ASR-PREP, AGG-ASR-MIX

Inputs: petrographic analysis, chemistry (silica content, mineral assemblage), source quarry

Predicts: AMBT / CPT expansion class (innocuous / slowly reactive / reactive)

Why it pays off:

- AMBT runs 16 days; CPT runs 12+ months
- Screen aggregate sources in days instead of months
- High strategic value: being able to qualify or disqualify a new quarry source quickly is a competitive moat

2.7 — LA Abrasion / Micro-Deval from petrography (AGG-LA, AGG-MDEVAL)

Tests: AGG-LA, AGG-MDEVAL

Inputs: petrographic analysis, source quarry, rock type, mineral composition

Predicts: abrasion loss % vs spec threshold

Why it pays off:

- Wear resistance is intrinsic to mineral composition, so it is highly predictable from rock type
- Eliminates a 24-hour test for routine source verification
- Source-quarry database becomes a strategic asset

2.8 — Specific gravity & absorption from source (AGG-SG-COARSE/FINE)

Tests: AGG-SG-COARSE, AGG-SG-FINE

Inputs: source quarry + petrography

Predicts: Gsb, Gsa, absorption %

Why it pays off:

- These are intrinsic to the source aggregate; measurement is verification, not discovery
- One-time characterization per source replaces repeated testing

2.9 — Rice gravity from gradation + AC + Gsb (ASPH-RICE)

Test: ASPH-RICE

Predicts: maximum theoretical specific gravity (Gmm)

Why it pays off:

- Pure mass-balance physics; ML mostly for QC anomaly detection
- Lets you replace a 45-min test with calculation for QA-tier samples
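The mass-balance calculation referred to above is the standard volumetric relation Gmm = 100 / (Ps/Gse + Pb/Gb), with Pb the binder content as a percent of total mix mass and Ps = 100 − Pb. The relation uses the effective aggregate specific gravity Gse, not Gsb, so a calculation-only path would have to estimate Gse from Gsb and binder absorption for the source. That estimation step is where a learned model could plausibly help. The QC tolerance below is a placeholder assumption.

```python
def theoretical_gmm(pb_percent, gse, gb):
    """Maximum theoretical specific gravity from mix proportions.

    pb_percent: binder content, % of total mix mass.
    gse: effective specific gravity of the aggregate.
    gb: specific gravity of the binder.
    """
    if not 0.0 <= pb_percent < 100.0:
        raise ValueError("binder content must be in [0, 100)")
    ps_percent = 100.0 - pb_percent
    return 100.0 / (ps_percent / gse + pb_percent / gb)


def rice_qc_flag(measured_gmm, pb_percent, gse, gb, tolerance=0.02):
    """True when the measured Rice value is far from the calculated one.

    tolerance is an illustrative assumption, not a spec value.
    """
    return abs(measured_gmm - theoretical_gmm(pb_percent, gse, gb)) > tolerance
```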

2.10 — Gyratory compaction prediction (ASPH-GYRA)

Test: ASPH-GYRA

Predicts: gyrations to target density / compaction curve from mix design

Why it pays off:

- Parallels the work already scoped in asphalt-design-ml-compaction-performance.md
- Cuts trial gyrations significantly
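For context on the prediction target, the compaction curve is normally back-calculated from the recorded specimen heights: %Gmm at gyration N equals the final %Gmm scaled by h_final / h_N. The sketch below builds that curve from one specimen's height log and reads off the first gyration reaching a target density. These curves are the labels such a model would train on. The 96 %Gmm default (4% air voids) is a typical design target, used here as an assumption.

```python
def compaction_curve(heights_mm, gmb_final, gmm):
    """%Gmm after each gyration, back-calculated from specimen heights.

    heights_mm[i] is the specimen height after gyration i + 1.
    gmb_final is the measured bulk specific gravity of the finished specimen.
    """
    if not heights_mm:
        raise ValueError("no height readings")
    h_final = heights_mm[-1]
    return [100.0 * gmb_final * h_final / (h * gmm) for h in heights_mm]


def gyrations_to_target(heights_mm, gmb_final, gmm, target_pct_gmm=96.0):
    """First gyration count at which %Gmm reaches the target, else None."""
    curve = compaction_curve(heights_mm, gmb_final, gmm)
    for n, pct in enumerate(curve, start=1):
        if pct >= target_pct_gmm:
            return n
    return None
```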

2.11 — Shrinkage prediction (CONC-SHRINK)

Test: CONC-SHRINK

Inputs: w/c ratio, cement type, paste content, aggregate type, ambient humidity history

Predicts: drying shrinkage at standard age

Why it pays off:

- Test runs 28+ days minimum, sometimes 90+
- High ML payoff per test


Tier 3 — Always-on QC / anomaly detection

These run continuously over the result stream, need no engineer effort per check, and surface anomalies for PE review.

| ML monitor | Catches |
|---|---|
| SOIL-NUC anomaly detector | Gauge-out-of-contact errors, gauge calibration drift, gaming by operators (readings inconsistent with neighbors / pass-history / depth) |
| CONC-SLUMP vs batch ticket | Bad batches (ticket says high cement but slump is low) AND bad samples (ticket consistent, sample isn't) |
| CONC-AIR-P/V vs AEA dose response | AEA degradation, batch plant scale errors |
| Cure tank temperature deviation | Pours where temperature went out of band; predict the strength penalty |
| CONC-TEMP plant predictor | Real-time plant QC on ambient + truck + ice |
| AGG-MC stockpile drift | Stockpile moisture drifting from historical average; adjust batch water before batching |
| ASPH-COMP-CORR auto-derivation | Predict the gauge correction factor from mix + temp + roller pattern; eliminates the formal correlation drilling on routine jobs |
| Project-level result-vs-spec trend monitor | Predicts probability the next test fails spec; surfaces for PE review before failure |
| Test-result drift across days | Same project, same mix, drifting result; surfaces equipment or technique drift |
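Most of these monitors can start from the same primitive: score each new result against recent results for the same project and mix, using statistics that a single bad reading cannot distort. A minimal sketch using the modified z-score (median and MAD, with the conventional 0.6745 scaling and 3.5 cutoff) follows. The grouping of results into a window is assumed to happen upstream.

```python
from statistics import median


def robust_z_scores(values):
    """Modified z-scores based on the median and median absolute deviation."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # More than half the values are identical; this score cannot rank them.
        return [0.0 for _ in values]
    return [0.6745 * (v - med) / mad for v in values]


def flag_anomalies(values, threshold=3.5):
    """Indices of results whose modified z-score exceeds the threshold."""
    return [i for i, z in enumerate(robust_z_scores(values)) if abs(z) > threshold]


# Usage with made-up field density readings (% compaction):
readings = [95.1, 95.4, 94.8, 95.0, 95.3, 88.0, 95.2]
suspect = flag_anomalies(readings)
```

This catches isolated outliers; slow drift across days needs a trend-aware check (for example a cumulative-sum chart) layered on top.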

Tier 4 — Image-classifier one-shots

Quick wins with off-the-shelf or lightly-fine-tuned vision models. Bundle these with any camera/kiosk work that puts a phone in tech hands.

| Test | What the camera classifies |
|---|---|
| AGG-OI | Gardner color plate match → pass/fail |
| AGG-FAA | Fine aggregate angularity from particle shape |
| AGG-CLAY | Clay lumps & friable particles count |
| AGG-LWP | Lightweight particle count on floating fraction |
| Concrete cylinder break failure mode | Auto-classify cone / columnar / shear / splitting per ASTM C39 from break photo. Useful for QC trend analysis, since failure mode correlates with mix quality issues |
| Concrete slump visual | Predict slump from cone photo (rough, but useful as a sanity check on the measured slump) |

Tier 5 — "Skip the test entirely" via project-history lookup

The highest-ROI move, but the hardest to build because it needs cross-project fingerprinting.

Concept: for repeat customers with established mix designs, known source aggregates, and known soil profiles, predict the test result before running it, surface a confidence band, and let the engineer confirm with a single sanity-check sample instead of a full battery.

Requires:

- Cross-project soil/aggregate/mix fingerprinting, with geographic location + source quarry + mix design as the keys
- A "predicted vs. measured" feedback loop that refines accuracy over time
- Engineer override + audit trail (a PE can always demand the full test)

Candidate tests for this treatment (intrinsic-property tests that don't change much per sample):

- AGG-SG-COARSE / AGG-SG-FINE: once a source is characterized
- AGG-LA / AGG-MDEVAL: same
- AGG-ASR class: by source-quarry petrography
- SOIL-LL / SOIL-PL: by soil-type + region
- SOIL-SULFATES: by geographic location + groundwater chemistry
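The lookup-plus-confirmation loop described in this tier can be prototyped with plain per-key statistics before any model exists. In the sketch below the key is a hypothetical `(source, test_code)` tuple, the band is mean ± z·stdev of prior results, and `min_samples` and `z` are illustrative assumptions. A missing or thin history always falls back to the full test.

```python
from statistics import mean, stdev


def history_prediction(history, key, min_samples=5, z=2.0):
    """Predict a result from prior results stored under the same fingerprint key."""
    results = history.get(key, [])
    if len(results) < min_samples:
        return None  # not enough history: run the full test
    center = mean(results)
    spread = stdev(results)
    return {
        "predicted": center,
        "band": (center - z * spread, center + z * spread),
        "n": len(results),
    }


def confirm_with_sample(prediction, measured):
    """True when the single sanity-check sample falls inside the band."""
    low, high = prediction["band"]
    return low <= measured <= high


def record_result(history, key, measured):
    """Feedback loop: every measured result extends the history for its key."""
    history.setdefault(key, []).append(measured)
```

A failed confirmation should route to the full battery and a PE review rather than silently widening the band.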


Tier 6 — Not ML candidates

Pure physical sample-prep procedures with no useful prediction target:

  • AGG-BATCH (batching procedure)
  • AGG-SAMP (sampling)
  • AGG-SPLIT (sample splitting)
  • ASPH-SAMP (sampling)
  • ASPH-REDUCE (sample reduction)
  • ASPH-CORING (coring procedure)
  • CONC-MAKE (specimen molding)
  • CONC-SAMP (sampling)
  • CONC-CAP (specimen capping)
  • CONC-CORE (core preparation)
  • CONC-CORING (coring procedure)
  • CONC-BEAM (beam preparation)

The outputs of some of these may feed Tier 1-5 predictors (e.g. coring → CONC-COMP which is Tier 1), but the procedure itself isn't ML-amenable.


Per-test full table

Quick scan-reference. See tiers above for detail.

| Test | ML opportunity | Tier | Data status |
|---|---|---|---|
| SOIL | | | |
| SOIL-PROC | MDD/OMC from grain size + Atterberg | 2 | Need backfill of sieve + LL/PL on existing rows |
| SOIL-NUC | Field reading anomaly detection | 3 | Have it |
| SOIL-LL / SOIL-PL | Predict from gradation + visual class | 2 | Have it |
| SOIL-HYDRO | Image-based PSD from #200 fraction | 2 | Need particle imagery |
| SOIL-SULFATES-A/B | Predict from groundwater + geography | 5 | Needs GIS integration |
| ASPHALT | | | |
| ASPH-IGN | AC% prediction within first 10 min of burn | 1 | Have burn data + final AC% |
| ASPH-EXTRACT | AC% from gradation + design | 2 | Have it |
| ASPH-AC-NUC | Anomaly + cross-cal with ignition | 3 | Have it |
| ASPH-COMP-CORR | Predict gauge correction from mix + temp | 3 | Have historical correlations |
| ASPH-COMP (field density) | Predict mat density from roller pattern + mat temp | 3 | Needs roller telemetry |
| ASPH-MOIST | Predict from weather + stockpile age | 3 | Need weather integration |
| ASPH-CORE-SG / BSG | Predict from gradation + AC + design | 2 | Have historical cores |
| ASPH-HVEEM | Narrow optimum binder search | 2 | Have it |
| ASPH-LOTT | TSR from aggregate/binder/anti-strip | 2 | Have it |
| ASPH-RICE | Predict Gmm from gradation + AC + Gsb | 2 | Have it |
| ASPH-GYRA | Gyrations to target from mix | 2 | Have it; see asphalt-design-ml-compaction-performance.md |
| CONCRETE | | | |
| CONC-COMP | 28-day from 7-day + cure temp | 1 | Have it (top priority) |
| CONC-SLUMP | Predict from batch ticket + temp + admixture | 1 | Have it |
| CONC-AIR-P / V | Anomaly on AEA dose response | 3 | Have it |
| CONC-UW | Yield consistency check | 3 | Have it |
| CONC-TEMP | Predict from ambient + load + ice | 3 | Need plant data integration |
| CONC-FLEX | Predict from CONC-COMP, skip flex breaks | 2 | Have it |
| CONC-SHRINK | Predict from w/c + paste + aggregate | 2 | Long-term test history needed |
| AGGREGATE | | | |
| AGG-SIEVE | Camera-based PSD | 1 | Start collecting imagery |
| AGG-FE | Image classifier for flat/elongated | 1 | Start collecting imagery |
| AGG-FRAC | Image classifier for fractured faces | 1 | Start collecting imagery |
| AGG-FAA | Shape analysis from imagery | 4 | Imagery |
| AGG-CLAY | Image classifier on washed sample | 4 | Imagery |
| AGG-LWP | Image classifier on floating fraction | 4 | Imagery |
| AGG-OI | Gardner color image classifier | 4 | Imagery |
| AGG-MC | Stockpile drift from weather + age | 3 | Need weather integration |
| AGG-LA | Predict from petrography + source | 2 | Source database |
| AGG-MDEVAL | Same as LA | 2 | Source database |
| AGG-SE | Predict from FM + clay | 2 | Have it |
| AGG-SG-COARSE / FINE | Predict from source + petrography | 2 / 5 | Source database |
| AGG-ASR family | Predict reactivity from petrography | 2 | Source + petrography database |
| AGG-ZNBR2 / ZNCL2 | Predict from source | 5 | Source database |
| AGG-UW | Predict from PSD + Gsb | 2 | Have it |
| AGG-BLEND | Blending optimization | 2 | See asphalt-design-ml-aggregate-blend.md |
| AGG-BATCH / SAMP / SPLIT | Procedures, not ML | 6 | |

Data infrastructure needed

Things to build alongside (or before) the ML work:

  1. Source-quarry database — every aggregate has a source. Tag every AGG-* test with its source quarry so we can train per-source predictors. Currently the data exists in mix designs but isn't first-class.
  2. Particle imagery pipeline — phones + tablets can take photos; we need a way to associate images with test result rows. Build the upload/storage path before the ML — start collecting imagery in parallel with normal testing.
  3. Cure tank → cylinder join — match cylinder break records to the tank IDs and cure-window temperature history. Critical for CONC-COMP predictor (#1.1).
  4. Mix design fingerprinting — a deterministic ID per (cement + aggregate sources + admixtures + ratios) so repeat mixes can pool training data.
  5. Project-history fingerprinting (Tier 5 prerequisite) — geographic + soil-type + customer + age keys so a "have we tested this material before?" lookup is possible.
  6. Predicted-vs-measured feedback loop — every prediction surfaced to an engineer should record their override + the actual test result, so models improve from real-world use.
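For item 4, a deterministic mix ID can be derived by normalizing the inputs and hashing a canonical serialization. The sketch below is one possible shape under assumed field names (`cement`, `aggregate_sources`, `admixtures`, `ratios`), not an existing be-platform function. Rounding the ratios is a crude way to let near-identical mixes pool; the precision is an assumption, and values near a rounding boundary can still land in different buckets.

```python
import hashlib
import json


def mix_fingerprint(cement, aggregate_sources, admixtures, ratios, precision=2):
    """Deterministic short ID for a mix design.

    Order and letter case of the inputs do not affect the result.
    """
    canonical = {
        "cement": cement.strip().lower(),
        "aggregate_sources": sorted(s.strip().lower() for s in aggregate_sources),
        "admixtures": sorted(a.strip().lower() for a in admixtures),
        "ratios": {k: round(float(v), precision) for k, v in sorted(ratios.items())},
    }
    payload = json.dumps(canonical, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]
```

The same pattern extends to the project-history keys in item 5, with geographic and soil-type fields in place of mix components.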

Open questions

  • Regulatory acceptance — for which tests will the state DOT / private clients accept an ML prediction in lieu of the canonical test? Likely answer: never for project-of-record results, but YES for screening / mix-development / decision support. Pursue ML as "narrows the test list" not "replaces the test."
  • Liability — engineer-of-record stamps don't go away. ML output supports the engineer's judgment, never substitutes for it.
  • Vendor partnerships vs. build — for camera-based gradation, commercial systems exist (Camsizer, etc.) and may be cheaper than internal ML build. Worth a make-vs-buy evaluation per opportunity.
  • Data quality bar — historical test results have rounding inconsistencies and human-entry errors; some cleanup before training is required.

Suggested sequencing

| Sprint | Build |
|---|---|
| 0 — Prep (data infra) | Source-quarry database; particle imagery upload path; cure-tank → cylinder join |
| 1 — CONC-COMP 28-day predictor | Tier 1 top priority |
| 2 — AGG-SIEVE camera-based PSD pilot | Tier 1, start with image collection during sprint 0 |
| 3 — CONC-SLUMP batch-ticket QC monitor | Tier 1 |
| 4 — AGG-FE / AGG-FRAC particle classifier | Tier 1 |
| 5 — ASR reactivity screener | Tier 2, but high strategic value |
| 6 — Tier 3 anomaly detection suite | All in parallel; each is cheap to add |
| 7+ | Remaining Tier 2 work in priority order |

Filing tickets

Once a Tier 1 item is ready to commit, file as type:tracker + component:be-platform (or component:ml if we add it) with sub-tickets for the data infrastructure prerequisites. Reference this catalog from the tracker so future Claude / meridian-worker runs have the full context.