Lab Testing — Machine Learning Opportunities Catalog
Status: Research / brainstorming
Last updated: 2026-06-12
Scope: Every material test currently scaffolded in be-platform — soil, asphalt, concrete, aggregate. Identifies where ML can speed up testing, replace tests, or run as always-on QC.
Related docs
Topic-specific ML deep-dives already in this repo:
- asphalt-design-ml-binder-content.md — predict optimum AC% from gradation + aggregate properties
- asphalt-design-ml-aggregate-blend.md — aggregate blending optimization
- asphalt-design-ml-compaction-performance.md — gyratory compaction performance
- asphalt-design-ml-vma-prediction.md — VMA prediction for mix design
This catalog is the umbrella that situates those plans alongside the rest of the lab's test surface.
TL;DR — what to build first
- CONC-COMP 28-day strength predictor (from 7-day break + mix design + cure tank temperature). Concrete is the biggest test volume and the biggest cost; every cylinder break in the DB is already a labeled training row, and the cure-tank IoT sensors give you the temperature history the model needs. Highest immediate ROI.
- AGG-SIEVE camera-based gradation — tray photo → ML segments + sizes particles → full PSD in seconds. Replaces a 75-minute lab procedure. Mature ML space.
- AGG-FE / AGG-FRAC camera-based particle classification — eliminates tedious manual particle-by-particle counting using off-the-shelf vision models.
- CONC-SLUMP batch-ticket consistency monitor — always-on QC layer that predicts slump from the batch ticket and flags deviations >1.5". Cheap to build, high catch rate.
Everything else is in a tier-ordered backlog below.
Tier matrix
| Tier | Definition | Build now? |
|---|---|---|
| 1 — Quick wins | Mature ML, data already in DB, high $/test impact | Yes |
| 2 — High-value mid-term | Strong correlations exist, needs modest data collection | Plan for Q1-Q2 next year |
| 3 — Always-on QC | Anomaly detection over the result stream; low cost, high vigilance value | Build alongside Tier 1 |
| 4 — Image-classifier one-shots | Single-photo classification; off-the-shelf vision models | Opportunistic — bundle with kiosk camera work |
| 5 — Skip-the-test via history lookup | Predict result before testing for repeat projects on known materials | Long-term; needs project-history fingerprinting first |
| 6 — Not ML candidates | Pure physical procedures with no useful prediction target | Skip |
Tier 1 — Quick wins, build first
1.1 — Concrete 28-day compressive strength from 7-day break + mix + cure history
Test: CONC-COMP
Inputs we already collect:
- 7-day cylinder break strength
- Mix design: design strength, w/c ratio, cement type, slump spec
- Field measurements: slump, air content, fresh concrete temperature
- IoT cure-tank temperature history during cure window
Predicts: 28-day strength + confidence interval; downstream "skip the 28-day break" recommendation for routine pours
Why it pays off:
- Avoid the 28-day break entirely for projects where the 7-day → 28-day ratio is stable
- Alarm early on weak pours instead of waiting 21 more days to know they failed
- Surface mix-design drift before it becomes a non-conformance
Data status: Every existing cylinder break is a labeled training row. The bottleneck is joining cure-tank IoT data to the cylinder rows by tank + time-of-cure.
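The tank + time-of-cure join described above could be sketched as follows. This is a minimal illustration in plain Python, not the be-platform implementation: the record shapes, the field names (`tank_id`, `cure_start`, `cure_days`), and the `cure_temperature_features` helper are all hypothetical, and temperatures are assumed to be in °C.

```python
from datetime import datetime, timedelta
from statistics import mean

# Hypothetical IoT readings: (tank_id, timestamp, temperature in deg C).
tank_readings = [
    ("TANK-A", datetime(2026, 1, 1), 23.0),
    ("TANK-A", datetime(2026, 1, 4), 22.0),
    ("TANK-A", datetime(2026, 1, 9), 18.0),  # falls after the 7-day window below
    ("TANK-B", datetime(2026, 1, 2), 30.0),
]

# Hypothetical cylinder rows; the real schema may differ.
cylinders = [
    {"cylinder_id": "CYL-1", "tank_id": "TANK-A",
     "cure_start": datetime(2026, 1, 1), "cure_days": 7},
    {"cylinder_id": "CYL-2", "tank_id": "TANK-C",  # no readings for this tank
     "cure_start": datetime(2026, 1, 1), "cure_days": 7},
]


def cure_temperature_features(cylinder, readings):
    """Summarise tank temperature over one cylinder's cure window.

    Returns None when no reading matches the tank and window, so callers
    can tell "no data" apart from a real temperature summary.
    """
    start = cylinder["cure_start"]
    end = start + timedelta(days=cylinder["cure_days"])
    temps = [temp for tank, stamp, temp in readings
             if tank == cylinder["tank_id"] and start <= stamp <= end]
    if not temps:
        return None
    return {
        "mean_c": mean(temps),
        "min_c": min(temps),
        "max_c": max(temps),
        "n_readings": len(temps),
    }


features = {c["cylinder_id"]: cure_temperature_features(c, tank_readings)
            for c in cylinders}
```

Returning `None` for unmatched cylinders keeps rows with missing cure history visible, which matters if the join itself is the bottleneck.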
1.2 — Camera-based aggregate gradation (AGG-SIEVE)
Test: AGG-SIEVE (and ASPH-related sieve work via reduce → sieve)
Inputs: single high-resolution tray photo of a representative sample
Predicts: full % passing curve for all standard sieves
Why it pays off:
- Replaces a 75-minute wet-sieve procedure with a 5-second photo + 10-second inference
- Mature ML space; commercial systems (e.g. Camsizer, Retsch IPro analogs) prove the approach
- Off-the-shelf segmentation models (Mask R-CNN, SAM) handle the heavy lift
Data status: Need to start photographing sieved samples alongside the canonical sieve result for ~3-6 months to build the ground-truth set. Worth seeding NOW even if model build is later.
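To make the "photo → % passing curve" step concrete, here is a rough sketch of the post-segmentation arithmetic only. It assumes a segmentation model (not shown) has already produced one equivalent diameter in millimetres per particle, and it weights particles by diameter cubed as a crude stand-in for mass (similar density, roughly spherical shape). Both assumptions are illustrative; a real system would be calibrated against the canonical sieve results collected alongside the photos.

```python
def percent_passing(diameters_mm, sieve_openings_mm):
    """Approximate % passing (by mass proxy) for each sieve opening.

    diameters_mm: equivalent particle diameters from a segmentation step.
    sieve_openings_mm: sieve openings to report, in mm.
    """
    weights = [d ** 3 for d in diameters_mm]  # volume ~ d^3, used as mass proxy
    total = sum(weights)
    if total == 0:
        raise ValueError("no particles with non-zero size")
    curve = {}
    for opening in sorted(sieve_openings_mm, reverse=True):
        passing = sum(w for d, w in zip(diameters_mm, weights) if d < opening)
        curve[opening] = round(100.0 * passing / total, 1)
    return curve


# Toy input: four particles, three standard openings (No. 4, No. 8, No. 16).
curve = percent_passing([1.0, 2.0, 2.0, 4.0], [4.75, 2.36, 1.18])
# curve == {4.75: 100.0, 2.36: 21.0, 1.18: 1.2}
```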
1.3 — Particle shape classifiers (AGG-FE, AGG-FRAC)
Tests: AGG-FE (flat & elongated), AGG-FRAC (fractured face count), AGG-FAA (angularity)
Inputs: photo of a representative particle sample
Predicts: % flat, % elongated, % fractured, angularity
Why it pays off:
- These tests currently require tedious particle-by-particle manual counting (slow + error-prone)
- Image classification of particle shape is a well-studied problem
- Same data collection workflow as 1.2, so one imagery pipeline serves both
1.4 — Slump prediction from batch ticket (CONC-SLUMP)
Test: CONC-SLUMP
Inputs: batch ticket (cement content, water-cement ratio, admixture type + dose, cubic yards), ambient temperature, truck rotation history (if available), time-since-batch
Predicts: expected slump at point-of-pour + confidence interval
Why it pays off:
- Quality gate at truck arrival: flag tickets whose predicted slump deviates from measured by >1.5"
- Catches bad batches AND bad samples (both directions of error)
- Builds on data you already capture in the batch ticket form
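The quality gate itself is simple once a predicted slump exists. The sketch below shows only the comparison step, using the 1.5" threshold from this section; the prediction model is out of scope here, and the function name and return labels are hypothetical.

```python
SLUMP_TOLERANCE_IN = 1.5  # threshold from this catalog, in inches


def slump_flag(predicted_in, measured_in, tolerance_in=SLUMP_TOLERANCE_IN):
    """Compare measured slump to the model's prediction for one truck.

    Returns "ok" when the deviation is within tolerance, otherwise a
    label giving the direction of the miss so a reviewer can tell a wet
    load from a stiff one.
    """
    deviation = measured_in - predicted_in
    if abs(deviation) <= tolerance_in:
        return "ok"
    if deviation > 0:
        return "wetter_than_predicted"
    return "stiffer_than_predicted"


print(slump_flag(4.0, 4.5))  # ok
print(slump_flag(4.0, 6.0))  # wetter_than_predicted
print(slump_flag(4.0, 2.0))  # stiffer_than_predicted
```

Keeping the sign of the deviation supports the "both directions of error" point: the flag alone does not say whether the batch or the sample is at fault, only which way the mismatch runs.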
Tier 2 — High-value mid-term
2.1 — Proctor MDD/OMC from soil index properties
Test: SOIL-PROC
Inputs: sieve gradation (% passing #200), Atterberg limits (LL/PL), soil description, project geographic location
Predicts: maximum dry density + optimum moisture content + curve shape
Why it pays off:
- Strong academic correlations published (Wang & Huang 2017; Sivrikaya 2008; many others)
- For routine soils on known projects, run 2 confirmation points instead of 5, cutting Proctor time roughly in half
- Could replace the test entirely for repeat projects
Data status: Need to back-fill grain size + LL/PL on existing Proctor records that don't have it. Going forward, capture it as part of test setup.
2.2 — Atterberg limits (SOIL-LL / SOIL-PL) from sieve + visual classification
Tests: SOIL-LL, SOIL-PL
Inputs: sieve gradation, soil description (USCS classification), project location
Predicts: LL, PL, plasticity index, classification confidence
Why it pays off:
- Many soils have LL/PL highly predictable from grain size + clay fraction
- Avoid the test entirely for routine projects on previously-characterized soils
- Frees the tech for higher-value lab work
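Once LL and PL are predicted, the plasticity index and a fine-grained USCS symbol follow from arithmetic. The sketch below uses the standard plasticity-chart A-line, PI = 0.73 × (LL − 20). It is deliberately simplified and assumes a fine-grained, inorganic, plastic soil; organic soils, nonplastic results, and coarse-grained dual symbols are not handled. Function names are hypothetical.

```python
def a_line_pi(liquid_limit):
    """Plasticity index on the A-line for a given liquid limit."""
    return 0.73 * (liquid_limit - 20.0)


def fine_grained_symbol(liquid_limit, plastic_limit):
    """Simplified USCS symbol for an inorganic fine-grained soil."""
    pi = liquid_limit - plastic_limit  # plasticity index
    on_or_above_a_line = pi >= a_line_pi(liquid_limit)
    if liquid_limit >= 50:
        return "CH" if on_or_above_a_line else "MH"
    if on_or_above_a_line and pi > 7:
        return "CL"
    if on_or_above_a_line and pi >= 4:
        return "CL-ML"
    return "ML"


print(fine_grained_symbol(40, 20))  # CL
print(fine_grained_symbol(60, 35))  # MH
print(fine_grained_symbol(25, 20))  # CL-ML
```

A predictor that lands close to the A-line or the LL = 50 boundary is a natural case for low classification confidence, since a small error in LL or PL flips the symbol.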
2.3 — Flexural strength from compressive strength (CONC-FLEX from CONC-COMP)
Test: CONC-FLEX
Predicts: modulus of rupture from compressive strength
Why it pays off:
- ACI relationship MOR ≈ k·√f'c is well-established; ML refines k per mix family
- Flex beams take ~2× the prep effort of cylinders + need bigger molds
- Could replace flex breaks on most projects once correlation is calibrated
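As a baseline before any ML, k can be calibrated per mix family with a least-squares fit through the origin: with x = √f'c, minimising Σ(MOR − k·x)² gives k = Σ(x·MOR) / Σ(x²), and Σ(x²) is simply Σ(f'c). The sketch below assumes psi units throughout; the mix-family name and the break values are made-up illustration data, not lab results.

```python
from math import sqrt


def fit_k(breaks):
    """Least-squares k for MOR = k * sqrt(f'c), fitted through the origin.

    breaks: iterable of (compressive_strength_psi, modulus_of_rupture_psi).
    """
    breaks = list(breaks)
    numerator = sum(sqrt(fc) * mor for fc, mor in breaks)
    denominator = sum(fc for fc, _ in breaks)  # sum of sqrt(fc) squared
    if denominator == 0:
        raise ValueError("need at least one break with non-zero strength")
    return numerator / denominator


def predict_mor(k, fc_psi):
    """Predicted modulus of rupture (psi) from compressive strength (psi)."""
    return k * sqrt(fc_psi)


# Hypothetical paired breaks for one mix family (illustrative values only).
family_breaks = {"MIX-FAMILY-A": [(3600.0, 480.0), (4900.0, 560.0)]}
k_by_family = {name: fit_k(rows) for name, rows in family_breaks.items()}

print(k_by_family["MIX-FAMILY-A"])                        # 8.0
print(predict_mor(k_by_family["MIX-FAMILY-A"], 6400.0))   # 640.0
```

The residuals from this one-parameter fit are a useful yardstick: a learned model is only worth deploying if it beats the per-family k on held-out breaks.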
2.4 — Lottman TSR prediction (ASPH-LOTT)
Test: ASPH-LOTT
Inputs: aggregate source, binder type/grade, anti-strip type + dose, mix gradation
Predicts: tensile strength ratio (passes / fails moisture-induced damage spec)
Why it pays off:
- Test takes 7+ days (saturation + freeze-thaw + break)
- Mix design optimization is currently a sequential dance because each variant takes a week
- ML predictor lets the PE narrow candidate anti-strip doses before running the long test
2.5 — Hveem stability optimum binder (ASPH-HVEEM)
Test: ASPH-HVEEM
Inputs: gradation, aggregate properties, binder grade
Predicts: narrowed optimum binder content range
Why it pays off:
- Cuts trial briquettes from 5 → 2-3
- Parallels the existing Superpave AC% predictor in asphalt-design-ml-binder-content.md
2.6 — ASR reactivity from petrography + chemistry (AGG-ASR)
Tests: AGG-ASR, AGG-ASR-PREP, AGG-ASR-MIX
Inputs: petrographic analysis, chemistry (silica content, mineral assemblage), source quarry
Predicts: AMBT / CPT expansion class (innocuous / slowly reactive / reactive)
Why it pays off:
- AMBT runs 16 days; CPT runs 12+ months
- Screen aggregate sources in days instead of months
- High strategic value — being able to qualify or disqualify a new quarry source quickly is a competitive moat
2.7 — LA Abrasion / Micro-Deval from petrography (AGG-LA, AGG-MDEVAL)
Tests: AGG-LA, AGG-MDEVAL
Inputs: petrographic analysis, source quarry, rock type, mineral composition
Predicts: abrasion loss % vs spec threshold
Why it pays off:
- Wear resistance is intrinsic to mineral composition — highly predictable from rock type
- Eliminates a 24-hour test for routine source verification
- Source-quarry database becomes a strategic asset
2.8 — Specific gravity & absorption from source (AGG-SG-COARSE/FINE)
Tests: AGG-SG-COARSE, AGG-SG-FINE
Inputs: source quarry + petrography
Predicts: Gsb, Gsa, absorption %
Why it pays off:
- These are intrinsic to the source aggregate — measurement is verification, not discovery
- One-time characterization per source replaces repeated testing
2.9 — Rice gravity from gradation + AC + Gsb (ASPH-RICE)
Test: ASPH-RICE
Predicts: maximum theoretical specific gravity (Gmm)
Why it pays off:
- Pure mass-balance physics; ML mostly for QC anomaly detection
- Lets you replace a 45-min test with calculation for QA-tier samples
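The mass-balance calculation referred to here is the standard volumetric relationship Gmm = 100 / (Ps/Gse + Pb/Gb), where Pb is binder content (% of mix), Ps = 100 − Pb, Gse is the aggregate effective specific gravity, and Gb is the binder specific gravity. Note that it needs Gse rather than Gsb, so starting from Gsb requires an absorbed-binder estimate Pba (% by mass of aggregate), via 1/Gse = 1/Gsb − Pba/(100·Gb). The sketch below is illustrative; the input values are invented, and where Pba would come from (history, a model) is an open assumption.

```python
def effective_sg(gsb, pba_pct, gb):
    """Effective aggregate specific gravity from bulk SG and absorbed binder.

    gsb: aggregate bulk specific gravity.
    pba_pct: absorbed binder, % by mass of aggregate (an estimate).
    gb: binder specific gravity.
    """
    return 1.0 / (1.0 / gsb - pba_pct / (100.0 * gb))


def gmm(pb_pct, gse, gb):
    """Maximum theoretical specific gravity of the mix."""
    ps_pct = 100.0 - pb_pct  # aggregate content, % of mix
    return 100.0 / (ps_pct / gse + pb_pct / gb)


def gmm_residual(measured_gmm, pb_pct, gse, gb):
    """Measured minus calculated Gmm, usable as a QC anomaly signal."""
    return measured_gmm - gmm(pb_pct, gse, gb)


calculated = gmm(pb_pct=5.0, gse=2.70, gb=1.03)  # roughly 2.498
```

A persistent non-zero residual on one mix points at a drifting input (binder content, aggregate source) or at the Rice test itself, which is where the anomaly-detection framing fits.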
2.10 — Gyratory compaction prediction (ASPH-GYRA)
Test: ASPH-GYRA
Predicts: gyrations to target density / compaction curve from mix design
Why it pays off:
- Parallels the work already scoped in asphalt-design-ml-compaction-performance.md
- Cuts trial gyrations significantly
2.11 — Shrinkage prediction (CONC-SHRINK)
Test: CONC-SHRINK
Inputs: w/c ratio, cement type, paste content, aggregate type, ambient humidity history
Predicts: drying shrinkage at standard age
Why it pays off:
- Test runs 28+ days minimum, sometimes 90+
- High ML payoff per test
Tier 3 — Always-on QC / anomaly detection
These monitors run continuously over the result stream, need no engineer effort per check, and surface anomalies for PE review.
| ML monitor | Catches |
|---|---|
| SOIL-NUC anomaly detector | Gauge-out-of-contact errors, gauge calibration drift, gaming by operators (readings inconsistent with neighbors / pass-history / depth) |
| CONC-SLUMP vs batch ticket | Bad batches (ticket says high cement but slump is low) AND bad samples (ticket consistent, sample isn't) |
| CONC-AIR-P/V vs AEA dose response | AEA degradation, batch plant scale errors |
| Cure tank temperature deviation | Pours where temperature went out of band; predict the strength penalty |
| CONC-TEMP plant predictor | Real-time plant QC on ambient + truck + ice |
| AGG-MC stockpile drift | Stockpile moisture drifting from historical average — adjust batch water before batching |
| ASPH-COMP-CORR auto-derivation | Predict the gauge correction factor from mix + temp + roller pattern — eliminates the formal correlation drilling on routine jobs |
| Project-level result-vs-spec trend monitor | Predicts probability the next test fails spec; surfaces for PE review before failure |
| Test-result drift across days | Same project, same mix, drifting result — surfaces equipment or technique drift |
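As a starting point for the "test-result drift across days" monitor, a control-chart style check is often enough before reaching for a learned anomaly model. The sketch below is one such stand-in, not a chosen design: it builds a baseline from the first few results for a project + mix and flags later results that sit more than a set number of standard deviations away. The baseline size, the 3-sigma limit, and the sample values are all assumptions for illustration.

```python
from statistics import mean, stdev


def drift_flags(results, baseline_n=5, sigma_limit=3.0):
    """Return indexes of results that fall outside the baseline band.

    results: chronological test results for one project + mix.
    baseline_n: how many leading results define "normal" (needs >= 2).
    sigma_limit: band half-width in baseline standard deviations.
    """
    if baseline_n < 2 or len(results) <= baseline_n:
        return []
    baseline = results[:baseline_n]
    center = mean(baseline)
    spread = stdev(baseline)
    flagged = []
    for index, value in enumerate(results[baseline_n:], start=baseline_n):
        if spread == 0:
            out_of_band = value != center
        else:
            out_of_band = abs(value - center) > sigma_limit * spread
        if out_of_band:
            flagged.append(index)
    return flagged


# Illustrative strengths (psi); the last result drops well below the baseline.
strengths = [4000, 4050, 3950, 4020, 3980, 4010, 3400]
print(drift_flags(strengths))  # [6]
```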
Tier 4 — Image-classifier one-shots
Quick wins with off-the-shelf or lightly-fine-tuned vision models. Bundle these with any camera/kiosk work that puts a phone in tech hands.
| Test | What the camera classifies |
|---|---|
| AGG-OI | Gardner color plate match → pass/fail |
| AGG-FAA | Fine aggregate angularity from particle shape |
| AGG-CLAY | Clay lumps & friable particles count |
| AGG-LWP | Lightweight particle count on floating fraction |
| Concrete cylinder break failure mode | Auto-classify cone / columnar / shear / splitting per ASTM C39 from break photo. Useful for QC trend analysis — failure mode correlates with mix quality issues |
| Concrete slump visual | Predict slump from cone photo (rough, but useful as a sanity check on the measured slump) |
Tier 5 — "Skip the test entirely" via project-history lookup
The highest-ROI move, but the hardest to build because it needs cross-project fingerprinting.
Concept: for repeat customers with established mix designs, known source aggregates, and known soil profiles, predict the test result before running it, surface a confidence band, and let the engineer confirm with a single sanity-check sample instead of a full battery.
Requires:
- Cross-project soil/aggregate/mix fingerprinting — geographic location + source quarry + mix design as the keys
- A "predicted vs. measured" feedback loop that refines accuracy over time
- Engineer override + audit trail (a PE can always demand the full test)
Candidate tests for this treatment (intrinsic-property tests that don't change much per sample):
- AGG-SG-COARSE / AGG-SG-FINE — once a source is characterized
- AGG-LA / AGG-MDEVAL — same
- AGG-ASR class — by source-quarry petrography
- SOIL-LL / SOIL-PL — by soil-type + region
- SOIL-SULFATES — by geographic location + groundwater chemistry
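One way the "predict, show a confidence band, let the engineer confirm" concept could turn into a recommendation is sketched below. The rule is an assumption for illustration: suggest the single sanity-check sample only when the whole confidence band sits inside the spec limits, and always defer to a PE who asks for the full test. Function name, labels, and the example numbers are hypothetical.

```python
def recommend_testing(predicted, band_half_width, spec_min=None,
                      spec_max=None, pe_requires_full_test=False):
    """Suggest a testing level from a prediction and its confidence band.

    The PE override always wins. Otherwise the reduced path is suggested
    only if the full band [predicted - w, predicted + w] meets the spec.
    """
    if pe_requires_full_test:
        return "full_battery"
    low = predicted - band_half_width
    high = predicted + band_half_width
    if spec_min is not None and low < spec_min:
        return "full_battery"
    if spec_max is not None and high > spec_max:
        return "full_battery"
    return "single_sanity_check"


# Example: predicted abrasion loss (%) against a hypothetical 40% maximum.
print(recommend_testing(25.0, 5.0, spec_max=40.0))  # single_sanity_check
print(recommend_testing(37.0, 5.0, spec_max=40.0))  # full_battery
print(recommend_testing(25.0, 5.0, spec_max=40.0,
                        pe_requires_full_test=True))  # full_battery
```

Logging each recommendation next to the eventual measured result is what feeds the predicted-vs-measured loop listed under "Requires".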
Tier 6 — Not ML candidates
Pure physical sample-prep procedures with no useful prediction target:
- AGG-BATCH (batching procedure)
- AGG-SAMP (sampling)
- AGG-SPLIT (sample splitting)
- ASPH-SAMP (sampling)
- ASPH-REDUCE (sample reduction)
- ASPH-CORING (coring procedure)
- CONC-MAKE (specimen molding)
- CONC-SAMP (sampling)
- CONC-CAP (specimen capping)
- CONC-CORE (core preparation)
- CONC-CORING (coring procedure)
- CONC-BEAM (beam preparation)
The outputs of some of these may feed Tier 1-5 predictors (e.g. coring → CONC-COMP which is Tier 1), but the procedure itself isn't ML-amenable.
Per-test full table
Quick scan-reference. See tiers above for detail.
| Test | ML opportunity | Tier | Data status |
|---|---|---|---|
| SOIL | | | |
| SOIL-PROC | MDD/OMC from grain size + Atterberg | 2 | Need backfill of sieve + LL/PL on existing rows |
| SOIL-NUC | Field reading anomaly detection | 3 | Have it |
| SOIL-LL / SOIL-PL | Predict from gradation + visual class | 2 | Have it |
| SOIL-HYDRO | Image-based PSD from #200 fraction | 2 | Need particle imagery |
| SOIL-SULFATES-A/B | Predict from groundwater + geography | 5 | Needs GIS integration |
| ASPHALT | | | |
| ASPH-IGN | AC% prediction within first 10 min of burn | 1 | Have burn data + final AC% |
| ASPH-EXTRACT | AC% from gradation + design | 2 | Have it |
| ASPH-AC-NUC | Anomaly + cross-cal with ignition | 3 | Have it |
| ASPH-COMP-CORR | Predict gauge correction from mix + temp | 3 | Have historical correlations |
| ASPH-COMP (field density) | Predict mat density from roller pattern + mat temp | 3 | Needs roller telemetry |
| ASPH-MOIST | Predict from weather + stockpile age | 3 | Need weather integration |
| ASPH-CORE-SG / BSG | Predict from gradation + AC + design | 2 | Have historical cores |
| ASPH-HVEEM | Narrow optimum binder search | 2 | Have it |
| ASPH-LOTT | TSR from aggregate/binder/anti-strip | 2 | Have it |
| ASPH-RICE | Predict Gmm from gradation + AC + Gsb | 2 | Have it |
| ASPH-GYRA | Gyrations to target from mix | 2 | Have it; see asphalt-design-ml-compaction-performance.md |
| CONCRETE | | | |
| CONC-COMP | 28-day from 7-day + cure temp | 1 | Have it — top priority |
| CONC-SLUMP | Predict from batch ticket + temp + admixture | 1 | Have it |
| CONC-AIR-P / V | Anomaly on AEA dose response | 3 | Have it |
| CONC-UW | Yield consistency check | 3 | Have it |
| CONC-TEMP | Predict from ambient + load + ice | 3 | Need plant data integration |
| CONC-FLEX | Predict from CONC-COMP — skip flex breaks | 2 | Have it |
| CONC-SHRINK | Predict from w/c + paste + aggregate | 2 | Long-term test history needed |
| AGGREGATE | | | |
| AGG-SIEVE | Camera-based PSD | 1 | Start collecting imagery |
| AGG-FE | Image classifier for flat/elongated | 1 | Start collecting imagery |
| AGG-FRAC | Image classifier for fractured faces | 1 | Start collecting imagery |
| AGG-FAA | Shape analysis from imagery | 4 | Imagery |
| AGG-CLAY | Image classifier on washed sample | 4 | Imagery |
| AGG-LWP | Image classifier on floating fraction | 4 | Imagery |
| AGG-OI | Gardner color image classifier | 4 | Imagery |
| AGG-MC | Stockpile drift from weather + age | 3 | Need weather integration |
| AGG-LA | Predict from petrography + source | 2 | Source database |
| AGG-MDEVAL | Same as LA | 2 | Source database |
| AGG-SE | Predict from FM + clay | 2 | Have it |
| AGG-SG-COARSE / FINE | Predict from source + petrography | 2 / 5 | Source database |
| AGG-ASR family | Predict reactivity from petrography | 2 | Source + petrography database |
| AGG-ZNBR2 / ZNCL2 | Predict from source | 5 | Source database |
| AGG-UW | Predict from PSD + Gsb | 2 | Have it |
| AGG-BLEND | Blending optimization | 2 | See asphalt-design-ml-aggregate-blend.md |
| AGG-BATCH / SAMP / SPLIT | Procedures — not ML | 6 | — |
Data infrastructure needed
Things to build alongside (or before) the ML work:
- Source-quarry database — every aggregate has a source. Tag every AGG-* test with its source quarry so we can train per-source predictors. Currently the data exists in mix designs but isn't first-class.
- Particle imagery pipeline — phones + tablets can take photos; we need a way to associate images with test result rows. Build the upload/storage path before the ML — start collecting imagery in parallel with normal testing.
- Cure tank → cylinder join — match cylinder break records to the tank IDs and cure-window temperature history. Critical for CONC-COMP predictor (#1.1).
- Mix design fingerprinting — a deterministic ID per (cement + aggregate sources + admixtures + ratios) so repeat mixes can pool training data.
- Project-history fingerprinting (Tier 5 prerequisite) — geographic + soil-type + customer + age keys so a "have we tested this material before?" lookup is possible.
- Predicted-vs-measured feedback loop — every prediction surfaced to an engineer should record their override + the actual test result, so models improve from real-world use.
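For the mix design fingerprinting item, a deterministic ID can be produced by canonicalising the mix description and hashing it. The sketch below is one possible approach using only the standard library; the field names, the rounding precision, and the 16-character truncation are assumptions, and rounding means two mixes that straddle a rounding boundary will still get different IDs.

```python
import hashlib
import json


def mix_fingerprint(cement, aggregate_sources, admixtures, ratios,
                    ratio_decimals=2):
    """Deterministic short ID for a mix design.

    Names are trimmed and lower-cased, lists are sorted, and ratios are
    rounded, so entry order and casing do not change the fingerprint.
    """
    canonical = {
        "cement": cement.strip().lower(),
        "aggregate_sources": sorted(s.strip().lower() for s in aggregate_sources),
        "admixtures": sorted(a.strip().lower() for a in admixtures),
        "ratios": {key: round(float(value), ratio_decimals)
                   for key, value in ratios.items()},
    }
    payload = json.dumps(canonical, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


# Hypothetical mixes: same design entered two different ways, plus a variant.
mix_a = mix_fingerprint("Type I/II", ["Quarry North", "Pit 7"],
                        ["AEA-1"], {"w_c": 0.45, "paste": 0.30})
mix_b = mix_fingerprint(" type i/ii ", ["pit 7", "QUARRY NORTH"],
                        ["aea-1"], {"paste": 0.30, "w_c": 0.451})
mix_c = mix_fingerprint("Type I/II", ["Quarry North", "Pit 7"],
                        ["AEA-1"], {"w_c": 0.50, "paste": 0.30})
# mix_a == mix_b, while mix_c differs because the w/c ratio changed.
```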
Open questions
- Regulatory acceptance — for which tests will the state DOT / private clients accept an ML prediction in lieu of the canonical test? Likely answer: never for project-of-record results, but YES for screening / mix-development / decision support. Pursue ML as "narrows the test list" not "replaces the test."
- Liability — engineer-of-record stamps don't go away. ML output supports the engineer's judgment, never substitutes for it.
- Vendor partnerships vs. build — for camera-based gradation, commercial systems exist (Camsizer, etc.) and may be cheaper than internal ML build. Worth a make-vs-buy evaluation per opportunity.
- Data quality bar — historical test results have rounding inconsistencies and human-entry errors; some cleanup before training is required.
Suggested sequencing
| Sprint | Build |
|---|---|
| 0 — Prep (data infra) | Source-quarry database; particle imagery upload path; cure-tank → cylinder join |
| 1 — CONC-COMP 28-day predictor | Tier 1 top priority |
| 2 — AGG-SIEVE camera-based PSD pilot | Tier 1, start with image collection during sprint 0 |
| 3 — CONC-SLUMP batch-ticket QC monitor | Tier 1 |
| 4 — AGG-FE / AGG-FRAC particle classifier | Tier 1 |
| 5 — ASR reactivity screener | Tier 2, but high strategic value |
| 6 — Tier 3 anomaly detection suite | All in parallel; each is cheap to add |
| 7+ | Remaining Tier 2 work in priority order |
Filing tickets
Once a Tier 1 item is ready to commit, file as type:tracker + component:be-platform (or component:ml if we add it) with sub-tickets for the data infrastructure prerequisites. Reference this catalog from the tracker so future Claude / meridian-worker runs have the full context.