ML Plan: Optimum Binder Content Prediction

Context

Problem: Finding the optimum asphalt binder content requires trial batches at several asphalt content (AC) percentages, typically 4-5 per design. Each trial consumes lab time, materials, and technician hours.

Goal: Predict the optimum binder content from aggregate properties and gradation, reducing trials from 4-5 down to 1-2 verification tests.

Data Available: 100+ historical Superpave mix designs with known optimum AC%.


What It Predicts

Output                   Description
Predicted Optimum AC%    Recommended binder content (e.g., 5.4%)
Confidence Range         Likely range (e.g., 5.2% - 5.6%)
Suggested Trial Points   Recommended AC% values to test (e.g., 5.3%, 5.5%)
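
The three outputs can be carried in one small structure. The following is a minimal sketch, not the planned implementation: it assumes a symmetric range around the prediction and trial points halfway between the prediction and each bound, which reproduces the example values above but is not specified in this plan. `BinderPrediction`, `build_prediction`, and `half_width` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinderPrediction:
    """Hypothetical container for the three outputs listed above."""
    optimum_ac: float                 # predicted optimum AC%
    range_low: float                  # lower bound of the likely range
    range_high: float                 # upper bound of the likely range
    trial_points: tuple[float, ...]   # AC% values suggested for lab trials


def build_prediction(optimum_ac: float, half_width: float = 0.2) -> BinderPrediction:
    """Derive range and trial points from a point prediction.

    Assumption: the range is symmetric and the trial points sit halfway
    between the prediction and each bound.
    """
    step = half_width / 2
    return BinderPrediction(
        optimum_ac=round(optimum_ac, 1),
        range_low=round(optimum_ac - half_width, 1),
        range_high=round(optimum_ac + half_width, 1),
        trial_points=(round(optimum_ac - step, 1), round(optimum_ac + step, 1)),
    )


prediction = build_prediction(5.4)
print(prediction.range_low, prediction.range_high, prediction.trial_points)
```

In practice the half-width would likely come from the model's observed error rather than a fixed default.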

Data Requirements

Inputs (Already Captured)

Category               Fields
Gradation              14 sieve values (% passing)
Aggregate Properties   Gsb (bulk SG), Gsa (apparent SG), absorption
Consensus Properties   Coarse angularity, fine angularity, flat/elongated, sand equivalency
Design Parameters      NMAS, target air voids, N-design
Bailey Analysis        CA ratio, FAc, FAf, VMA estimate
RAP Content            RAP %, AC in RAP

Target (Already Captured)

  • jmf_asphalt_content - The optimum AC% determined through testing

Implementation Approach

Feature Engineering

# Key derived features (f(...) marks a function still to be defined)
surface_area_estimate = f(gradation)  # e.g., Hveem surface area factors
voids_estimate = f(bailey_ratios, gradation)
absorption_adjustment = aggregate_absorption * correction_factor
rap_contribution = rap_percent / 100 * rap_ac_content  # RAP binder, approx. % of mix
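
As one possible concrete form of the first and last features, the sketch below computes a surface area estimate from percent-passing values and the RAP binder contribution. The surface area factors are the commonly cited Hveem-method values in m²/kg and should be confirmed against the lab's reference before use. The function names and the dict-of-sieve-sizes input format are assumptions made for illustration.

```python
# Surface area factors in m^2/kg, keyed by sieve size in mm.
# Commonly cited Hveem-method values; verify against your reference.
SURFACE_AREA_FACTORS = {
    4.75: 0.41, 2.36: 0.82, 1.18: 1.64, 0.60: 2.87,
    0.30: 6.14, 0.15: 12.29, 0.075: 32.77,
}
# Applied once to the whole blend (100% passing the maximum size).
COARSE_FACTOR = 0.41


def surface_area_estimate(percent_passing: dict[float, float]) -> float:
    """Return estimated aggregate surface area in m^2/kg.

    percent_passing maps sieve size (mm) to percent passing (0-100)
    and must contain every sieve in SURFACE_AREA_FACTORS.
    """
    total = COARSE_FACTOR
    for sieve_mm, factor in SURFACE_AREA_FACTORS.items():
        total += factor * percent_passing[sieve_mm] / 100.0
    return total


def rap_binder_contribution(rap_percent: float, rap_ac_content: float) -> float:
    """Binder supplied by RAP, as an approximate percent of the mix."""
    return rap_percent / 100.0 * rap_ac_content


# Illustrative gradation, not taken from any real mix design.
example = {4.75: 60, 2.36: 40, 1.18: 30, 0.60: 20, 0.30: 12, 0.15: 8, 0.075: 5}
print(round(surface_area_estimate(example), 2))
print(round(rap_binder_contribution(20, 4.8), 2))
```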

Model Architecture

Option A: Direct Regression

  • Input: Aggregate + gradation features → Output: Optimum AC%
  • Simple, interpretable

Option B: Two-Stage Model

  1. Predict VMA at target air voids
  2. Calculate AC% needed to achieve that VMA

Recommended: Option A for initial implementation (simpler, easier to validate)
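
For reference, step 2 of Option B has a closed form under the standard volumetric definitions: VMA = Va + Vbe, Vbe = Pbe·Gmb/Gb, Gmb = (100 − VMA)·Gsb/Ps, Pb = Pbe + Pba·Ps/100, and Ps = 100 − Pb. The sketch below solves these for Pb. It assumes the binder specific gravity (`gb`) and absorbed binder (`pba`, percent by mass of aggregate) are available; neither is among the inputs listed in this plan, and `ac_from_vma` is a hypothetical name.

```python
def ac_from_vma(vma: float, air_voids: float, gsb: float,
                gb: float, pba: float) -> float:
    """Return total binder content Pb (% of mix) that yields the given VMA.

    vma, air_voids : percent of total mix volume
    gsb            : combined bulk specific gravity of the aggregate
    gb             : binder specific gravity (assumed input)
    pba            : absorbed binder, % by mass of aggregate (assumed input)
    """
    vbe = vma - air_voids  # effective binder volume, %
    # Pb = (100 - Pb) * k, where k collects the effective and absorbed terms.
    k = vbe * gb / ((100.0 - vma) * gsb) + pba / 100.0
    return 100.0 * k / (1.0 + k)


# Illustrative inputs only.
print(round(ac_from_vma(vma=14.0, air_voids=4.0, gsb=2.65, gb=1.03, pba=0.5), 2))
```

Because the result depends on a predicted VMA, any error in stage 1 carries through this calculation, which is one reason Option A is easier to validate first.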

ML Service

/app/services/ml/
  binder_predictor.py     # Main prediction service
  binder_trainer.py       # Training pipeline

API Endpoints

  • POST /api/ml/predict-binder-content - Get AC% prediction
  • POST /api/ml/validate-binder-prediction - Record actual vs predicted

UI Integration

  • Add prediction panel to new mix design form
  • Show suggested trial points
  • "Smart Trial" button that pre-fills trial batch form

Training Data Structure

# For each historical mix design
features = {
    # Gradation (14 features)
    'grad_75mm': 100.0,
    'grad_50mm': 100.0,
    # ... through grad_0_075mm

    # Aggregate properties
    'gsb_coarse': 2.65,
    'gsb_fine': 2.58,
    'combined_gsb': 2.62,
    'absorption': 1.2,

    # Consensus properties
    'coarse_angularity': 95,
    'fine_angularity': 45,
    'flat_elongated': 8,
    'sand_equivalency': 78,

    # Design parameters
    'nmas': 12.5,
    'target_air_voids': 4.0,
    'n_design': 75,

    # Bailey ratios
    'ca_ratio': 0.72,
    'fa_c': 0.42,
    'fa_f': 0.45,

    # RAP
    'rap_percent': 20,
    'rap_ac_content': 4.8,
}

target = 5.4  # Optimum AC%
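
Before training, the per-design dicts need to become rows with a stable column order. A minimal sketch, assuming every record carries the same set of keys and that alphabetical order is an acceptable column order; `to_matrix` is a hypothetical helper, not part of the planned service.

```python
def to_matrix(records: list[dict[str, float]]) -> tuple[list[str], list[list[float]]]:
    """Convert feature dicts into (column_names, rows) with a fixed order.

    Raises ValueError if any record is missing a feature or has an extra
    one, so silent column misalignment cannot reach the model.
    """
    if not records:
        raise ValueError("no records supplied")
    names = sorted(records[0])
    expected = set(names)
    rows = []
    for i, record in enumerate(records):
        if set(record) != expected:
            mismatched = sorted(set(record) ^ expected)
            raise ValueError(f"record {i} has mismatched keys: {mismatched}")
        rows.append([float(record[name]) for name in names])
    return names, rows


# Two abbreviated records for illustration (real records have all features).
names, rows = to_matrix([
    {"absorption": 1.2, "combined_gsb": 2.62, "rap_percent": 20},
    {"absorption": 0.9, "combined_gsb": 2.66, "rap_percent": 0},
])
print(names)
print(rows)
```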

Model Choice: Gradient Boosting or Linear Regression

Initial approach: Start with Ridge Regression

  • Highly interpretable (coefficient = impact of each feature)
  • Works well with correlated features (gradation sieves)
  • Easy to understand why a prediction was made

If accuracy is insufficient: Upgrade to XGBoost

  • Better handles non-linear relationships
  • Still provides feature importance
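
To make the ridge step concrete, here is a minimal closed-form sketch using NumPy. It is an illustration under stated assumptions rather than the planned training pipeline: features are centered but not scaled (in practice they would likely be standardized so coefficients are comparable), the intercept is not penalized, and `fit_ridge`, `predict`, and `alpha` are names chosen here.

```python
import numpy as np


def fit_ridge(X, y, alpha: float = 1.0):
    """Fit ridge regression; returns (coefficients, intercept).

    Solves (Xc^T Xc + alpha * I) w = Xc^T yc on centered data, so the
    intercept itself is not shrunk.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    yc = y - y_mean
    penalty = alpha * np.eye(X.shape[1])
    coef = np.linalg.solve(Xc.T @ Xc + penalty, Xc.T @ yc)
    intercept = y_mean - x_mean @ coef
    return coef, intercept


def predict(X, coef, intercept):
    """Apply a fitted model to one or more feature rows."""
    return np.asarray(X, dtype=float) @ coef + intercept


# Toy data: y = 2x + 3. A larger alpha shrinks the slope toward zero.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [3.0, 5.0, 7.0, 9.0]
print(fit_ridge(X, y, alpha=0.0))
print(fit_ridge(X, y, alpha=5.0))
```

With roughly 100 designs and highly correlated sieve columns, the penalty strength would need tuning, for example by the same cross-validation used for accuracy checks.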


Verification

Accuracy Targets

  • MAE < 0.3 percentage points of AC (e.g., predict 5.4%, actual 5.5% is an error of 0.1)
  • 90% of predictions within ±0.4 percentage points of the lab-determined optimum

Validation Workflow

  1. Leave-one-out cross-validation on historical data
  2. Shadow mode on new designs for 1 month
  3. Compare predicted vs. lab-determined optimum
  4. Track "trials saved" metric
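
Step 1 and the two accuracy targets can be checked with a short loop. The sketch below is model-agnostic: `fit` and `predict` are caller-supplied callables, and a mean-of-training-targets baseline stands in for the real model. All names are hypothetical and the sample targets are made up for illustration.

```python
from statistics import mean


def leave_one_out_errors(X, y, fit, predict):
    """Return the absolute error for each design when it is held out.

    fit(train_X, train_y) -> model; predict(model, row) -> predicted AC%.
    """
    errors = []
    for i in range(len(y)):
        train_X = [row for j, row in enumerate(X) if j != i]
        train_y = [t for j, t in enumerate(y) if j != i]
        model = fit(train_X, train_y)
        errors.append(abs(predict(model, X[i]) - y[i]))
    return errors


def summarize(errors, tolerance: float = 0.4):
    """Compute MAE and the share of predictions within the tolerance."""
    within = sum(e <= tolerance for e in errors) / len(errors)
    return {"mae": mean(errors), "within_tolerance": within}


# Baseline "model": always predict the mean of the training targets.
def fit_mean(train_X, train_y):
    return mean(train_y)


def predict_mean(model, row):
    return model


X = [[0], [0], [0], [0]]          # features are ignored by the baseline
y = [5.0, 5.0, 5.0, 6.0]          # made-up optimum AC% values
print(summarize(leave_one_out_errors(X, y, fit_mean, predict_mean)))
```

Running the same loop on a mean-only baseline is also a useful sanity check: the real model should beat it clearly before moving to shadow mode.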

Value Proposition

Metric                     Current    With ML
Trial batches per design   4-5        1-2
Lab time per design        ~8 hours   ~3 hours
Materials cost             ~$400      ~$150

ROI: At 50 mix designs/year, materials savings come to ~$12,500/year, plus roughly 250 lab hours saved.
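
The savings figure follows directly from the table. A short check of the arithmetic, using the table's approximate per-design values and the 50 designs/year assumption; `annual_savings` is a hypothetical helper.

```python
def annual_savings(designs_per_year: int, current_per_design: float,
                   with_ml_per_design: float) -> float:
    """Yearly savings = designs per year x per-design reduction."""
    return designs_per_year * (current_per_design - with_ml_per_design)


materials_dollars = annual_savings(50, 400, 150)  # 50 x $250
lab_hours = annual_savings(50, 8, 3)              # 50 x 5 hours
print(materials_dollars, lab_hours)               # prints: 12500 250
```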


Timeline

Week   Focus
1      Extract training data, feature engineering
2      Train and validate model
3      API and basic UI
4      Shadow mode testing
5-6    Pilot with engineers

Risks and Mitigations

Risk                    Mitigation
Insufficient accuracy   Keep as "suggestion" not "requirement"
Unusual aggregates      Flag out-of-distribution inputs
RAP variability         Add RAP source tracking
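
One simple way to flag out-of-distribution inputs is a per-feature range check against the training data. This is a sketch of that one approach, not a decision made in this plan; more robust checks exist. `fit_ranges`, `out_of_range`, and `margin` are hypothetical names.

```python
def fit_ranges(records: list[dict[str, float]]) -> dict[str, tuple[float, float]]:
    """Record the (min, max) seen for each feature in the training data."""
    ranges: dict[str, tuple[float, float]] = {}
    for record in records:
        for name, value in record.items():
            low, high = ranges.get(name, (value, value))
            ranges[name] = (min(low, value), max(high, value))
    return ranges


def out_of_range(features: dict[str, float],
                 ranges: dict[str, tuple[float, float]],
                 margin: float = 0.0) -> list[str]:
    """Return names of features outside the training range.

    margin widens each range by that fraction of its width. Features
    never seen in training are flagged as well.
    """
    flagged = []
    for name, value in features.items():
        if name not in ranges:
            flagged.append(name)
            continue
        low, high = ranges[name]
        pad = margin * (high - low)
        if value < low - pad or value > high + pad:
            flagged.append(name)
    return flagged


# Made-up training records for illustration.
training = [
    {"absorption": 0.8, "rap_percent": 0},
    {"absorption": 2.0, "rap_percent": 25},
]
print(out_of_range({"absorption": 3.1, "rap_percent": 20}, fit_ranges(training)))
```

A flagged design would still receive a prediction, shown with a warning, which fits the "suggestion, not requirement" stance in the table above.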