ML Plan: Optimum Binder Content Prediction
Context
Problem: Finding the optimum asphalt binder content requires multiple trial batches at different AC percentages (typically 4-5 trials). Each trial requires lab time, materials, and technician hours.
Goal: Predict the optimum binder content from aggregate properties and gradation, reducing trials from 4-5 down to 1-2 verification tests.
Data Available: 100+ historical Superpave mix designs with known optimum AC%.
What It Predicts
| Output | Description |
|---|---|
| Predicted Optimum AC% | Recommended binder content (e.g., 5.4%) |
| Confidence Range | Likely range (e.g., 5.2% - 5.6%) |
| Suggested Trial Points | Recommended AC% values to test (e.g., 5.3%, 5.5%) |
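The example values in the table (5.4% predicted, 5.2% - 5.6% range, 5.3% and 5.5% trial points) are consistent with placing each trial point halfway between the prediction and the nearer range bound. The sketch below assumes that rule; the function name `suggest_trial_points` and the 0.1% rounding step are illustrative choices, not part of the plan.

```python
def suggest_trial_points(predicted_ac, low, high, step=0.1):
    """Return two AC% trial points that bracket the prediction.

    Assumed rule (not specified in the plan): each point sits halfway
    between the prediction and the nearer bound of the confidence range,
    snapped to the nearest `step`.
    """
    if not low <= predicted_ac <= high:
        raise ValueError("prediction must lie inside the confidence range")

    def snap(value):
        # Snap to the nearest multiple of `step`, then clean float noise.
        return round(round(value / step) * step, 2)

    lower = snap((low + predicted_ac) / 2)
    upper = snap((predicted_ac + high) / 2)
    return [lower, upper]


trial_points = suggest_trial_points(5.4, 5.2, 5.6)
print(trial_points)
```

With the table's example inputs this yields trial points at 5.3% and 5.5%.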
Data Requirements
Inputs (Already Captured)
| Category | Fields |
|---|---|
| Gradation | 14 sieve values (% passing) |
| Aggregate Properties | Gsb (bulk SG), Gsa (apparent SG), absorption |
| Consensus Properties | Coarse angularity, fine angularity, flat/elongated, sand equivalency |
| Design Parameters | NMAS, target air voids, N-design |
| Bailey Analysis | CA ratio, FAc, FAf, VMA estimate |
| RAP Content | RAP %, AC in RAP |
Target (Already Captured)
- jmf_asphalt_content: the optimum AC% determined through lab testing
Implementation Approach
Feature Engineering
# Key derived features
surface_area_estimate = f(gradation) # Hveem surface area factor
voids_estimate = f(bailey_ratios, gradation)
absorption_adjustment = aggregate_absorption * correction_factor
rap_contribution = rap_percent / 100 * rap_ac_content # RAP binder as % of total mix
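Two of the derived features can be made concrete. The sketch below is one possible reading, not the plan's implementation: the surface area factors are the commonly cited Asphalt Institute values in m²/kg and should be confirmed against your own reference, the function names are hypothetical, and the RAP term assumes both inputs are percentages, so it divides by 100.

```python
# Surface area factors (m^2/kg) keyed by sieve size in mm.
# ASSUMPTION: commonly cited Asphalt Institute values; verify before use.
SURFACE_AREA_FACTORS = {
    4.75: 0.41, 2.36: 0.82, 1.18: 1.64, 0.6: 2.87,
    0.3: 6.14, 0.15: 12.29, 0.075: 32.77,
}
MAX_SIZE_FACTOR = 0.41  # applied to the 100% passing the maximum sieve


def surface_area_estimate(percent_passing):
    """Estimate aggregate surface area (m^2/kg) from % passing by sieve (mm)."""
    total = MAX_SIZE_FACTOR  # 100% passing the maximum size
    for sieve_mm, factor in SURFACE_AREA_FACTORS.items():
        total += percent_passing[sieve_mm] / 100.0 * factor
    return total


def rap_binder_contribution(rap_percent, rap_ac_content):
    """Binder contributed by RAP as % of total mix (both inputs in percent)."""
    return rap_percent / 100.0 * rap_ac_content


# Hypothetical gradation, % passing each sieve.
example_gradation = {4.75: 60, 2.36: 40, 1.18: 28, 0.6: 20,
                     0.3: 12, 0.15: 8, 0.075: 5}
surface_area = surface_area_estimate(example_gradation)
rap_binder = rap_binder_contribution(20, 4.8)
```

Using the RAP values from the training data example (20% RAP at 4.8% AC), the RAP contributes about 0.96% binder by total mix weight.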
Model Architecture
Option A: Direct Regression
- Input: Aggregate + gradation features → Output: Optimum AC%
- Simple, interpretable
Option B: Two-Stage Model
1. Predict VMA at target air voids
2. Calculate AC% needed to achieve that VMA
Recommended: Option A for initial implementation (simpler, easier to validate)
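For reference, the second stage of Option B is a volumetric calculation rather than a learned model. The sketch below shows one way to back out total binder content from a predicted VMA and the target air voids using standard mass-volume relationships. The function name, the binder specific gravity of 1.03, and the absorbed-binder input are assumptions for illustration.

```python
def binder_content_from_vma(vma, air_voids, gsb, gb=1.03, absorbed_binder=0.0):
    """Total binder content (% of mix mass) implied by VMA and air voids.

    vma, air_voids:  percent of total mix volume
    gsb:             combined aggregate bulk specific gravity
    gb:              binder specific gravity (1.03 is an assumed typical value)
    absorbed_binder: absorbed binder, % by mass of aggregate (assumed input)
    """
    if not 0 <= air_voids < vma < 100:
        raise ValueError("expected 0 <= air_voids < vma < 100")
    # Effective binder fills the VMA that is not air (Vbe = VMA - Va).
    effective_binder_volume = vma - air_voids
    # Masses per 100 units of mix volume (unit weight of water cancels out).
    aggregate_mass = (100.0 - vma) * gsb
    binder_mass = (effective_binder_volume * gb
                   + absorbed_binder / 100.0 * aggregate_mass)
    return 100.0 * binder_mass / (binder_mass + aggregate_mass)


# Hypothetical inputs: 14% VMA, 4% air voids, Gsb 2.62, 0.5% absorbed binder.
estimated_ac = binder_content_from_vma(14.0, 4.0, 2.62, absorbed_binder=0.5)
print(round(estimated_ac, 2))
```

Because this stage is deterministic, any error in Option B comes from the VMA prediction, which is one reason Option A is easier to validate end to end.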
ML Service
/app/services/ml/
  binder_predictor.py  # Main prediction service
  binder_trainer.py    # Training pipeline
API Endpoints
- POST /api/ml/predict-binder-content: Get AC% prediction
- POST /api/ml/validate-binder-prediction: Record actual vs. predicted
UI Integration
- Add prediction panel to new mix design form
- Show suggested trial points
- "Smart Trial" button that pre-fills trial batch form
Training Data Structure
# For each historical mix design
features = {
    # Gradation (14 features)
    'grad_75mm': 100.0,
    'grad_50mm': 100.0,
    # ... through grad_0_075mm

    # Aggregate properties
    'gsb_coarse': 2.65,
    'gsb_fine': 2.58,
    'combined_gsb': 2.62,
    'absorption': 1.2,

    # Consensus properties
    'coarse_angularity': 95,
    'fine_angularity': 45,
    'flat_elongated': 8,
    'sand_equivalency': 78,

    # Design parameters
    'nmas': 12.5,
    'target_air_voids': 4.0,
    'n_design': 75,

    # Bailey ratios
    'ca_ratio': 0.72,
    'fa_c': 0.42,
    'fa_f': 0.45,

    # RAP
    'rap_percent': 20,
    'rap_ac_content': 4.8,
}
target = 5.4  # Optimum AC%
Model Choice: Gradient Boosting or Linear Regression
Initial approach: Start with Ridge Regression
- Highly interpretable (on standardized inputs, each coefficient shows the impact of its feature)
- Works well with correlated features (gradation sieves)
- Easy to understand why a prediction was made
If accuracy is insufficient: Upgrade to XGBoost
- Handles non-linear relationships better
- Still provides feature importance
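A minimal sketch of the Ridge baseline, assuming scikit-learn and NumPy are available. The data here is synthetic and stands in for the historical mix designs; `fit_and_score` and the `alpha=1.0` default are illustrative choices, not settings from the plan. Features are standardized first so the coefficients are comparable, and the error is estimated with leave-one-out cross-validation, as the validation workflow calls for.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def fit_and_score(X, y, alpha=1.0):
    """Fit a standardized Ridge model; return (model, leave-one-out MAE)."""
    model = make_pipeline(StandardScaler(), Ridge(alpha=alpha))
    # Each design is predicted by a model trained on all the others.
    loo_predictions = cross_val_predict(model, X, y, cv=LeaveOneOut())
    mae = float(np.mean(np.abs(loo_predictions - y)))
    model.fit(X, y)  # final fit on all rows for deployment
    return model, mae


# Synthetic stand-in data (60 designs, 5 features); NOT real mix designs.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))
y = 5.4 + 0.3 * X[:, 0] - 0.2 * X[:, 1] + rng.normal(scale=0.05, size=60)

model, mae = fit_and_score(X, y)
# Coefficients on standardized features: larger magnitude = larger impact.
coefficients = model.named_steps["ridge"].coef_
```

With only 100+ designs and 30 or so correlated features, the regularization strength is worth tuning; scikit-learn's `RidgeCV` is one option for that.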
Verification
Accuracy Targets
- MAE < 0.3 percentage points of AC (e.g., predict 5.4%, actual 5.5% is an error of 0.1)
- 90% of predictions within ±0.4 percentage points of the actual optimum
Validation Workflow
- Leave-one-out cross-validation on historical data
- Shadow mode on new designs for 1 month
- Compare predicted vs. lab-determined optimum
- Track "trials saved" metric
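The two accuracy targets can be checked with a few lines of plain Python once predicted and lab-determined values are paired up. This is a sketch; the function name and the returned dictionary keys are hypothetical, and the sample numbers are made up for illustration.

```python
def accuracy_report(predicted, actual, tolerance=0.4, mae_target=0.3,
                    coverage_target=0.9):
    """Compare predicted vs. lab-determined AC% against the accuracy targets."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two equal-length, non-empty sequences")
    errors = [abs(p - a) for p, a in zip(predicted, actual)]
    mae = sum(errors) / len(errors)
    # Fraction of designs whose absolute error is within the tolerance.
    within = sum(e <= tolerance for e in errors) / len(errors)
    return {
        "mae": mae,
        "within_tolerance": within,
        "meets_targets": mae < mae_target and within >= coverage_target,
    }


# Made-up shadow-mode results: one of four predictions misses by 0.5.
report = accuracy_report([5.4, 5.0, 6.0, 4.8], [5.5, 5.1, 5.5, 4.8])
print(report)
```

In this made-up sample the MAE target is met, but only 75% of predictions fall within ±0.4, so the combined check fails.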
Value Proposition
| Metric | Current | With ML |
|---|---|---|
| Trial batches per design | 4-5 | 1-2 |
| Lab time per design | ~8 hours | ~3 hours |
| Materials cost | ~$400 | ~$150 |
ROI: At 50 mix designs/year, materials savings come to ~$12,500/year (50 × $250 per design), plus roughly 250 hours of lab time saved (50 × 5 hours).
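The ROI figure follows directly from the table. The short calculation below reproduces it so the assumptions (50 designs per year, and the approximate per-design costs and hours from the table) can be swapped for your own numbers.

```python
designs_per_year = 50

# Approximate per-design figures taken from the Value Proposition table.
materials_cost_current = 400  # USD
materials_cost_with_ml = 150  # USD
lab_hours_current = 8
lab_hours_with_ml = 3

annual_materials_savings = designs_per_year * (
    materials_cost_current - materials_cost_with_ml
)
annual_lab_hours_saved = designs_per_year * (lab_hours_current - lab_hours_with_ml)

print(annual_materials_savings)  # 12500
print(annual_lab_hours_saved)    # 250
```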
Timeline
| Week | Focus |
|---|---|
| 1 | Extract training data, feature engineering |
| 2 | Train and validate model |
| 3 | API and basic UI |
| 4 | Shadow mode testing |
| 5-6 | Pilot with engineers |
Risks and Mitigations
| Risk | Mitigation |
|---|---|
| Insufficient accuracy | Keep as "suggestion" not "requirement" |
| Unusual aggregates | Flag out-of-distribution inputs |
| RAP variability | Add RAP source tracking |
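One simple way to flag out-of-distribution inputs is to compare each feature against the range seen in the training data. The sketch below assumes that approach. It is a coarse per-feature check that will not catch unusual combinations of otherwise normal values, and the function names are hypothetical.

```python
def fit_feature_ranges(training_rows):
    """Record the min and max seen for each feature across training rows."""
    ranges = {}
    for row in training_rows:
        for name, value in row.items():
            low, high = ranges.get(name, (value, value))
            ranges[name] = (min(low, value), max(high, value))
    return ranges


def out_of_range_features(row, ranges, margin=0.0):
    """Return names of features outside the training range.

    `margin` widens each range by that fraction of its width.
    Features never seen in training are also flagged.
    """
    flagged = []
    for name, value in row.items():
        if name not in ranges:
            flagged.append(name)
            continue
        low, high = ranges[name]
        pad = margin * (high - low)
        if value < low - pad or value > high + pad:
            flagged.append(name)
    return flagged


# Made-up training rows using feature names from the training data example.
ranges = fit_feature_ranges([
    {"absorption": 1.2, "rap_percent": 20},
    {"absorption": 0.8, "rap_percent": 0},
])
flags = out_of_range_features({"absorption": 3.5, "rap_percent": 10}, ranges)
print(flags)
```

A flagged design could still receive a prediction, with the UI showing a warning that the aggregate falls outside what the model has seen.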