ML Plan: Field Compaction Performance Prediction
Context
Problem: Compaction issues discovered in the field are expensive to fix. Multiple variables (mix design, production, environment, equipment) interact in ways that are difficult to predict manually.
Goal: Predict field compaction success based on mix design parameters and production variables, providing risk flags and actionable recommendations before paving begins.
Data Available: 100+ historical Superpave mix designs with field density test results.
What It Predicts
| Output | Description |
|---|---|
| Predicted Compaction % | Expected field compaction (e.g., 93.5%) |
| Confidence Interval | Range of likely outcomes (e.g., 92.1% - 94.9%) |
| Pass Probability | Likelihood of meeting spec (e.g., 87%) |
| Risk Level | Low / Medium / High |
| Recommendations | Actionable suggestions based on risk factors |
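
The outputs in the table above could be bundled into a single result object. The following is a minimal sketch, not the planned implementation: it assumes the prediction error is roughly normal, and the names `CompactionForecast` and `build_forecast`, the 92.0% spec minimum, and the risk cut-offs are all hypothetical placeholders.

```python
from dataclasses import dataclass
from statistics import NormalDist


@dataclass
class CompactionForecast:
    predicted_pct: float     # expected field compaction, in percent
    lower_pct: float         # lower bound of the interval
    upper_pct: float         # upper bound of the interval
    pass_probability: float  # probability of meeting the spec minimum
    risk_level: str          # "Low", "Medium", or "High"


def build_forecast(predicted_pct, std_pct, spec_min_pct=92.0, z=1.96):
    """Turn a point prediction and its standard deviation into the outputs.

    Assumptions (not from the plan): normally distributed error, a
    hypothetical spec minimum of 92.0%, and hypothetical risk cut-offs.
    """
    if std_pct <= 0:
        raise ValueError("std_pct must be positive")
    dist = NormalDist(mu=predicted_pct, sigma=std_pct)
    # Probability that actual compaction lands at or above the spec minimum.
    pass_probability = 1.0 - dist.cdf(spec_min_pct)
    if pass_probability >= 0.85:
        risk_level = "Low"
    elif pass_probability >= 0.60:
        risk_level = "Medium"
    else:
        risk_level = "High"
    return CompactionForecast(
        predicted_pct=predicted_pct,
        lower_pct=predicted_pct - z * std_pct,
        upper_pct=predicted_pct + z * std_pct,
        pass_probability=pass_probability,
        risk_level=risk_level,
    )
```

For example, `build_forecast(93.5, 0.7)` yields an interval of roughly 92.1% to 94.9% and a "Low" risk level under these assumed cut-offs.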
Data Requirements
Already Captured (from existing models)
- Gradation (14 sieves)
- Volumetrics (Gmm, Gmb, VMA, VFA, air voids)
- Bailey ratios (CA, FAc, FAf)
- Binder content and grade
- NMAS, design level
- Mat temperature (from AsphaltDensityTest)
Needs to Be Added
| Category | Fields to Add |
|---|---|
| Production | plant_discharge_temp, haul_distance, haul_time, time_to_compact |
| Environmental | ambient_temp, wind_speed, humidity |
| Equipment | roller_type, roller_passes, lift_thickness |
| Base | base_type, base_condition, base_temperature |
Implementation Approach
New Models
- AsphaltProductionVariables - Captures production/environmental data
- CompactionPrediction - Stores predictions and outcomes
- MLModelVersion - Tracks model versions
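
As a rough illustration of what AsphaltProductionVariables might hold, the sketch below uses a plain dataclass as a stand-in for the real ORM model. The field names come from the "Needs to be Added" table; the types, the units in the comments, and the `missing_fields` helper are assumptions.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class AsphaltProductionVariables:
    # Every field is optional so partially filled forms can still be saved.
    # Production (units are assumed, not specified in the plan)
    plant_discharge_temp: Optional[float] = None  # assumed degrees F
    haul_distance: Optional[float] = None         # assumed miles
    haul_time: Optional[float] = None             # assumed minutes
    time_to_compact: Optional[float] = None       # assumed minutes
    # Environmental
    ambient_temp: Optional[float] = None          # assumed degrees F
    wind_speed: Optional[float] = None            # assumed mph
    humidity: Optional[float] = None              # assumed percent
    # Equipment
    roller_type: Optional[str] = None
    roller_passes: Optional[int] = None
    lift_thickness: Optional[float] = None        # assumed inches
    # Base
    base_type: Optional[str] = None
    base_condition: Optional[str] = None
    base_temperature: Optional[float] = None      # assumed degrees F

    def missing_fields(self):
        """Hypothetical helper: names of fields that were not recorded."""
        return [name for name, value in asdict(self).items() if value is None]
```

Leaving fields optional fits the plan's note that the chosen model handles missing values, and `missing_fields` could drive form prompts for incomplete records.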
ML Service (/app/services/ml/)
- feature_extractor.py - Extract features from models
- compaction_predictor.py - Generate predictions
- model_trainer.py - Training pipeline
- recommendations.py - Generate actionable advice
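
One simple way recommendations.py could produce actionable advice is a small rule table keyed on input fields. This is only a sketch: the function name `generate_recommendations`, the thresholds, the assumed units, and every message other than "Increase mat temperature" are illustrative placeholders, not engineering guidance from the plan.

```python
def generate_recommendations(conditions):
    """Return advice strings for risk factors found in `conditions`.

    `conditions` is a dict of field name -> value. Every threshold below is
    a hypothetical placeholder that would need to be set by the engineers.
    """
    rules = [
        # (field, predicate that flags a risk, advice shown to the user)
        ("mat_temp", lambda v: v < 250, "Increase mat temperature"),
        ("time_to_compact", lambda v: v > 20,
         "Shorten the time between placement and compaction"),
        ("roller_passes", lambda v: v < 3, "Add roller passes"),
    ]
    advice = []
    for field, is_risky, message in rules:
        value = conditions.get(field)
        # Fields that were not recorded are skipped rather than flagged.
        if value is not None and is_risky(value):
            advice.append(message)
    return advice
```

A rule table keeps the advice auditable. A later version could instead derive the triggers from the model's feature importances.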
API Endpoints
- POST /api/ml/predict-compaction - Get prediction
- POST /api/ml/record-outcome - Link to actual result
- GET /api/ml/model-stats - Model performance
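
To make the request/response shape of POST /api/ml/predict-compaction more concrete, here is a framework-neutral handler sketch. The payload keys `mix_design_id` and `production`, the error messages, and the injected `predict` callable are assumptions. The plan does not specify the schema or the web framework.

```python
import json


def handle_predict_compaction(raw_body, predict):
    """Validate a JSON request body and return (HTTP status, JSON response).

    `predict` is any callable taking (mix_design_id, production_dict) and
    returning a JSON-serializable dict. It stands in for the predictor.
    """
    try:
        payload = json.loads(raw_body)
    except json.JSONDecodeError:
        return 400, json.dumps({"error": "body must be valid JSON"})
    if not isinstance(payload, dict) or "mix_design_id" not in payload:
        return 400, json.dumps({"error": "mix_design_id is required"})
    # Production variables are optional; an empty dict means "none recorded".
    production = payload.get("production", {})
    result = predict(payload["mix_design_id"], production)
    return 200, json.dumps(result)
```

Passing the predictor in as an argument keeps the handler testable with a stub before the real model exists.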
UI Integration
- Prediction widget on density test form
- Scenario builder on mix design page
- Risk badges on field schedule
Model Choice: XGBoost
Why XGBoost:
- Works well with ~100 samples
- Handles missing values natively
- Interpretable (feature importance)
- Fast inference

Features (~40 total):
- 14 gradation values
- 6 volumetric properties
- 3 Bailey ratios
- 10+ production/environmental variables
- Derived features (cooling rate, workability window)
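
The plan does not define the derived features, so the sketch below shows one plausible reading: cooling rate as the temperature drop from plant discharge to the mat divided by haul time, and workability window as the time left before the mat reaches a cessation temperature at that rate. Both formulas, the `derive_features` name, and the 175-degree default are assumptions for illustration.

```python
def derive_features(plant_discharge_temp, mat_temp, haul_time,
                    cessation_temp=175.0):
    """Compute hypothetical derived features from raw inputs.

    Temperatures share one unit (assumed degrees F); haul_time is in minutes.
    Returns None for a feature that cannot be computed, so a model that
    handles missing values can still use the row.
    """
    empty = {"cooling_rate": None, "workability_window": None}
    if None in (plant_discharge_temp, mat_temp, haul_time) or haul_time <= 0:
        return empty
    # Assumed definition: degrees lost per minute between plant and mat.
    cooling_rate = (plant_discharge_temp - mat_temp) / haul_time
    if cooling_rate <= 0:
        # No measurable cooling, so the window is undefined under this model.
        return {"cooling_rate": cooling_rate, "workability_window": None}
    # Assumed definition: minutes until the mat cools to cessation_temp.
    workability_window = max(mat_temp - cessation_temp, 0.0) / cooling_rate
    return {"cooling_rate": cooling_rate,
            "workability_window": workability_window}
```

With a 310-degree discharge, a 290-degree mat, and a 40-minute haul, this gives a cooling rate of 0.5 degrees per minute and a window of 230 minutes under the assumed definitions.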
Verification
- Training: 80/20 train/test split, with 5-fold cross-validation on the training portion
- Metrics: MAE < 2 percentage points of compaction, R² > 0.7, pass/fail classification accuracy > 85%
- Shadow Mode: 2 weeks of silent predictions to validate
- Pilot: Principal engineers for 4 weeks
- Feedback Loop: Track predicted vs. actual compaction; retrain when MAE exceeds 3 percentage points
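
The acceptance metrics and the retraining trigger listed above can be computed with a few lines of plain Python. This is a sketch under stated assumptions: compaction values are in percent, pass/fail is judged against a hypothetical 92.0% spec minimum, and the function names are illustrative.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute gap between actual and predicted compaction."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R² is undefined when all actual values are equal")
    return 1.0 - ss_res / ss_tot


def pass_fail_accuracy(actual, predicted, spec_min_pct=92.0):
    """Share of tests where predicted and actual agree on pass vs. fail.

    The 92.0% spec minimum is a hypothetical default.
    """
    hits = sum((a >= spec_min_pct) == (p >= spec_min_pct)
               for a, p in zip(actual, predicted))
    return hits / len(actual)


def needs_retraining(actual, predicted, mae_limit=3.0):
    """Feedback-loop trigger: retrain once MAE exceeds the limit."""
    return mean_absolute_error(actual, predicted) > mae_limit
```

During shadow mode, these functions could be run over the logged predictions and the matching field density results to check the targets before the pilot starts.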
Timeline
| Week | Focus |
|---|---|
| 1-2 | Data models, migrations, form updates |
| 2-3 | ML services |
| 3-4 | API endpoints |
| 4-5 | UI integration |
| 5-8 | Validation and rollout |
Value
- Reduce rework: Identify high-risk paving conditions before they cause failures
- Actionable insights: "Increase mat temperature" vs. generic warnings
- Continuous improvement: Model improves as more data is collected