
ML Plan: Field Compaction Performance Prediction

Context

Problem: Compaction issues discovered in the field are expensive to fix. Multiple variables (mix design, production, environment, equipment) interact in ways that are difficult to predict manually.

Goal: Predict field compaction success based on mix design parameters and production variables, providing risk flags and actionable recommendations before paving begins.

Data Available: 100+ historical Superpave mix designs with field density test results.


What It Predicts

Output | Description
Predicted Compaction % | Expected field compaction (e.g., 93.5%)
Confidence Interval | Range of likely outcomes (e.g., 92.1% - 94.9%)
Pass Probability | Likelihood of meeting spec (e.g., 87%)
Risk Level | Low / Medium / High
Recommendations | Actionable suggestions based on risk factors
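
As a rough sketch of how these outputs could relate to one another, the snippet below derives an interval, a pass probability, and a risk level from a predicted mean and an uncertainty estimate. It assumes normally distributed prediction error, a 95% interval, a 92% minimum density spec, and 85% / 60% risk cut-offs. None of these values come from the plan, and `PredictionSummary` and `summarize` are hypothetical names.

```python
from dataclasses import dataclass
from statistics import NormalDist


@dataclass
class PredictionSummary:
    """Hypothetical container for the outputs listed above."""
    predicted_pct: float
    interval_low: float
    interval_high: float
    pass_probability: float
    risk_level: str


def summarize(predicted_pct, std_pct, spec_min_pct=92.0, z=1.96):
    """Turn a predicted mean and standard deviation into the summary fields.

    Assumptions (not from the plan): normal prediction error, z=1.96 for a
    95% interval, a 92% spec minimum, and 0.85 / 0.60 risk cut-offs.
    """
    if std_pct <= 0:
        raise ValueError("std_pct must be positive")
    dist = NormalDist(mu=predicted_pct, sigma=std_pct)
    # Probability that field compaction lands at or above the spec minimum.
    pass_probability = 1.0 - dist.cdf(spec_min_pct)
    if pass_probability >= 0.85:
        risk = "Low"
    elif pass_probability >= 0.60:
        risk = "Medium"
    else:
        risk = "High"
    return PredictionSummary(
        predicted_pct=predicted_pct,
        interval_low=predicted_pct - z * std_pct,
        interval_high=predicted_pct + z * std_pct,
        pass_probability=pass_probability,
        risk_level=risk,
    )
```

The cut-offs would need to be agreed with the engineers who act on the risk flag; they are placeholders here.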

Data Requirements

Already Captured (from existing models)

  • Gradation (14 sieves)
  • Volumetrics (Gmm, Gmb, VMA, VFA, air voids)
  • Bailey ratios (CA, FAc, FAf)
  • Binder content and grade
  • NMAS, design level
  • Mat temperature (from AsphaltDensityTest)

Needs to be Added

Category | Fields to Add
Production | plant_discharge_temp, haul_distance, haul_time, time_to_compact
Environmental | ambient_temp, wind_speed, humidity
Equipment | roller_type, roller_passes, lift_thickness
Base | base_type, base_condition, base_temperature
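
The new fields could be grouped in a single record along the following lines. This is a sketch only: the field names come from the table above, but the types, the all-optional defaults, and the `ProductionVariables` and `missing_fields` names are assumptions, and units are left to the real schema.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class ProductionVariables:
    """Hypothetical record for the fields to add; every type is an assumption."""
    # Production
    plant_discharge_temp: Optional[float] = None
    haul_distance: Optional[float] = None
    haul_time: Optional[float] = None
    time_to_compact: Optional[float] = None
    # Environmental
    ambient_temp: Optional[float] = None
    wind_speed: Optional[float] = None
    humidity: Optional[float] = None
    # Equipment
    roller_type: Optional[str] = None
    roller_passes: Optional[int] = None
    lift_thickness: Optional[float] = None
    # Base
    base_type: Optional[str] = None
    base_condition: Optional[str] = None
    base_temperature: Optional[float] = None

    def missing_fields(self) -> List[str]:
        """Names of fields not yet filled in, e.g. for form validation."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]
```

Keeping every field optional lets historical records without production data still be stored; whether that is desirable depends on how the forms are updated.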

Implementation Approach

New Models

  1. AsphaltProductionVariables - Captures production/environmental data
  2. CompactionPrediction - Stores predictions and outcomes
  3. MLModelVersion - Tracks model versions

ML Service (/app/services/ml/)

  • feature_extractor.py - Extract features from models
  • compaction_predictor.py - Generate predictions
  • model_trainer.py - Training pipeline
  • recommendations.py - Generate actionable advice
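
A minimal sketch of what the feature extractor might compute for the derived features named in this plan (cooling rate, workability window). The formulas are assumptions, not definitions from the plan: cooling rate is taken as the temperature drop from plant discharge to the mat divided by haul time, and the workability window as the time for the mat to reach a cessation temperature at that rate. The 175-degree default, the units (degrees and minutes), and both function names are hypothetical.

```python
import math


def cooling_rate(plant_discharge_temp, mat_temp, haul_time):
    """Average temperature loss per minute between plant and mat.

    Returns NaN when haul_time is missing or non-positive so the value can
    be passed through as a missing feature.
    """
    if haul_time is None or haul_time <= 0:
        return math.nan
    return (plant_discharge_temp - mat_temp) / haul_time


def workability_window(mat_temp, rate, cessation_temp=175.0):
    """Minutes until the mat cools to cessation_temp at the given rate.

    cessation_temp is a placeholder assumption; returns NaN if the rate is
    missing or not a positive cooling rate.
    """
    if rate is None or math.isnan(rate) or rate <= 0:
        return math.nan
    return max(0.0, (mat_temp - cessation_temp) / rate)
```

Cooling in a haul truck and cooling on the mat follow different physics, so this is a crude proxy at best; it is meant to show the shape of the extractor, not a validated formula.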

API Endpoints

  • POST /api/ml/predict-compaction - Get prediction
  • POST /api/ml/record-outcome - Link to actual result
  • GET /api/ml/model-stats - Model performance
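
The plan does not name a web framework, so the sketch below shows only the framework-independent part of the predict-compaction endpoint: validate the payload, call the predictor, return a status code and body. The payload keys (`mix_design_id`, `production`), the response shape, and the function name are assumptions.

```python
from typing import Any, Callable, Dict, Tuple

# Hypothetical request keys; the real payload is not specified in the plan.
REQUIRED_KEYS = ("mix_design_id", "production")


def handle_predict_compaction(
    payload: Dict[str, Any],
    predictor: Callable[[Any, Dict[str, Any]], Dict[str, Any]],
) -> Tuple[int, Dict[str, Any]]:
    """Return (status_code, body) for a POST /api/ml/predict-compaction call."""
    missing = [key for key in REQUIRED_KEYS if key not in payload]
    if missing:
        # Reject early so the predictor never sees a malformed request.
        return 400, {"error": "missing fields", "fields": missing}
    result = predictor(payload["mix_design_id"], payload["production"])
    return 200, {"prediction": result}
```

Passing the predictor in as a callable keeps the handler testable without a trained model.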

UI Integration

  • Prediction widget on density test form
  • Scenario builder on mix design page
  • Risk badges on field schedule

Model Choice: XGBoost

Why XGBoost:

  • Works well with ~100 samples
  • Handles missing values natively
  • Interpretable (feature importance)
  • Fast inference

Features (~40 total):

  • 14 gradation values
  • 6 volumetric properties
  • 3 Bailey ratios
  • 10+ production/environmental variables
  • Derived features (cooling rate, workability window)
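
A possible starting configuration is sketched below. With roughly 40 features and about 100 samples, overfitting is the main risk, so the sketch assumes shallow trees, row and column subsampling, and L2 regularization. All hyperparameter values are untuned placeholders, and `train_regressor` and `top_features` are hypothetical helper names. `XGBRegressor` and its `feature_importances_` attribute are part of the xgboost scikit-learn interface.

```python
# Placeholder hyperparameters chosen to limit overfitting on a small dataset.
XGB_PARAMS = {
    "objective": "reg:squarederror",
    "n_estimators": 200,
    "max_depth": 3,           # shallow trees
    "learning_rate": 0.05,
    "subsample": 0.8,         # row subsampling per tree
    "colsample_bytree": 0.6,  # feature subsampling per tree
    "min_child_weight": 3,
    "reg_lambda": 1.0,        # L2 regularization
}


def train_regressor(X, y, params=XGB_PARAMS):
    """Fit an XGBoost regressor; NaN entries in X are treated as missing."""
    from xgboost import XGBRegressor  # third-party; imported lazily

    model = XGBRegressor(**params)
    model.fit(X, y)
    return model


def top_features(model, feature_names, k=5):
    """Rank features by the model's reported importance, highest first."""
    pairs = zip(feature_names, model.feature_importances_)
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)[:k]
```

Feature importance gives a global ranking only; per-prediction explanations for the recommendations would need a separate method.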


Verification

  1. Training: 80/20 train/test split, with 5-fold cross-validation on the training portion
  2. Metrics: MAE < 2 percentage points of compaction, R² > 0.7, pass/fail classification accuracy > 85%
  3. Shadow Mode: 2 weeks of silent predictions, validated against field results
  4. Pilot: 4 weeks with principal engineers
  5. Feedback Loop: Track prediction vs. actual, retrain when MAE > 3 percentage points
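
The acceptance metrics and the retraining trigger can be computed with a few lines of standard-library Python. The sketch assumes MAE is measured in percentage points of compaction and that pass/fail is judged against a 92% spec minimum; the spec value and all function names are assumptions.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute gap between actual and predicted compaction."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return 1.0 - ss_res / ss_tot


def pass_fail_accuracy(actual, predicted, spec_min_pct=92.0):
    """Share of tests where predicted and actual agree on meeting spec."""
    hits = sum(
        (a >= spec_min_pct) == (p >= spec_min_pct)
        for a, p in zip(actual, predicted)
    )
    return hits / len(actual)


def needs_retraining(recent_mae, threshold=3.0):
    """Feedback-loop trigger: retrain once recent MAE exceeds the threshold."""
    return recent_mae > threshold
```

For example, with actual results `[93.0, 91.0, 94.0, 92.5]` and predictions `[92.5, 91.5, 93.0, 92.0]`, the MAE is 0.625 points and every pass/fail call agrees.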

Timeline

Week | Focus
1-2 | Data models, migrations, form updates
2-3 | ML services
3-4 | API endpoints
4-5 | UI integration
5-8 | Validation and rollout

Value

  • Reduce rework: Identify high-risk paving conditions before they cause failures
  • Actionable insights: "Increase mat temperature" vs. generic warnings
  • Continuous improvement: Model improves as more data is collected