ML Plan: Field Compaction Performance Prediction

Context

Problem: Compaction issues discovered in the field are expensive to fix. Multiple variables (mix design, production, environment, equipment) interact in ways that are difficult to predict manually.

Goal: Predict field compaction success based on mix design parameters and production variables, providing risk flags and actionable recommendations before paving begins.

Data Available: 100+ historical Superpave mix designs with field density test results.

What It Predicts

| Output | Description |
|---|---|
| Predicted Compaction % | Expected field compaction (e.g., 93.5%) |
| Confidence Interval | Range of likely outcomes (e.g., 92.1% to 94.9%) |
| Pass Probability | Probability of meeting the compaction spec (e.g., 87%) |
| Risk Level | Low / Medium / High |
| Recommendations | Actionable suggestions tied to the flagged risk factors |
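The sketch below shows how these outputs relate to one another. It derives the interval, pass probability, and risk level from a single point prediction. The plan states none of the following, so they are assumptions: normally distributed prediction error with standard deviation `sigma`, a one-sided minimum-density spec (`spec_min`), and hypothetical risk cutoffs on pass probability.

```python
import math
from dataclasses import dataclass

# Hypothetical cutoffs on pass probability; the plan does not define them.
LOW_RISK_MIN_P = 0.85
MEDIUM_RISK_MIN_P = 0.60


@dataclass
class CompactionForecast:
    predicted_pct: float     # expected field compaction, %
    ci_low: float            # lower interval bound, %
    ci_high: float           # upper interval bound, %
    pass_probability: float  # P(compaction >= spec_min), assuming Gaussian error
    risk_level: str          # "Low" / "Medium" / "High"


def normal_cdf(x: float) -> float:
    """Standard normal CDF computed with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def summarize(predicted_pct: float, sigma: float, spec_min: float,
              z: float = 1.96) -> CompactionForecast:
    """Turn a point prediction plus an assumed error sigma into the plan's outputs."""
    p_pass = 1.0 - normal_cdf((spec_min - predicted_pct) / sigma)
    if p_pass >= LOW_RISK_MIN_P:
        risk = "Low"
    elif p_pass >= MEDIUM_RISK_MIN_P:
        risk = "Medium"
    else:
        risk = "High"
    return CompactionForecast(
        predicted_pct=predicted_pct,
        ci_low=predicted_pct - z * sigma,   # z = 1.96 gives a ~95% interval
        ci_high=predicted_pct + z * sigma,
        pass_probability=p_pass,
        risk_level=risk,
    )
```

With `predicted_pct=93.5` and `sigma=0.714`, the 95% interval comes out at about 92.1% to 94.9%, which matches the example values in the table. The pass probability then depends on the `spec_min` that is supplied.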

Data Requirements

Already Captured (from existing models)

- Gradation (14 sieves)
- Volumetrics (Gmm, Gmb, VMA, VFA, air voids)
- Bailey ratios (CA, FAc, FAf)
- Binder content and grade
- NMAS and design level
- Mat temperature (from `AsphaltDensityTest`)
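For reference, the stored Bailey ratios follow the standard Bailey-method definitions. Each ratio uses percent passing at control sieves derived from NMAS. The sketch below assumes the caller has already mapped each control sieve to a sieve in the 14-sieve gradation. The function and class names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BaileyRatios:
    ca: float   # coarse aggregate ratio
    fac: float  # coarse portion of the fine aggregate
    faf: float  # fine portion of the fine aggregate


def bailey_ratios(p_half: float, p_pcs: float, p_scs: float, p_tcs: float) -> BaileyRatios:
    """Bailey ratios from percent passing at the control sieves.

    Control sieve sizes (the caller resolves them to actual sieves):
      half sieve = 0.5 * NMAS, PCS = 0.22 * NMAS, SCS = 0.22 * PCS, TCS = 0.22 * SCS
    """
    if p_half >= 100.0 or p_pcs <= 0.0 or p_scs <= 0.0:
        raise ValueError("percent passing values make a ratio undefined")
    return BaileyRatios(
        ca=(p_half - p_pcs) / (100.0 - p_half),
        fac=p_scs / p_pcs,
        faf=p_tcs / p_scs,
    )
```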

Needs to be Added

| Category | Fields to Add |
|---|---|
| Production | `plant_discharge_temp`, `haul_distance`, `haul_time`, `time_to_compact` |
| Environmental | `ambient_temp`, `wind_speed`, `humidity` |
| Equipment | `roller_type`, `roller_passes`, `lift_thickness` |
| Base | `base_type`, `base_condition`, `base_temperature` |
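The sketch below shows one way these fields might be held in Python before they reach the feature extractor. Several details are assumptions: the class name, the types, and the units (°F, miles, minutes, mph, %, inches). The real `AsphaltProductionVariables` model and its storage layer are not specified here. Missing numeric values become `NaN` so that XGBoost can treat them as missing.

```python
import math
from dataclasses import dataclass, fields
from typing import Optional

# Categorical fields are encoded separately (e.g., one-hot), not passed as numbers.
CATEGORICAL_FIELDS = frozenset({"roller_type", "base_type", "base_condition"})


@dataclass
class ProductionVariables:
    """Hypothetical in-memory mirror of the fields to add (units are assumptions)."""
    # Production
    plant_discharge_temp: Optional[float] = None  # °F
    haul_distance: Optional[float] = None         # miles
    haul_time: Optional[float] = None             # minutes
    time_to_compact: Optional[float] = None       # minutes
    # Environmental
    ambient_temp: Optional[float] = None          # °F
    wind_speed: Optional[float] = None            # mph
    humidity: Optional[float] = None              # %
    # Equipment
    roller_type: Optional[str] = None
    roller_passes: Optional[int] = None
    lift_thickness: Optional[float] = None        # inches
    # Base
    base_type: Optional[str] = None
    base_condition: Optional[str] = None
    base_temperature: Optional[float] = None      # °F

    def numeric_features(self) -> dict[str, float]:
        """Numeric fields as floats, with NaN for anything not recorded."""
        out: dict[str, float] = {}
        for f in fields(self):
            if f.name in CATEGORICAL_FIELDS:
                continue
            value = getattr(self, f.name)
            out[f.name] = math.nan if value is None else float(value)
        return out
```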

Implementation Approach

New Models

- `AsphaltProductionVariables`: captures production/environmental data
- `CompactionPrediction`: stores predictions and outcomes
- `MLModelVersion`: tracks model versions
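The sketch below covers the prediction and model-version records. It shows how a stored prediction is later linked to its measured outcome, which is the basis for the feedback loop. The field names and types are assumptions, not the final schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ModelVersionRecord:
    """Hypothetical stand-in for MLModelVersion."""
    version: str          # identifier of the trained artifact
    trained_at: datetime
    cv_mae: float         # cross-validated MAE, percentage points
    n_training_rows: int


@dataclass
class PredictionRecord:
    """Hypothetical stand-in for CompactionPrediction."""
    mix_design_id: int
    model_version: str    # ModelVersionRecord.version that produced this prediction
    predicted_pct: float
    pass_probability: float
    created_at: datetime
    actual_pct: Optional[float] = None  # set once field density is measured

    def record_outcome(self, actual_pct: float) -> None:
        self.actual_pct = actual_pct

    @property
    def abs_error(self) -> Optional[float]:
        """Absolute error in percentage points, or None until the outcome is known."""
        if self.actual_pct is None:
            return None
        return abs(self.actual_pct - self.predicted_pct)
```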

ML Service (`/app/services/ml/`)

- `feature_extractor.py`: extracts features from the data models
- `compaction_predictor.py`: generates predictions
- `model_trainer.py`: runs the training pipeline
- `recommendations.py`: generates actionable advice
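One simple way for `recommendations.py` to turn risk factors into specific advice is a table of named rules. In the sketch below, every detail is a hypothetical placeholder rather than a value from this plan or any specification: the rule names, the feature keys, the thresholds (°F, minutes, mph), and the wording.

```python
import math
from dataclasses import dataclass
from typing import Callable, Optional

Features = dict[str, Optional[float]]


def _value(features: Features, key: str) -> Optional[float]:
    """Return a usable number, or None if the field is missing or NaN."""
    v = features.get(key)
    if v is None or math.isnan(v):
        return None
    return v


def _below(key: str, limit: float) -> Callable[[Features], bool]:
    def check(features: Features) -> bool:
        v = _value(features, key)
        return v is not None and v < limit
    return check


def _above(key: str, limit: float) -> Callable[[Features], bool]:
    def check(features: Features) -> bool:
        v = _value(features, key)
        return v is not None and v > limit
    return check


@dataclass(frozen=True)
class Rule:
    name: str
    check: Callable[[Features], bool]
    advice: str


# Placeholder thresholds for illustration only.
RULES = (
    Rule("low_mat_temp", _below("mat_temp", 250.0),
         "Increase mat temperature (raise plant discharge temperature or shorten the haul)."),
    Rule("long_haul", _above("haul_time", 60.0),
         "Shorten the haul or use tarped/insulated trucks to limit cooling."),
    Rule("high_wind", _above("wind_speed", 15.0),
         "Keep breakdown rolling close behind the paver; wind shortens the compaction window."),
)


def recommend(features: Features) -> list[str]:
    """Advice strings for every rule whose condition is met; missing data triggers nothing."""
    return [rule.advice for rule in RULES if rule.check(features)]
```

Named rules keep each piece of advice traceable to a specific condition. The thresholds could later be informed by the model's feature importances.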

API Endpoints

- `POST /api/ml/predict-compaction`: get a prediction
- `POST /api/ml/record-outcome`: link a prediction to the measured field result
- `GET /api/ml/model-stats`: report model performance
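The plan does not name a web framework, so the sketch below keeps the handler logic framework-agnostic. Each function takes a parsed JSON body and returns an `(http_status, json_body)` pair. Several details are assumptions: the payload keys, the in-memory store standing in for `CompactionPrediction`, and the injected `scorer`.

```python
import math
from typing import Any, Callable

# Hypothetical scorer: feature dict in, prediction fields out.
Scorer = Callable[[dict[str, float]], dict[str, Any]]

# In-memory stand-in for CompactionPrediction rows.
PREDICTIONS: dict[int, dict[str, Any]] = {}


def _to_float(value: Any) -> float:
    # null/missing becomes NaN, which XGBoost treats as a missing value
    return math.nan if value is None else float(value)


def predict_compaction(payload: dict[str, Any], scorer: Scorer) -> tuple[int, dict[str, Any]]:
    """POST /api/ml/predict-compaction: validate, score, store, respond."""
    raw = payload.get("features")
    if not isinstance(raw, dict):
        return 400, {"error": "features must be an object"}
    try:
        features = {name: _to_float(v) for name, v in raw.items()}
    except (TypeError, ValueError):
        return 400, {"error": "feature values must be numeric or null"}
    prediction_id = len(PREDICTIONS) + 1
    body = {"prediction_id": prediction_id, **scorer(features)}
    PREDICTIONS[prediction_id] = {**body, "actual_pct": None}
    return 200, body


def record_outcome(payload: dict[str, Any]) -> tuple[int, dict[str, Any]]:
    """POST /api/ml/record-outcome: attach measured field density to a stored prediction."""
    pid = payload.get("prediction_id")
    if not isinstance(pid, int):
        return 400, {"error": "prediction_id must be an integer"}
    record = PREDICTIONS.get(pid)
    if record is None:
        return 404, {"error": "unknown prediction_id"}
    try:
        record["actual_pct"] = float(payload["actual_pct"])
    except (KeyError, TypeError, ValueError):
        return 400, {"error": "actual_pct must be numeric"}
    return 200, {"prediction_id": pid, "actual_pct": record["actual_pct"]}
```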

UI Integration

- Prediction widget on density test form
- Scenario builder on mix design page
- Risk badges on field schedule

Model Choice: XGBoost

Why XGBoost:
- Performs well on small tabular datasets (~100 samples), provided trees are kept shallow and regularized
- Handles missing values natively
- Interpretable via feature importance
- Fast inference
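The configuration below is one possible starting point, using the `xgboost` package's scikit-learn-style `XGBRegressor`. The hyperparameter values are assumptions chosen to reflect the small-sample point above (shallow trees, subsampling, L2 regularization). They would be tuned by cross-validation.

```python
# Illustrative starting hyperparameters for ~100 training rows (assumed values).
XGB_PARAMS = {
    "objective": "reg:squarederror",
    "n_estimators": 300,
    "learning_rate": 0.05,
    "max_depth": 3,           # shallow trees limit overfitting on small data
    "min_child_weight": 3,    # require some support before splitting further
    "subsample": 0.8,         # row subsampling per tree
    "colsample_bytree": 0.8,  # feature subsampling per tree
    "reg_lambda": 1.0,        # L2 penalty on leaf weights
}


def make_regressor():
    """Build the regressor; xgboost is imported lazily so this module loads without it."""
    from xgboost import XGBRegressor

    # NaN entries in the feature matrix are treated as missing by default.
    return XGBRegressor(**XGB_PARAMS)
```

After fitting, the model's `feature_importances_` attribute provides per-feature importance scores, which support the interpretability point above.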

Features (~40 total):
- 14 gradation values
- 6 volumetric properties
- 3 Bailey ratios
- Binder content and grade, NMAS, and design level
- 10+ production/environmental variables
- Derived features (cooling rate, workability window)
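The plan does not define the two derived features. The sketch below shows one way to compute them under these assumptions:

- Temperatures are in °F and times in minutes.
- The mat cools linearly. Real mat cooling is nonlinear, so this is a deliberate simplification.
- A placeholder cessation temperature marks the point below which rolling is assumed to stop being effective.

The function names, the choice of elapsed time, and the 175 °F value are hypothetical.

```python
import math

# Placeholder cessation temperature (°F); not a value taken from this plan.
CESSATION_TEMP_F = 175.0


def cooling_rate(plant_discharge_temp: float, mat_temp: float, elapsed_min: float) -> float:
    """Average cooling rate (°F/min) between plant discharge and the mat reading."""
    if elapsed_min <= 0:
        return math.nan  # undefined; left as missing for the model
    return (plant_discharge_temp - mat_temp) / elapsed_min


def workability_window(mat_temp: float, rate_f_per_min: float,
                       cessation_temp: float = CESSATION_TEMP_F) -> float:
    """Minutes left before the mat reaches the cessation temperature, assuming linear cooling."""
    if math.isnan(rate_f_per_min) or rate_f_per_min <= 0:
        return math.nan
    return max(0.0, (mat_temp - cessation_temp) / rate_f_per_min)
```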

Verification

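- Training: hold out 20% of records as a final test set (the 80/20 split) and run 5-fold cross-validation on the remaining 80% for tuning and model selection
- Metrics: MAE < 2 percentage points, R² > 0.7, pass/fail classification accuracy > 85%
- Shadow Mode: 2 weeks of silent predictions (logged, not shown to users) compared against measured field density
- Pilot: 4 weeks with principal engineers
- Feedback Loop: track predicted vs. actual compaction; retrain when MAE exceeds 3 percentage points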
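One reasonable reading of combining 5-fold cross-validation with an 80/20 split is to hold out 20% once as the final test set and cross-validate on the remaining 80%. The standard-library sketch below follows that reading, and its thresholds mirror the targets above. The function names, the fixed seed, and the definition of a pass as `value >= spec_min` are assumptions.

```python
import random


def holdout_split(n: int, test_frac: float = 0.2, seed: int = 0) -> tuple[list[int], list[int]]:
    """Shuffle record indices and hold out test_frac of them; returns (train, test)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_frac)
    return idx[n_test:], idx[:n_test]


def kfold(train_idx: list[int], k: int = 5) -> list[tuple[list[int], list[int]]]:
    """Split training indices into k (fit, validate) pairs."""
    folds = [train_idx[i::k] for i in range(k)]
    pairs = []
    for i in range(k):
        fit = [j for m, fold in enumerate(folds) if m != i for j in fold]
        pairs.append((fit, folds[i]))
    return pairs


def mae(y: list[float], y_hat: list[float]) -> float:
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def r2(y: list[float], y_hat: list[float]) -> float:
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def pass_fail_accuracy(y: list[float], y_hat: list[float], spec_min: float) -> float:
    """Share of records where prediction and actual agree on pass (>= spec_min) vs. fail."""
    hits = sum((a >= spec_min) == (b >= spec_min) for a, b in zip(y, y_hat))
    return hits / len(y)


def meets_targets(y: list[float], y_hat: list[float], spec_min: float) -> bool:
    """MAE < 2 points, R² > 0.7, pass/fail accuracy > 85%."""
    return (mae(y, y_hat) < 2.0 and r2(y, y_hat) > 0.7
            and pass_fail_accuracy(y, y_hat, spec_min) > 0.85)


def needs_retrain(recent_abs_errors: list[float], threshold: float = 3.0) -> bool:
    """Feedback-loop trigger: retrain when MAE over recent outcomes exceeds the threshold."""
    return bool(recent_abs_errors) and sum(recent_abs_errors) / len(recent_abs_errors) > threshold
```

With only about 100 records, the 20-record test set gives a noisy estimate. The cross-validation scores are therefore the main guide during tuning.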

Timeline

| Weeks | Focus |
|---|---|
| 1-2 | Data models, migrations, form updates |
| 2-3 | ML services |
| 3-4 | API endpoints |
| 4-5 | UI integration |
| 5-8 | Validation and rollout |

Value

- Reduce rework: identify high-risk paving conditions before they cause failures
- Actionable insights: specific advice (e.g., "Increase mat temperature") instead of generic warnings
- Continuous improvement: the model improves as more outcome data is collected