This project develops an end-to-end machine learning pipeline to predict and optimize the corrosion resistance of industrial surface coatings. Leveraging both historical experimental and production data, the pipeline covers data exploration, regression, classification, feature engineering, model selection, and interpretation.
Surface coatings are critical in manufacturing, protecting products from corrosion and extending their lifespans. Machine learning models can help optimize coatings by predicting performance and identifying key factors that minimize corrosion.
Part | Description |
---|---|
1. Data Exploration | Visualize and analyze variable distributions, relationships, and group differences. |
2. Regression | Predict logit-transformed corrosion fraction using linear, Bayesian, and advanced regression models. |
3. Classification | Classify whether corrosion is below 33% (EVENT) and identify key drivers using linear, Bayesian, and advanced classifiers. |
4. Feature Engineering & Selection | Engineer and evaluate derived features based on SME input. |
5. Model Selection & Interpretation | Compare, interpret, and visualize best models; recommend optimal input settings. |
- Inputs:
- Chemistry variables:
x1
,x2
,x3
,x4
- Manufacturing process variables:
v1
,v2
,v3
,v4
,v5
- Machine used:
m
(categorical)
- Chemistry variables:
- Output:
- Fraction of corroded surface after accelerated life testing (
output
)
- Fraction of corroded surface after accelerated life testing (
- Derived Features:
x5 = 1 - (x1 + x2 + x3 + x4)
(balance constituent)w = x2 / (x3 + x4)
z = (x1 + x2) / (x4 + x5)
t = v1 * v2
- Visualize distributions of all features and the response (including logit transformation).
- Explore group differences by machine type (
m
). - Analyze correlations and relationships among features and response.
- Base Features: Use original input variables.
- Expanded Features: Use derived features and select subsets as appropriate.
- Fit 9 linear models (additive, interaction, basis functions), select and visualize top 3.
- Fit 2 Bayesian linear models (using
rstanarm
or Laplace Approximation). - Compare models using appropriate metrics (e.g., RMSE, R²).
- Visualize predictive trends, confidence, and prediction intervals.
- Binary classification: EVENT if corrosion < 33%, NON-EVENT otherwise.
- Fit 9 GLMs (additive, interaction, basis functions), select and visualize top 3.
- Fit 2 Bayesian GLMs.
- Compare models using metrics (e.g., Accuracy, ROC AUC).
- Visualize predicted probabilities and feature importances.
- Train and tune complex models:
- Regularized regression (Elastic Net)
- Neural Networks
- Random Forest
- Gradient Boosted Trees
- 2 additional methods (e.g., SVM, GAM, MARS, PLS, KNN, Stacking)
- Evaluate both base and expanded feature sets.
- Rank features by importance.
- Identify best models and most important variables.
- Visualize predicted outcomes vs. key inputs.
- Recommend input settings to minimize corrosion.
- Explore differences across machine types.
- (Bonus) Perform input optimization for selected machine types.
- Python (or R, as appropriate)
- pandas, numpy, matplotlib, seaborn, scikit-learn, xgboost, lightgbm, tensorflow/keras, caret/tidymodels (R)
- Bayesian modeling:
rstanarm
, Laplace Approximation - Jupyter Notebook / RMarkdown
- Optimize surface coating recipes and process conditions to minimize corrosion.
- Identify which features (including derived SME features) are most predictive.
- Classify which conditions lead to high vs. low corrosion.
- Interpret the impact of chemistry, process, and machinery on corrosion outcomes.
- Clone this repository.
- Place your data CSV file in the
data/
directory. - Run the notebooks or scripts in order:
1_data_exploration.ipynb
2_regression.ipynb
3_classification.ipynb
4_feature_engineering.ipynb
5_model_selection_and_interpretation.ipynb
- Follow the instructions in each notebook/script for results and visualizations.
- PPG Industries
- Surface Coating Design and Testing
- scikit-learn documentation
- rstanarm documentation
- tidymodels documentation
Author: Sharika Loganathan Date: Dec 2022