Cost/Defect Optimisation of a Manufacturing Process Using a Large Dataset

#Python, #MachineLearning, #Statistics 

This academic project required analysing a large dataset from a new 4-step manufacturing process. Each step had 4 input variables (Temperature, Vibration, Arc Gap and Pressure) and 2 output variables (a defect variable and a cost variable), giving 16 input variables across the whole process.
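Because each of the four steps has its own four inputs and two outputs, the merged table is naturally wide. The sketch below shows one possible column scheme. The `M1_Temperature`-style names and the `Defect`/`Cost` labels are hypothetical and are not the original files' headers.

```python
# Hypothetical column layout for the merged, wide dataset: one group of
# columns per step. The "<step>_<variable>" naming is an assumption.
STEPS = ["M1", "M2", "M3", "M4"]
INPUTS = ["Temperature", "Vibration", "ArcGap", "Pressure"]
OUTPUTS = ["Defect", "Cost"]


def step_columns(step: str) -> dict[str, list[str]]:
    """Return the input and output column names for one step."""
    return {
        "inputs": [f"{step}_{name}" for name in INPUTS],
        "outputs": [f"{step}_{name}" for name in OUTPUTS],
    }


schema = {step: step_columns(step) for step in STEPS}
all_inputs = [col for step in STEPS for col in schema[step]["inputs"]]
all_outputs = [col for step in STEPS for col in schema[step]["outputs"]]

print(len(all_inputs), len(all_outputs))  # prints: 16 8
```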

The objectives of the project were:

  • From the existing dataset, identify the ‘optimal’ set(s) of input variables for steps M1 to M4 (M1 = Step 1, M2 = Step 2, M3 = Step 3, M4 = Step 4) that would meet specified defect-probability and cost requirements

  • Predict the defect probabilities and costs for a new set of M1 to M4 input variables
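The first objective is essentially a screening step over the existing runs. This is a minimal sketch under a few assumptions. Each row is assumed to already carry an estimated defect probability and a cost, stored in hypothetical `defect_prob` and `cost` columns; how those estimates are obtained is left open. The requirements are assumed to be simple upper limits.

```python
import pandas as pd


def meets_requirements(runs: pd.DataFrame, max_defect_prob: float, max_cost: float) -> pd.DataFrame:
    """Keep runs within both limits, cheapest first (ties broken by defect probability).

    The column names 'defect_prob' and 'cost' are hypothetical.
    """
    mask = (runs["defect_prob"] <= max_defect_prob) & (runs["cost"] <= max_cost)
    return runs.loc[mask].sort_values(["cost", "defect_prob"])


# Toy values for illustration only; not taken from the project's dataset.
runs = pd.DataFrame({
    "run_id": [1, 2, 3, 4],
    "defect_prob": [0.02, 0.10, 0.04, 0.01],
    "cost": [120.0, 90.0, 100.0, 150.0],
})
shortlist = meets_requirements(runs, max_defect_prob=0.05, max_cost=130.0)
print(shortlist["run_id"].tolist())  # [3, 1]
```

If the requirements were instead a weighted trade-off between defect probability and cost, the hard filter would become a ranking on a combined score.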

What I did:

  • Cleaned the 5 large CSV files and merged them into a single data frame

  • Reviewed the data (scatter matrices and heat maps)

  • Applied supervised and unsupervised machine learning: regression (linear, polynomial and logistic), clustering, Gaussian Mixture Models and decision trees

  • Identified the ‘optimal’ set(s) of input variables for low defect probability and low cost

  • Built models to predict the defect probability and cost of any new set of input variables
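The cleaning and merging step could look roughly like the sketch below, which rests on two assumptions. Every file is assumed to share a hypothetical `part_id` key with one row per part. "Cleaning" here only trims header whitespace and drops duplicate or incomplete rows. Three in-memory sources stand in for the five real files.

```python
from functools import reduce
from io import StringIO

import pandas as pd


def load_and_merge(sources, key="part_id"):
    """Read CSV sources, apply basic cleaning, and inner-join them on a shared key.

    `sources` can be paths or file-like objects; `key` is a hypothetical column name.
    """
    frames = []
    for src in sources:
        df = pd.read_csv(src)
        df.columns = df.columns.str.strip()  # trim stray whitespace in headers
        df = df.drop_duplicates().dropna()   # drop exact duplicates and rows with missing values
        frames.append(df)

    def join(left, right):
        # Inner join keeps parts present in every file; validate raises if a key repeats
        return pd.merge(left, right, on=key, how="inner", validate="one_to_one")

    return reduce(join, frames)


# In-memory stand-ins for the real files (toy values for illustration only)
m1 = StringIO("part_id,M1_Temperature,M1_Pressure\n1,350,2.1\n2,355,2.3\n3,,2.2\n")
m2 = StringIO("part_id, M2_Temperature,M2_Pressure\n1,340,2.0\n2,345,2.4\n2,345,2.4\n3,338,1.9\n")
outputs = StringIO("part_id,Defect,Cost\n1,0,101.5\n2,1,98.0\n3,0,105.2\n")

merged = load_and_merge([m1, m2, outputs])
print(merged.shape)  # (2, 7)
```

Part 3 is dropped because its `M1_Temperature` is missing. The repeated part-2 row in the second source is removed before the join.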
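The prediction models can be sketched with scikit-learn. This is an illustrative setup and not necessarily the final models used in the project. It assumes the defect variable is binary (0/1), so that logistic regression yields a defect probability. It uses degree-2 polynomial regression for cost and trains on synthetic data standing in for the 16 inputs. The helper names `fit_models` and `predict_new` are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler


def fit_models(X, defect, cost, degree=2):
    """Fit a logistic model for P(defect) and a polynomial regression for cost."""
    defect_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    cost_model = make_pipeline(
        StandardScaler(), PolynomialFeatures(degree=degree), LinearRegression()
    )
    defect_model.fit(X, defect)
    cost_model.fit(X, cost)
    return defect_model, cost_model


def predict_new(defect_model, cost_model, X_new):
    """Return (defect probability, predicted cost) for new input settings."""
    # Column 1 of predict_proba is the probability of class 1 (assumed to mean "defect")
    p_defect = defect_model.predict_proba(X_new)[:, 1]
    return p_defect, cost_model.predict(X_new)


# Synthetic stand-in data: 200 runs x 16 inputs (4 steps x 4 variables)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))
defect = (X[:, 0] + 0.5 * X[:, 5] + rng.normal(scale=0.5, size=200) > 1).astype(int)
cost = 100 + 3 * X[:, 2] ** 2 - 2 * X[:, 7] + rng.normal(size=200)

defect_model, cost_model = fit_models(X, defect, cost)
p_defect, pred_cost = predict_new(defect_model, cost_model, rng.normal(size=(3, 16)))
```

Scaling before the polynomial expansion keeps the squared and interaction terms on comparable scales.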
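Clustering could feed into the search for ‘optimal’ settings in the following way. A Gaussian Mixture Model is fitted to the input settings, and the resulting clusters are ranked by observed defect rate and then by mean cost. This is a sketch only. The number of components, the ranking rule, the hypothetical `rank_clusters` helper and the synthetic data are all assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.mixture import GaussianMixture


def rank_clusters(X, defect, cost, n_components=3, seed=0):
    """Cluster input settings with a GMM and rank clusters by defect rate, then mean cost."""
    gmm = GaussianMixture(n_components=n_components, random_state=seed).fit(X)
    labels = gmm.predict(X)
    summary = (
        pd.DataFrame({"cluster": labels, "defect": defect, "cost": cost})
        .groupby("cluster")
        .agg(
            defect_rate=("defect", "mean"),
            mean_cost=("cost", "mean"),
            n_runs=("defect", "size"),
        )
        .sort_values(["defect_rate", "mean_cost"])
    )
    return gmm, summary


# Synthetic stand-in: three groups of settings for one step's 4 inputs
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=centre, size=(100, 4)) for centre in (-2.0, 0.0, 2.0)])
defect = (X[:, 0] + rng.normal(scale=0.5, size=300) > 1.5).astype(int)
cost = 50 + 5 * X[:, 1] + rng.normal(size=300)

gmm, summary = rank_clusters(X, defect, cost)
print(summary)  # the top row is the lowest-defect (then cheapest) cluster
```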
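A shallow decision tree is one way to turn defect data into readable rules on the input variables. The sketch assumes a binary defect label, uses hypothetical `M1_*` column names for a single step and trains on synthetic data. The settings `max_depth=3` and `min_samples_leaf=20` are illustrative choices.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical names for one step's four inputs
FEATURES = ["M1_Temperature", "M1_Vibration", "M1_ArcGap", "M1_Pressure"]

# Synthetic stand-in: defects occur when temperature is high and pressure is low
rng = np.random.default_rng(2)
X = rng.uniform(0.0, 1.0, size=(300, 4))
defect = ((X[:, 0] > 0.7) & (X[:, 3] < 0.3)).astype(int)

# A shallow tree with a minimum leaf size keeps the rules short and less prone to overfitting
tree = DecisionTreeClassifier(max_depth=3, min_samples_leaf=20, random_state=0)
tree.fit(X, defect)

rules = export_text(tree, feature_names=FEATURES)
print(rules)
```

Reading the printed rules from the top down gives threshold conditions on the inputs. In this toy example, those conditions separate defective runs from non-defective ones.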

Link to GitHub

More to come …