Cost/Defect Optimisation of a Manufacturing Process Using a Large Dataset
#Python, #MachineLearning, #Statistics
This academic project required analysis of a large dataset for a new 4-step manufacturing process. Each step had 4 input variables (Temperature, Vibration, Arc Gap, Pressure) and 2 output variables (a defect variable and a cost variable).
The objectives of the project were:
From the existing dataset, identify the ‘optimal’ set(s) of M1-M4 input variables (M1 = Step 1, M2 = Step 2, M3 = Step 3, M4 = Step 4) that would meet particular defect probability and cost requirements
Predict the defect probabilities and costs of a new set of M1-M4 input variables
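As a rough illustration of the first objective, the sketch below groups runs by their input settings, estimates a defect probability and a mean cost for each group, and keeps the groups that meet both limits. Everything named here is an assumption made for the example rather than a detail of the project: the column names (`M1_temperature`, `M1_pressure`, `defect`, `cost`), the treatment of the defect variable as a 0/1 flag, the threshold values, and the toy data.

```python
import pandas as pd


def feasible_settings(df, input_cols, max_defect_prob, max_cost):
    """Return input settings whose observed defect rate and mean cost meet both limits."""
    summary = (
        df.groupby(input_cols)
        .agg(
            defect_prob=("defect", "mean"),  # share of runs flagged as defective
            mean_cost=("cost", "mean"),
            n_runs=("defect", "count"),      # how many runs support the estimate
        )
        .reset_index()
    )
    meets_limits = (summary["defect_prob"] <= max_defect_prob) & (
        summary["mean_cost"] <= max_cost
    )
    return (
        summary[meets_limits]
        .sort_values(["defect_prob", "mean_cost"])
        .reset_index(drop=True)
    )


# Made-up data: two settings for a single step, three runs each.
runs = pd.DataFrame(
    {
        "M1_temperature": [200, 200, 200, 220, 220, 220],
        "M1_pressure": [5, 5, 5, 6, 6, 6],
        "defect": [0, 0, 1, 0, 0, 0],
        "cost": [10.0, 12.0, 11.0, 15.0, 14.0, 16.0],
    }
)

result = feasible_settings(
    runs, ["M1_temperature", "M1_pressure"], max_defect_prob=0.1, max_cost=20.0
)
print(result)
```

Grouping on exact values only works when settings repeat. With continuous inputs, the values would first need to be binned or clustered.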
What I did:
Cleaned and merged the 5 large CSV files into a single data frame
Reviewed the data (scatter matrices and heat maps)
Applied supervised and unsupervised machine learning to the data: regressions (linear, polynomial, logistic), clustering, Gaussian Mixture Models, decision trees
Identified the ‘optimal’ set(s) of input variables that gave low defect probabilities and low costs
Established appropriate models to predict defect probabilities and costs of any new set of input variables
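The merge-and-model steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: it assumes one table per step plus one outcome table, joined on a hypothetical `run_id` key, and uses synthetic data in place of the real CSV files. The choice of a logistic regression for defect probability and a degree-2 polynomial regression for cost is just one possible pairing from the model families listed.

```python
from functools import reduce

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(0)
n = 400
variables = ["temperature", "vibration", "arc_gap", "pressure"]

# Four synthetic per-step tables (stand-ins for the real CSVs), keyed by a
# hypothetical run_id column.
step_frames = [
    pd.DataFrame(
        rng.normal(size=(n, 4)), columns=[f"M{step}_{v}" for v in variables]
    ).assign(run_id=np.arange(n))
    for step in range(1, 5)
]

# A fifth synthetic table holding the two outputs. The relationship between
# inputs and outputs here is invented purely so the models have something to fit.
signal = (step_frames[0]["M1_temperature"] + step_frames[2]["M3_pressure"]).to_numpy()
outcomes = pd.DataFrame(
    {
        "run_id": np.arange(n),
        "defect": (signal + rng.normal(scale=0.5, size=n) > 1).astype(int),
        "cost": 50 + 5 * signal**2 + rng.normal(size=n),
    }
)

# Merge all five tables into a single data frame and drop incomplete rows.
merged = reduce(
    lambda left, right: left.merge(right, on="run_id", how="inner"),
    step_frames + [outcomes],
).dropna()

feature_cols = [c for c in merged.columns if c.startswith("M")]
X = merged[feature_cols]

# Defect probability: logistic regression on standardised inputs.
defect_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
defect_model.fit(X, merged["defect"])

# Cost: polynomial (degree 2) regression.
cost_model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
cost_model.fit(X, merged["cost"])

# Predict for a "new" set of M1-M4 inputs (here, simply the first row).
new_setting = X.iloc[[0]]
defect_prob = defect_model.predict_proba(new_setting)[0, 1]
predicted_cost = cost_model.predict(new_setting)[0]
print(f"defect probability: {defect_prob:.3f}, predicted cost: {predicted_cost:.2f}")
```

In practice the models would be compared on held-out data (for example with `sklearn.model_selection.train_test_split`) before being used for prediction.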