Predictive Maintenance Machine Learning Workbook
In this workbook, I'll be working with a synthetic predictive maintenance dataset from Kaggle. The dataset was created at UC Irvine to give machine learning students realistic predictive maintenance data, which is usually difficult to obtain.
This dataset has already been used widely to build machine learning models (there are 132 notebooks for this data on Kaggle), so I'll be taking a slightly different approach:
What maintenance rules & best practices can be learnt from the data and communicated to operators?
Here are the assumptions that led me to pursue this research question:
- In the field, it may not always be possible to get reliable sensor data
- Operators of these machines (who are responsible for maintenance) likely do not have a machine learning background that allows them to effectively interpret complex model outputs
- Organisations may favour simple but reliable rules that can be recorded in operator manuals and workplace health & safety documentation
I'm thinking about this problem like a pre-shift inspection. I don't need a complicated machine learning model to tell me to check oil levels before starting a piece of machinery. What I would like to know is that if it's really cold I need to check my oil twice a day, or that in certain conditions machines are prone to specific failure modes.
This approach introduces a unique challenge: I'm not just minimising loss; I'm also maximising explainability.
TL;DR
Most predictive maintenance notebooks on this dataset try to build the most accurate possible failure classifier. I'm doing something different — I want to know what the data can teach us about how to prevent failures in the first place.
The goal is rules an operator can actually use: replace the tool head before X minutes of wear, don't run the machine above Y watts. No ML background required, nothing that can't be written into a manual.
Here's the approach in brief:
- EDA — understand the dataset, engineer two new features (mechanical power and temperature differential), and look for relationships between operating conditions and specific failure types.
- Logistic GLMs — train one binary model per failure type. GLMs give interpretable coefficients and p-values, so I can say with statistical confidence which features predict which failures — and in which direction.
- Threshold rules — use the GLM-confirmed predictors to find the operating point where failure rate crosses 1%. That becomes the maintenance rule threshold.
- ‘Is it worth it?’ analysis — for each rule, show the trade-off between failures caught and healthy machines unnecessarily flagged.
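The feature engineering and threshold-rule steps above can be sketched in a few lines. This is an illustrative stand-in, not the notebook's actual code: the data here is synthetic, the column values are made up, and the 1% crossing is found by a simple quantile scan. The only grounded piece is the physics of the engineered feature (mechanical power = torque × angular speed).

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (illustrative only): torque [Nm], speed [rpm]
torque = rng.normal(40, 10, 5000)
rpm = rng.normal(1500, 100, 5000)

# Engineered feature from the EDA step:
# mechanical power [W] = torque [Nm] * angular speed [rad/s]
power = torque * rpm * 2 * np.pi / 60

# Synthetic failure labels whose probability rises with power,
# purely so the threshold scan has something to find
failure = rng.random(5000) < np.clip((power - 7000) / 40000, 0.001, None)

def rule_threshold(feature, failed, target_rate=0.01):
    """Scan candidate thresholds from high to low and return the lowest
    value above which the observed failure rate still meets target_rate."""
    best = None
    for t in np.quantile(feature, np.linspace(0.95, 0.05, 91)):
        if failed[feature >= t].mean() >= target_rate:
            best = t        # tail rate still >= 1%: keep lowering
        else:
            break           # tail rate dropped below 1%: stop
    return best

threshold = rule_threshold(power, failure)
print(f"Proposed rule: keep mechanical power below ~{threshold:.0f} W")
```

The scan direction matters: the failure rate above a threshold is (roughly) monotone in the threshold, so stopping at the first miss gives the most permissive rule that still isolates a ≥1% failure-rate regime.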
KEY FINDINGS
- Tool wear predicts two failure types almost perfectly. A scheduled replacement rule catches the vast majority of both.
- High mechanical power is a significant risk factor for heat dissipation and overstrain failures.
- Temperature differential is a monitoring signal for cooling system health.
- Power failures cannot be reliably predicted from operational data — no rule is proposed for them.
A Note on AI Use
I used an AI coding assistant (OpenCode — Claude Sonnet 4.6) to help write and debug code throughout this notebook.
I did all the thinking, Claude did most of the writing.
All engineering decisions, from the research question to the choice of models and parameters, are mine, as are the interpretations. What I like most about AI is that it frees me up to think more about these engineering decisions. For example, I had never used SMOTE before this project. Instead of reading documentation on how to implement it, I spent my time understanding how it works and how it would help my models.
The other thing I love about AI is I got to put together this cool analysis in one (1!!) day.