Creating an improved binary classifier using information from a semi-continuous target?

I'm working on a supervised binary classification problem for predictive maintenance that is phrased as the following question: "What's the probability that this piece of equipment will fail in the next N months?"

I have a dataset of continuous and categorical features, each measured at a single point in time for one machine. The status of each machine was then tracked over a period of time to see whether it failed. From this, my target is either a numerical value (the time to failure in months) or null (it didn't fail while being tracked).

Currently, I'm modeling this as a pure binary classification: 1 if it failed within N months, and 0 if it failed after N months or didn't fail. Then I train a model with a calibrated probability output and I'm done. But intuitively, I feel there must be a way to use the actual failure times to improve the probability prediction. Should I try to reframe this as a regression problem? If so, how do I handle the null values (where it didn't fail)?
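The labeling described above amounts to something like the following minimal sketch. The function name and the example targets are hypothetical:

```python
def binary_label(time_to_failure, n_months):
    """Return 1 if the machine failed within n_months, else 0.

    time_to_failure is the failure time in months, or None if no failure was seen.
    """
    return int(time_to_failure is not None and time_to_failure <= n_months)

# Hypothetical targets: months to failure, or None for machines that did not fail.
targets = [2.0, 7.5, None, 14.0]
labels = [binary_label(t, 6) for t in targets]  # [1, 0, 0, 0]
```

Both a failure at month 7 and one at month 70 become 0. A machine tracked for fewer than N months without failing is also labeled 0, even though its outcome at N months is unknown. This is the information a pure 0/1 target throws away.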

Cheers!

Solution

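You can use survival regression, for instance an Accelerated Failure Time (AFT) model. Machines that never failed are not treated as nulls. Instead, they are right-censored at the end of their tracking period, so both the failure times and the non-failures inform the fit. The fitted survival function S(t | x) then gives the probability you want: P(T <= N | x) = 1 - S(N | x). Here are a couple of examples: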

the Weibull AFT model in Python
the Weibull AFT model in R
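Below is a minimal pure-Python sketch of what such a model does. It assumes you also know how long each non-failing machine was tracked (the hypothetical `months_observed` below).

- A failure at time t contributes the density f(t).
- A censored machine contributes only the probability of surviving past its last observation, S(t).

In a Weibull AFT model, log λ(x) = b0 + b·x and S(t | x) = exp(-(t / λ(x))^k). The probability from the question is therefore 1 - exp(-(N / λ(x))^k).

The data, the function names, and the crude step-halving gradient ascent are illustrative assumptions. In practice you would use a maintained survival-analysis library, such as those in the examples above.

```python
import math

def to_survival(time_to_failure, months_observed):
    """Map a raw target to (duration, event): event=1 is a failure, 0 is right-censored."""
    if time_to_failure is None:
        return months_observed, 0  # no failure seen: censored at end of tracking
    return time_to_failure, 1

def _log_lambda(params, x):
    # params = [log_k, b0, b1, ...]; log(lambda) = b0 + b1*x1 + ...
    return params[1] + sum(b * xi for b, xi in zip(params[2:], x))

def log_likelihood(params, X, durations, events):
    """Right-censored Weibull AFT log-likelihood."""
    k = math.exp(params[0])
    total = 0.0
    for x, t, e in zip(X, durations, events):
        z = k * (math.log(t) - _log_lambda(params, x))  # z = log((t / lambda)^k)
        total -= math.exp(z)                            # log S(t), for every machine
        if e:
            total += math.log(k) - math.log(t) + z      # log hazard, failures only
    return total

def gradient(params, X, durations, events):
    """Analytic gradient of log_likelihood with respect to [log_k, b0, b1, ...]."""
    k = math.exp(params[0])
    grad = [0.0] * len(params)
    for x, t, e in zip(X, durations, events):
        z = k * (math.log(t) - _log_lambda(params, x))
        s = math.exp(z)
        grad[0] += e * (1.0 + z) - s * z  # derivative w.r.t. log_k
        d_lin = k * (s - e)               # derivative w.r.t. log(lambda)
        for j, xj in enumerate([1.0] + list(x)):
            grad[1 + j] += d_lin * xj
    return grad

def fit(X, durations, events, steps=2000, lr=0.1):
    """Gradient ascent that halves the step until the likelihood improves."""
    params = [0.0, math.log(sum(durations) / len(durations))] + [0.0] * len(X[0])
    best = log_likelihood(params, X, durations, events)
    for _ in range(steps):
        grad = gradient(params, X, durations, events)
        step = lr
        while step > 1e-12:
            candidate = [p + step * g for p, g in zip(params, grad)]
            try:
                value = log_likelihood(candidate, X, durations, events)
            except OverflowError:
                value = -math.inf
            if value > best:  # only accept steps that increase the likelihood
                params, best = candidate, value
                break
            step /= 2
    return params

def prob_fail_within(params, x, n_months):
    """P(T <= N | x) = 1 - exp(-(N / lambda(x))^k)."""
    k = math.exp(params[0])
    return 1.0 - math.exp(-math.exp(k * (math.log(n_months) - _log_lambda(params, x))))

# Hypothetical data: (features, months to failure or None, months observed).
raw = [
    ([0.0], 20.0, 36.0), ([0.0], 25.0, 36.0), ([0.0], 30.0, 36.0),
    ([0.0], None, 36.0), ([0.0], None, 36.0),
    ([1.0], 3.0, 12.0), ([1.0], 5.0, 12.0), ([1.0], 8.0, 12.0),
    ([1.0], 10.0, 12.0), ([1.0], None, 12.0),
]
X = [features for features, _, _ in raw]
durations, events = map(list, zip(*(to_survival(t, obs) for _, t, obs in raw)))
params = fit(X, durations, events)
print(prob_fail_within(params, [0.0], 6), prob_fail_within(params, [1.0], 6))
```

A single fitted model gives the whole survival curve. You can therefore read off the failure probability for any horizon N without retraining. The exact failure times also inform the fit instead of being collapsed into a 0/1 label.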