machine-learning google-cloud-platform automl

How can I use AutoML in GCP to predict a rare event?


As the title says, I am trying to use AutoML on Google Cloud Platform to predict a rare outcome. For example, suppose I have five independent variables: age, living area, income, family size, and gender. I want to predict a rare event called "purchase". Purchases are very rare: out of 10,000 data points, only 3-4 are purchases. Fortunately, I have far more than 10,000 data points (about 100 million).

I have tried to use AutoML to find the best model, but because the outcome is so rare, the model predicts 0 purchases for every combination of these five variables. How can I solve this problem in AutoML?


Solution

  • In Cloud AutoML, both the model predictions and the model evaluation metrics depend on the confidence threshold that is set. By default, the confidence threshold is 0.5. When only 3-4 rows in 10,000 are positive, the predicted score for "purchase" rarely reaches 0.5, so every row is predicted as 0. The threshold can be changed in the “Evaluate” tab of the “Models” section; adjust it there to see how precision and recall are affected. The best confidence threshold depends on your use case, and the AutoML documentation includes example scenarios showing how the evaluation metrics can be used. In your case, recall should be prioritized (which results in fewer false negatives) so that purchases are actually predicted. Lowering the threshold raises recall at the cost of precision, so choose the lowest threshold whose precision is still acceptable.

    Also, the training data should contain a comparable number of examples from each class of the target variable so that the model can predict each class with higher confidence. Since your training data is highly skewed, preprocess it before training, for example by resampling (downsampling the non-purchase rows or oversampling the purchase rows), to reduce the imbalance.
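
The two suggestions above can be sketched outside AutoML in plain Python. This is only an illustration under stated assumptions, not the AutoML API: the helper names `downsample_majority` and `precision_recall_at`, the `purchase` column name, the 10:1 ratio, and the toy scores are all hypothetical. The first helper keeps every purchase row and a random sample of non-purchase rows; the second shows how precision and recall move as the confidence threshold changes.

```python
import random


def downsample_majority(rows, label_key="purchase", ratio=1.0, seed=0):
    """Keep all positive rows plus `ratio` randomly chosen negatives per positive."""
    rng = random.Random(seed)
    positives = [r for r in rows if r[label_key] == 1]
    negatives = [r for r in rows if r[label_key] == 0]
    # Never ask for more negatives than exist.
    n_keep = min(len(negatives), int(len(positives) * ratio))
    sample = positives + rng.sample(negatives, n_keep)
    rng.shuffle(sample)
    return sample


def precision_recall_at(scores, labels, threshold):
    """Precision and recall when a row is predicted positive if score >= threshold."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y == 1)
    fp = sum(1 for s, y in pairs if s >= threshold and y == 0)
    fn = sum(1 for s, y in pairs if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy data with the question's class ratio: 4 purchases in 10,000 rows.
    rows = [{"age": 20 + i % 50, "purchase": 1 if i < 4 else 0} for i in range(10_000)]
    balanced = downsample_majority(rows, ratio=10)
    print(len(balanced), "rows kept for training")

    # Hypothetical model scores and true labels for four held-out rows.
    scores = [0.9, 0.4, 0.3, 0.1]
    labels = [1, 1, 0, 0]
    for t in (0.5, 0.2):
        print(t, precision_recall_at(scores, labels, t))
```

With 100 million rows and 3-4 purchases per 10,000, there are roughly 30,000 to 40,000 purchase rows, so downsampling the non-purchase rows still leaves a sizeable training set. Because resampling changes the share of purchases the model sees, it is safer to pick the threshold on evaluation data that keeps the original class ratio.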