machine-learning, google-cloud-platform, gcp-ai-platform-notebook

BigQuery prediction using a scikit-learn model


I have created a scikit-learn model on my local machine and uploaded it to Google Cloud Storage. From that file I created a model and version in AI Platform, and online prediction works. Now I want to run batch prediction and store the results in BigQuery, so that the BigQuery table is updated every time I run a prediction.

Can someone suggest how to do this?


Solution

  • AI Platform does not support writing prediction results to BigQuery at the moment.

    You can write the prediction results to BigQuery with Dataflow. There are two options here:

    1. Create a Dataflow job that computes the predictions itself (loading the model in the workers).
    2. Create a Dataflow job that calls AI Platform to get the model's predictions; this would most likely use online prediction.

    In both cases you can define a BigQuery sink to insert new rows into your table; a minimal sketch of option 2 follows.
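    Here is one way option 2 could look as an Apache Beam pipeline. This is a sketch, not a tested implementation: the project, model/version, bucket, table, and schema names are placeholders, and it assumes newline-delimited JSON input and a model that returns a single numeric prediction per instance.

    ```python
    import json

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions
    from googleapiclient import discovery


    class PredictDoFn(beam.DoFn):
        """Calls the AI Platform online prediction API for each element."""

        def __init__(self, project, model, version):
            self._name = 'projects/{}/models/{}/versions/{}'.format(
                project, model, version)

        def setup(self):
            # Build the API client once per worker, not once per element.
            self._service = discovery.build('ml', 'v1')

        def process(self, features):
            response = self._service.projects().predict(
                name=self._name,
                body={'instances': [features]}).execute()
            for prediction in response['predictions']:
                yield {'input': json.dumps(features), 'prediction': prediction}


    def run():
        options = PipelineOptions(
            runner='DataflowRunner',
            project='my-project',               # placeholder
            temp_location='gs://my-bucket/tmp',  # placeholder
            region='us-central1')
        with beam.Pipeline(options=options) as p:
            (p
             | 'ReadInputs' >> beam.io.ReadFromText('gs://my-bucket/inputs.jsonl')
             | 'ParseJson' >> beam.Map(json.loads)
             | 'Predict' >> beam.ParDo(PredictDoFn('my-project', 'my_model', 'v1'))
             | 'WriteToBQ' >> beam.io.WriteToBigQuery(
                   'my-project:my_dataset.predictions',   # placeholder table
                   schema='input:STRING,prediction:FLOAT',
                   write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND))


    if __name__ == '__main__':
        run()
    ```

    Note that calling online prediction per element adds an API round trip per record (batch your instances in the `body` if throughput matters), which is why option 1 (loading the model directly in the workers) can be cheaper for large inputs.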

    Alternatively, you can use Cloud Functions to update a BigQuery table whenever a new file appears in GCS. This solution would look like this:

    1. Use gcloud to run the batch prediction (`gcloud ml-engine jobs submit prediction ... --output-path="gs://[My Bucket]/batch-predictions/"`).
    2. Results are written to multiple files: `gs://[My Bucket]/batch-predictions/prediction.results-*-of-NNNNN`.
    3. A Cloud Function is triggered to parse the results and insert them into BigQuery. This Medium post explains how to set this up; a sketch of such a function follows.
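    A hypothetical Cloud Function (Python runtime, `google.storage.object.finalize` trigger) for step 3 might look like the sketch below. The bucket prefix, table name, and output field mapping are assumptions for illustration; AI Platform batch prediction writes one JSON object per line, but its exact shape depends on your model.

    ```python
    import json

    from google.cloud import bigquery, storage

    BQ_TABLE = 'my-project.my_dataset.predictions'  # placeholder


    def load_predictions(event, context):
        """Triggered for each new object in the bucket."""
        name = event['name']
        # Only handle batch prediction result shards.
        if 'batch-predictions/prediction.results' not in name:
            return

        blob = storage.Client().bucket(event['bucket']).blob(name)
        lines = blob.download_as_text().splitlines()

        rows = []
        for line in lines:
            record = json.loads(line)
            # Adapt this mapping to your model's actual output format,
            # e.g. {"predictions": 1.0} per line.
            rows.append({'prediction': record['predictions'],
                         'source_file': name})

        # Streaming insert; returns a list of per-row errors if any failed.
        errors = bigquery.Client().insert_rows_json(BQ_TABLE, rows)
        if errors:
            raise RuntimeError('BigQuery insert errors: {}'.format(errors))
    ```

    Because each prediction run writes multiple result shards, the function fires once per shard, so each invocation only needs to handle its own file.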