I have created a Flask API in Python, deployed it as a container image on GCP Cloud Run, and I trigger it through Cloud Scheduler. In my code I read a large dataset (15 million rows and 20 columns) from BigQuery. I have set my instance config to 8 GB RAM and 4 CPUs.
Problem 1: Reading the data takes too long (about 2200 seconds).
import pandas as pd
from pandas.io import gbq

# read the full sales table into a single DataFrame
query = """SELECT * FROM TABLE_SALES"""
df = gbq.read_gbq(query, project_id="project_name")
Is there a more efficient way to read the data from BQ?
Problem 2: My code stops working after reading the data. When I checked the logs, I found this:
error - 503
textPayload: "The request failed because either the HTTP response was malformed or connection to the instance had an error.
While handling this request, the container instance was found to be using too much memory and was terminated. This is likely to cause a new container instance to be used for the next request to this revision. If you see this message frequently, you may have a memory leak in your code or may need more memory. Consider creating a new revision with more memory."
One workaround is to increase the instance configuration. If that is the solution, please let me know the approximate cost of doing so.
You can try a GCP Dataflow (Apache Beam) batch job to read and process the large table from BQ, so the rows are handled in parallel by Dataflow workers instead of being loaded into a single pandas DataFrame inside one Cloud Run instance.
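As an illustration, here is a minimal sketch of such a Beam batch pipeline. The project ID, dataset name, GCS bucket and region below are placeholders you would replace with your own values, and since your query is a plain SELECT * it reads the table directly rather than running a query:

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Placeholder pipeline options -- replace project, region and bucket with your own.
options = PipelineOptions(
    flags=[],                          # don't parse sys.argv
    runner="DataflowRunner",           # use "DirectRunner" for a small local test
    project="project_name",
    region="us-central1",
    temp_location="gs://your-bucket/tmp",
)

with beam.Pipeline(options=options) as p:
    (
        p
        # Read the table directly; DIRECT_READ uses the BigQuery Storage Read API,
        # which streams rows in parallel instead of exporting to GCS first.
        | "ReadSales" >> beam.io.ReadFromBigQuery(
            table="project_name:dataset.TABLE_SALES",   # assumed dataset/table spec
            method=beam.io.ReadFromBigQuery.Method.DIRECT_READ,
        )
        # Each element is a dict keyed by column name; do your row-level
        # processing here instead of collecting everything into one DataFrame.
        | "Process" >> beam.Map(lambda row: row)
    )

With this approach the 15 million rows never have to fit into the Cloud Run container's memory: the Dataflow workers read and process them in parallel, and your Cloud Run service (triggered by Cloud Scheduler) only needs to launch the job.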