google-cloud-functions, google-cloud-sql, google-cloud-run

Best practice for connecting to Google Cloud SQL instances from GCF


I'm looking for best practices on how to set up database access to Cloud SQL from my GCF functions. I am writing an application in which I heavily use Google Cloud Functions for business logic, with Google Cloud SQL (Postgres) as the back-end database. The databases are sharded across multiple Postgres server instances.

Regarding how best to connect, there seem to be two options.

  1. GCF allows direct access to Cloud SQL. In this case, I could create a connection pool in the function and use it to execute queries, but since I don't know how long a function instance remains in memory, it's unclear how many connections to configure. I know each function uses one connection at a time, but how would this scale? If I'm getting a large number of queries, does this become a bottleneck? For best efficiency, I definitely want to use some kind of connection pooling.

  2. I could create a dedicated HTTP server (I use Node.js) with the Postgres client library and have the GCF functions call that server, which in turn calls Cloud SQL. This server could handle connection pooling, run 24/7, and seems to scale more naturally for larger numbers of queries. However, it does add an extra network call to every request. I could run it on Google Cloud Run or deploy it via App Engine.

I'm looking for the best practice that provides the best performance and scales as my app scales.

I have tried to research this online, but haven't found any real recommendations or articles on what is best, just "how to" articles.

Any feedback is appreciated.


Solution

  • Here is my advice on your proposals:

    1. A function instance processes only one request at a time. So, depending on your code design, if you don't issue database requests concurrently, you only need a pool of 1 connection in your function (if your code issues up to 10 concurrent requests to the database, create a pool of 10). Put this pool in a global variable and initialize it the first time you use it: this lazy loading is better than eager loading, which would increase the cold start. Each time a Cloud Functions instance is created, the pool is initialized; each time the instance is destroyed, the pool is closed. If the database connections become a bottleneck, you can use the max instances parameter on Cloud Functions to protect your database.

    2. It's not a good idea. You add a new layer and a new potential point of failure, and you have to maintain, update, and patch that extra service. I don't recommend you do this.

    I hope I have answered your question. If not, leave a comment.