I have developed a Cloud Run service using Python with Flask that accepts incoming HTTP requests. When a request comes in, the payload is sent to a class with a context manager. In that class, a database connection is established upon entering and closed upon exiting, to ensure there are no connection leaks.
Here is how the database connection is set up in the class's __enter__ method:
if self.db_connection is None:
    try:
        # DB_HOST, DB_NAME, DB_USER and DB_PASSWORD are placeholders for
        # the real connection details, which are omitted here.
        self.db_connection = psycopg2.connect(
            host=DB_HOST,
            dbname=DB_NAME,
            user=DB_USER,
            password=DB_PASSWORD,
        )
        self.db_cursor = self.db_connection.cursor()
    except Exception as e:
        print(f'Error connecting to database: {e}')
        return
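The __exit__ method does the reverse; roughly this (a sketch, assuming the same attribute names as above):

def __exit__(self, exc_type, exc_value, traceback):
    # Close the cursor and the connection so nothing leaks, even when an
    # exception was raised inside the 'with' block.
    if self.db_cursor is not None:
        self.db_cursor.close()
        self.db_cursor = None
    if self.db_connection is not None:
        self.db_connection.close()
        self.db_connection = None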
We are using Google Cloud SQL with a PostgreSQL 15.2 instance, connecting via the psycopg2 Python library. The instance allows a maximum of 100 active connections per database, so we limit the number of concurrent HTTP requests/tasks to below that, around 80, which is slowing the workflow down.
There are definitely better ways, but I am not very familiar with them yet. One such way seems to be connection pooling, but I'm not yet sure about the best way to go about it. Because we have multiple active instances and connect to the database during a 'task', if we create a connection pool, won't a separate pool be created per instance, each with its own reserved connections? How would we go about making a single pool of, say, 80 connections that are re-used? And wouldn't a single pool of 80 connections then take up 80% of the database's max_connections?
What's the best approach in this situation? We'll probably reconfigure the database to increase the max_connections parameter, but it's probably also a good idea to change the way we connect to it at the same time.
We haven't tried a different way of connecting yet; connection pooling is the main solution I've discovered so far.
I would definitely recommend leveraging a connection pool for your use case: it lets you re-use connections and avoids the overhead of establishing a new connection on every request, which adds up quickly at high request volumes. A popular Python library that supports psycopg2 and provides built-in connection pooling is SQLAlchemy.
You could establish a global connection pool, similar to your code above, that is shared across your class instances.
The SQLAlchemy code to create a psycopg2-backed engine (which manages the pool) is as follows:
import sqlalchemy

pool = None

def init_connection_pool(user, password, ip_address, dbname):
    # Equivalent URL:
    # postgresql+psycopg2://<db_user>:<db_pass>@<db_host>:<db_port>/<db_name>
    return sqlalchemy.create_engine(
        sqlalchemy.engine.url.URL.create(
            drivername="postgresql+psycopg2",
            username=user,
            password=password,
            host=ip_address,
            port=5432,
            database=dbname,
        ),
        # Requiring SSL is recommended for private IP connections.
        connect_args={'sslmode': 'require'},
        # 'pool_size' is the maximum number of permanent connections to keep.
        pool_size=10,
        # Temporarily exceeds the set pool_size if no connections are available.
        max_overflow=2,
        # The total number of concurrent connections for your application will
        # be pool_size plus max_overflow.
        # 'pool_timeout' is the maximum number of seconds to wait when
        # retrieving a new connection from the pool. After the specified
        # amount of time, an exception will be thrown.
        pool_timeout=30,  # 30 seconds
        # 'pool_recycle' is the maximum number of seconds a connection can
        # persist. Connections that live longer than the specified amount of
        # time will be re-established.
        pool_recycle=1800,  # 30 minutes
    )
# ... inside your endpoint code
global pool
if pool is None:
    pool = init_connection_pool("YOUR-USER", "YOUR-PASSWORD", "CLOUD-SQL-PRIVATE-IP", "YOUR-DB")

# Acquire a connection from the pool.
with pool.connect() as db_conn:
    # ...
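For example, executing a query through the pool looks like this (a sketch; 'tasks' is a placeholder table name):

import sqlalchemy

with pool.connect() as db_conn:
    # A connection is checked out of the pool here and returned to the
    # pool automatically when the 'with' block exits.
    result = db_conn.execute(sqlalchemy.text("SELECT id, status FROM tasks"))
    for row in result:
        print(row)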
I've just set some defaults for configuring the pool; they can easily be tweaked as you see fit.
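One caveat, since you asked about multiple instances: each Cloud Run instance is a separate process, so each instance holds its own pool, and your peak connection count is roughly instances × (pool_size + max_overflow). A quick sanity check, with assumed numbers for illustration:

# All numbers here are assumptions for illustration; substitute your own.
MAX_CONNECTIONS = 100  # Cloud SQL max_connections
MAX_INSTANCES = 8      # Cloud Run --max-instances
POOL_SIZE = 10         # pool_size passed to create_engine above
MAX_OVERFLOW = 2       # max_overflow passed to create_engine above

peak = MAX_INSTANCES * (POOL_SIZE + MAX_OVERFLOW)  # 8 * 12 = 96
assert peak <= MAX_CONNECTIONS, f"peak of {peak} exceeds max_connections"

Capping Cloud Run's --max-instances so this product stays below max_connections keeps you from exhausting the database even under full scale-out.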