I have a Python Flask API that applies some SQL-based filtering on an object.
Steps of the API workflow:
Constraints of the API:
Tests made: while testing, I saw that I can run read queries in parallel. The two tests I ran were:
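# Test 1: run the query five times, one after another.
import os
import time

from sqlalchemy import create_engine, text

engine = create_engine(
    os.getenv("POSTGRES_URL")
)


def run_query():
    with engine.connect() as conn:
        # text() is required for raw SQL strings in SQLAlchemy 2.x
        # and also works in older versions.
        rs = conn.execute(text("""
            SELECT
                *
                , pg_sleep(5)
            FROM users
        """))
        for row in rs:
            print(row)


if __name__ == "__main__":
    start = time.time()
    for i in range(5):
        run_query()
    end = time.time() - start  # total elapsed seconds
    print(f"Elapsed: {end:.1f}s")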
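# Test 2: run the same query five times, each in its own thread.
import os
import threading
import time

from sqlalchemy import create_engine, text

engine = create_engine(
    os.getenv("POSTGRES_URL")
)


def run_query():
    # Each thread checks out its own connection from the engine's pool.
    with engine.connect() as conn:
        rs = conn.execute(text("""
            SELECT
                *
                , pg_sleep(5)
            FROM users
        """))
        for row in rs:
            print(row)


if __name__ == "__main__":
    start = time.time()
    threads = []
    for i in range(5):
        t = threading.Thread(target=run_query)
        t.start()
        threads.append(t)
    # Wait for every thread to finish before stopping the clock.
    for t in threads:
        t.join()
    end = time.time() - start  # total elapsed seconds
    print(f"Elapsed: {end:.1f}s")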
Question:
Thank you very much for your help!
This scales well beyond the point that is sensible. With some tweaks to the built-in connection pool's pool_size, you could easily have 100 pg_sleep calls running simultaneously. But as soon as you change that to do real work rather than just sleeping, it would fall apart: you only have so many CPUs and so many disk drives, and that number is probably far lower than 100.
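To illustrate the point about not running more queries at once than the hardware can serve, here is a minimal sketch that caps the number of simultaneous queries with a bounded thread pool. It uses only the standard library so it runs without a database. `run_all`, `FakeQuery` and `MAX_PARALLEL_QUERIES` are hypothetical names invented for this example, and the value 4 is an assumption standing in for whatever your database server can really handle.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Assumption: roughly the number of cores/disks on the database server.
MAX_PARALLEL_QUERIES = 4


def run_all(query_fn, n_requests, max_parallel=MAX_PARALLEL_QUERIES):
    """Call query_fn n_requests times, never more than max_parallel at once."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        futures = [pool.submit(query_fn) for _ in range(n_requests)]
        # Collect results in submission order.
        return [f.result() for f in futures]


class FakeQuery:
    """Stand-in for a real read query; records the peak concurrency seen."""

    def __init__(self, seconds=0.05):
        self.seconds = seconds
        self._lock = threading.Lock()
        self._running = 0
        self.peak = 0

    def __call__(self):
        with self._lock:
            self._running += 1
            self.peak = max(self.peak, self._running)
        time.sleep(self.seconds)  # plays the role of pg_sleep
        with self._lock:
            self._running -= 1
        return "ok"


if __name__ == "__main__":
    probe = FakeQuery()
    start = time.time()
    results = run_all(probe, n_requests=12)
    elapsed = time.time() - start
    print(f"{len(results)} queries, peak concurrency {probe.peak}, {elapsed:.2f}s")
```

In a real application, `query_fn` would be something like the question's `run_query`, and the engine's pool would normally be sized to match, for example `create_engine(url, pool_size=MAX_PARALLEL_QUERIES, max_overflow=0)`. The right cap is something to measure under a realistic load, not something a sleep-based test can tell you.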
You should start by looking at those read queries to see why they are slow, and whether they can be made faster with indexes or other tuning.
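As a rough illustration of what checking a query against an index looks like, the sketch below asks the planner how it would run a filter before and after an index is created. It uses SQLite from the standard library purely as a stand-in so it runs without a server; on PostgreSQL the equivalent step would be running `EXPLAIN ANALYZE` on the slow query. The `users` columns, the `country` filter and the index name are hypothetical, since the real queries are not shown in the question.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the planner's description of how it would execute sql."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row is the human-readable plan step.
    return " | ".join(row[-1] for row in rows)


# In-memory database with a hypothetical users table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO users (name, country) VALUES (?, ?)",
    [(f"user{i}", "FR" if i % 2 else "US") for i in range(1000)],
)

sql = "SELECT * FROM users WHERE country = ?"

# Without an index, the filter has to read the whole table.
plan_before = query_plan(conn, sql, ("FR",))

# With an index on the filtered column, the planner can look rows up directly.
conn.execute("CREATE INDEX idx_users_country ON users (country)")
plan_after = query_plan(conn, sql, ("FR",))

print("before:", plan_before)
print("after: ", plan_after)
```

The exact wording of the plan differs between database engines and versions, but the thing to look for is the same: a full table scan on a large table is usually the first candidate for an index.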