I am experiencing slow (2-3min) returns from a simple query on an AWS RDS postgres materialized view. My working hypothesis is that this is a network transfer issue but I am not an expert in postgres. Some background on the setup:
When I run:
SELECT * FROM my_materialized_view where "COMPOUND" = 'cpd1234';
The query returns ~3-5seconds.
When I run:
SELECT * FROM my_materialized_view where "GENE" = 'geneXYZ';
The query takes ~2min to return 1M rows. The size of the returned data appears to 100-200MB in size.
Doing something slightly more "exotic":
SELECT * FROM my_materialized_view where "GENE" = 'geneXYZ' ORDER BY 'measured_value' LIMIT 10000;
returns data in about 12sec.
I also ran:
EXPLAIN ANALYZE SELECT * FROM my_materialized_view where "GENE" = 'geneXYZ';
the output seemed to indicate the process was about 500ms
"Index Scan using my_materialized_view_idx on my_materialized_view (cost=0.56..1415484.25 rows=922393 width=336) (actual time=0.039..437.332 rows=979434 loops=1)" " Index Cond: (""GENE"" = 'geneXYZ'::text)" "Planning time: 0.092 ms" "Execution time: 491.306 ms"
Am I reading this wrong or is the issue with the data transfer/network latency? I am using pgadmin for the queries, but also ran the query via python boto3 and equivalent times are realized. Appreciate your thoughts on this. Thanks!
Two facts about your situation make it clear that you do have slow data transfer, not a slow RDBMS server.
GENE
column, and your query plan says postgreSQL is using it.ORDER BY some_unindexed_column LIMIT some_number
, burdens postgreSQL with a sort. It returns about 1% of the data of your first query, and completes ten times faster. That means the sort costs something, but the transfer also costs something.