Search code examples
sqlperformancepostgresqlsubquerypostgresql-performance

Join/Subquery is fast/slow depending on which column I filter on (not a simple index issue)


PostgreSQL 9.3.2, compiled by Visual C++ build 1600, 64-bit

Each customer can have many orders and referrals. Now, I want to create a view with some statistics for customer where, for each customer, I have some calculated columns (one row for each customer).

Create the view:

create view myview
select
    a.customer_id,
    sum(a.num) as num_orders,
    sum(b.num) as num_referrals
from
    (
    select
        customer.id as customer_id,
        count(customer.id) as num
    from
        customer
    left join
        order
    on
        order.customer_id = customer.id
    group by
        customer.id
    ) a
left join
    (
    select
        customer.id as customer_id,
        count(customer.id) as num
    from
        customer
    left join
        referral
    on
        referral.customer_id = customer.id
    group by
        customer.id
    ) b
on
    a.customer_id = b.customer_id
group by
    a.customer_id,
    b.customer_id
;

Query A (this is fast):

select
    customer.*,
    myview.*
from
    customer
left join
    myview
on
    customer.id = myview.customer_id
where
    customer.id = 100
;

Query B (this is SLOW):

select
    customer.*,
    myview.*
from
    customer
left join
    myview
on
    customer.id = myview.customer_id
where
    customer.sex = 'M'
;

Query C (this is fast):

select
    customer.*,
    myview.*
from
    customer
left join
    myview
on
    customer.id = myview.customer_id
where
    (select id from customer where sex = 'M')
;

OK, so why is Query B so much different than Query A in terms of performance? I guess, in Query B, it is running those subqueries first without filtering, but I don't know how to fix it.

The problem is that it is our ORM that is generating the query. So, I can't fix the problem by doing something like Query C.
I'm hoping there's just a better way to design my view to fix the problem. The main difference in the EXPLAIN results between Query A and Query B is that Query B has some MERGE RIGHT JOIN operations.
Any ideas?


EDIT:
I added following information per requests from people commenting. The following is the more true-to-life info (as opposed to the simplified, hypothetical scenario above).

create or replace view myview as
select
    a.id_worder,
    count(a.*) as num_finance_allocations,
    count(b.*) as num_task_allocations
from
    (
    select
        woi.id_worder,
        count(*) as num
    from
        worder_invoice woi
    left join
        worder_finance_task ct
    on
        ct.id_worder_finance = woi.id
    left join 
        worder_finance_task_allocation cta
    on
        cta.id_worder_finance_task = ct.id
    group by
        woi.id_worder
    ) a
left join
    (
    select
        wot.id_worder,
        count(*) as num
    from
        worder_task wot
    left join
        worder_task_allocation wota
    on
        wota.id_worder_task = wot.id
    group by
        wot.id_worder
    ) b
on
    a.id_worder = b.id_worder
group by
    a.id_worder,
    b.id_worder
;

Query A (fast, apparently I need a rep of more than 10 to post more than 2 links, so no EXPLAIN for this one)

select 
    *
from 
    worder a 
left outer join 
    myview b 
on 
    a.id = b.id_worder 
where 
    a.id = 100
;

Query B (SLOW, EXPLAIN)

select 
    *
from 
    worder a 
left outer join 
    myview b 
on 
    a.id = b.id_worder 
where 
    a.id_customer = 200

Query C (fast, EXPLAIN)

select 
    *
from 
    worder a 
left outer join 
    myview b 
on 
    a.id = b.id_worder 
where 
    a.id = (select id from worder where id_customer = 200)
;

Solution

  • Try rewriting your view like so:

    create view myview
    select
        c.customer_id,
        (
            select count(*) from order o where o.customer_id=c.customer_id
        ) num_orders,
        (
            select count(*) from referral r where r.customer_id=c.customer_id
        )
        from customer c ;