I have been given a requirement to query 8(+) different Redshift tables from a Lambda function and return some of the fields where email = provided email address.
Current implementation is a JDBC connection using Spring Data JDBC :
@Query("select * from
(select product_name, email, firstname, lastname, accountid, phone,
from companyA) companyA
natural full join
(select product_name, email, firstname, lastname, accountid, phone,
from companyB) companyB
natural full join
(select product_name, email, firstname, lastname, accountid, phone,
from companyC) companyC
natural full join
(select product_name, email, firstname, lastname, accountid, phone,
from companyD) companyD
natural full join
(select product_name, email, firstname, lastname, accountid, phone,
from companyE) companyE
natural full join
(select product_name, email, firstname, lastname, accountid, phone,
from companyF) companyF
natural full join
(select product_name, email, firstname, lastname, accountid, phone,
from companyG) companyG
natural full join
(select product_name, email, firstname, lastname, accountid, phone,
from companyH) companyH
where email = :emailId;")
public List<Contact> getContactSortByDate(String emailId);
The problem with this approach is that each of these tables are quite large and so the query can take 45+ seconds. To make it worse, I will eventually need to implement a one to many join on "accountid" to 8 account tables to grab some fields there.
And... my boss wants all of this to happen in less than 5 seconds... I don't think that will be possible, but I wanted to know if there were things I can do to speed this search up as much as possible?
Questions:
EDIT: As requested, here is the Updated results of '''EXPLAIN```:
XN Subquery Scan alltables (cost=35.45..142252362.72 rows=35 width=520)
-> XN Append (cost=35.45..142252362.37 rows=35 width=1440)
-> XN Subquery Scan "*SELECT* 1" (cost=35.45..67200090.91 rows=11 width=1440)
-> XN Hash Join DS_BCAST_INNER (cost=35.45..67200090.80 rows=11 width=1440)
Hash Cond: (("outer".id)::text = ("inner".accountid)::text)
-> XN Seq Scan on companyA_accounts acc (cost=0.00..4.91 rows=491 width=816)
-> XN Hash (cost=35.43..35.43 rows=10 width=663)
-> XN Seq Scan on companyA usr (cost=0.00..35.43 rows=10 width=663)
Filter: ((email)::text = 'foobybar@barfoo.com'::text)
-> XN Subquery Scan "*SELECT* 2" (cost=41.85..41600165.37 rows=11 width=943)
-> XN Hash Join DS_BCAST_INNER (cost=41.85..41600165.26 rows=11 width=943)
Hash Cond: (("outer".id)::text = ("inner".accountid)::text)
-> XN Seq Scan on companyB_accounts acc (cost=0.00..10.96 rows=1096 width=550)
-> XN Hash (cost=41.83..41.83 rows=10 width=406)
-> XN Seq Scan on companyB usr (cost=0.00..41.83 rows=10 width=406)
Filter: ((email)::text = 'foobybar@barfoo.com'::text)
-> XN Subquery Scan "*SELECT* 3" (cost=512.54..6241501.45 rows=2 width=1374)
-> XN Hash Join DS_BCAST_INNER (cost=512.54..6241501.43 rows=2 width=1374)
Hash Cond: (("outer".id)::text = ("inner".accountid)::text)
-> XN Seq Scan on companyC_accounts acc (cost=0.00..439.50 rows=43950 width=771)
-> XN Hash (cost=512.54..512.54 rows=1 width=614)
-> XN Seq Scan on companyC usr (cost=0.00..512.54 rows=1 width=614)
Filter: ((email)::text = 'foobybar@barfoo.com'::text)
-> XN Subquery Scan "*SELECT* 4" (cost=2178.78..4403808.58 rows=3 width=985)
-> XN Hash Join DS_BCAST_INNER (cost=2178.78..4403808.55 rows=3 width=985)
Hash Cond: (("outer".id)::text = ("inner".accountid)::text)
-> XN Seq Scan on companyD_accounts acc (cost=0.00..501.46 rows=50146 width=809)
-> XN Hash (cost=2178.78..2178.78 rows=2 width=212)
-> XN Seq Scan on companyD usr (cost=0.00..2178.78 rows=2 width=212)
Filter: (((email)::text = 'foobybar@barfoo.com'::text) AND (accountid IS NOT NULL))
-> XN Subquery Scan "*SELECT* 5" (cost=3534.94..4244248.75 rows=2 width=511)
-> XN Hash Join DS_BCAST_INNER (cost=3534.94..4244248.73 rows=2 width=511)
Hash Cond: (("outer".id)::text = ("inner".accountid)::text)
-> XN Seq Scan on companyE_accounts acc (cost=0.00..219.62 rows=21962 width=347)
-> XN Hash (cost=3534.94..3534.94 rows=2 width=203)
-> XN Seq Scan on companyE usr (cost=0.00..3534.94 rows=2 width=203)
Filter: (((email)::text = 'foobybar@barfoo.com'::text) AND (accountid IS NOT NULL))
-> XN Subquery Scan "*SELECT* 6" (cost=810.77..1921107.33 rows=1 width=1175)
-> XN Hash Join DS_BCAST_INNER (cost=810.77..1921107.32 rows=1 width=1175)
Hash Cond: (("outer".id)::text = ("inner".accountid)::text)
-> XN Seq Scan on companyF_accounts acc (cost=0.00..131.80 rows=13180 width=1030)
-> XN Hash (cost=810.76..810.76 rows=1 width=184)
-> XN Seq Scan on companyF usr (cost=0.00..810.76 rows=1 width=184)
Filter: ((email)::text = 'foobybar@barfoo.com'::text)
-> XN Subquery Scan "*SELECT* 7" (cost=705.19..9200982.48 rows=3 width=1204)
-> XN Hash Join DS_BCAST_INNER (cost=705.19..9200982.45 rows=3 width=1204)
Hash Cond: (("outer".id)::text = ("inner".accountid)::text)
-> XN Seq Scan on companyG_accounts acc (cost=0.00..85.30 rows=8530 width=790)
-> XN Hash (cost=705.19..705.19 rows=2 width=449)
-> XN Seq Scan on companyG usr (cost=0.00..705.19 rows=2 width=449)
Filter: ((email)::text = 'foobybar@barfoo.com'::text)
-> XN Subquery Scan "*SELECT* 8" (cost=420.21..7440457.49 rows=2 width=1135)
-> XN Hash Join DS_BCAST_INNER (cost=420.21..7440457.47 rows=2 width=1135)
Hash Cond: (("outer".id)::text = ("inner".accountid)::text)
-> XN Seq Scan on companyH_accounts acc (cost=0.00..11.46 rows=1146 width=784)
-> XN Hash (cost=420.20..420.20 rows=2 width=362)
-> XN Seq Scan on companyH usr (cost=0.00..420.20 rows=2 width=362)
Filter: ((email)::text = 'foobybar@barfoo.com'::text)
Solution: As O.Jones pointed out, 'UNION ALL' sped it up by quite a bit! I still have a bunch of joins, but brought the time down by a lot. Here is the final solution:
SELECT *
FROM (
SELECT usr.product_name, acc.product_loc, acc.phone, usr.email, usr.firstname, usr.lastname, usr.accountid,
FROM companyA usr
JOIN companyA_accounts acc ON usr.accountid = acc.id
UNION ALL
SELECT usr.product_name, acc.product_loc, acc.phone, usr.email, usr.firstname, usr.lastname, usr.accountid,
FROM companyB usr
JOIN companyB_accounts acc ON usr.accountid = acc.id
/* UNION ALL SELECT repeated for the rest of the tables */
) alltables
WHERE alltables.email = :emailId;
I am still open to information on best practices / efficiency if switching to RedshiftClient though!
Have you tried something like this query, using an ordinary UNION ALL rather than all those JOINs?
SELECT *
FROM (
SELECT product_name, email, firstname, lastname, accountid, phone,
FROM companyA
UNION ALL
SELECT product_name, email, firstname, lastname, accountid, phone,
FROM companyB
/* UNION ALL SELECT repeated for the rest of the tables */
) alltables
JOIN some_other_table ON alltables.accountid = some_other_table.accountid
WHERE alltables.email = :emailId;
A query like this can exploit any indexes on the email
columns of any tables that have them.
And, as my example shows you can do the joins with fairly clean SQL.