I am trying to build a table that would bring be a combination of all products that I could sell, based on the current ones.
Product Status Table
+-------------+--------------+----------------+
| customer_id | product_name | product_status |
+-------------+--------------+----------------+
| 1 | A | Active |
| 2 | B | Active |
| 2 | C | Active |
| 3 | A | Cancelled |
+-------------+--------------+----------------+
Now I am trying to cross join with a hard code table that would give be 4 rows per customer_id, based on all 4 product we have in our portfolio, and statuses that I would like to apply.
Portfolio Table
+--------------+------------+----------+
| product_name | status_1 | status_2 |
+--------------+------------+----------+
| A | Inelegible | Inactive |
| B | Inelegible | Inactive |
| C | Ineligible | Inactive |
| D | Inelegible | Inactive |
+--------------+------------+----------+
On my code I tried to use a CROSS JOIN in order to achieve 4 rows per customer_id. Unfortunately, for customers that have more than one product, I have double/triple rows.
This is my code:
SELECT
p.customer_id,
CASE WHEN p.product_name = pt.product_name THEN p.product_name ELSE pt.product_name END AS product_name,
CASE
WHEN p.product_name = pt.product_name THEN p.product_status
ELSE pt.status_1
END AS product_status
FROM
products AS p
CROSS JOIN
portfolio as pt
This is my current output:
+----+-------------+--------------+----------------+
| # | customer_id | product_name | product_status |
+----+-------------+--------------+----------------+
| 1 | 1 | A | Active |
| 2 | 1 | B | Inelegible |
| 3 | 1 | C | Inelegible |
| 4 | 1 | D | Inelegible |
| 5 | 2 | A | Ineligible |
| 6 | 2 | A | Ineligible |
| 7 | 2 | B | Active |
| 8 | 2 | B | Ineligible |
| 9 | 2 | C | Active |
| 10 | 2 | C | Ineligible |
| 11 | 2 | D | Ineligible |
| 12 | 2 | D | Ineligible |
| 13 | 3 | A | Cancelled |
| 14 | 3 | B | Ineligible |
| 15 | 3 | C | Ineligible |
| 16 | 3 | D | Ineligible |
+----+-------------+--------------+----------------+
As you may see, for the customer_id 2, I have two rows for each product having products B and C with different statuses then what I have on the product_status table.
What I would like to achieve, in this case, is a table with 12 rows, in which the current product/status from the product_status table is shown, and the remaining product/statuses from the portfolio table are added.
Expected output
+----+-------------+--------------+----------------+
| # | customer_id | product_name | product_status |
+----+-------------+--------------+----------------+
| 1 | 1 | A | Active |
| 2 | 1 | B | Inelegible |
| 3 | 1 | C | Inelegible |
| 4 | 1 | D | Inelegible |
| 5 | 2 | A | Ineligible |
| 6 | 2 | B | Active |
| 7 | 2 | C | Active |
| 8 | 2 | D | Ineligible |
| 9 | 3 | A | Cancelled |
| 10 | 3 | B | Ineligible |
| 11 | 3 | C | Ineligible |
| 12 | 3 | D | Ineligible |
+----+-------------+--------------+----------------+
Not sure if the CROSS JOIN is the best alternative, but now I am running out of ideas.
EDIT:
I thought of another cleaner solution. Do a cross join first, then a right join on the customer_id and product_name, and coalesce the product statuses.
SELECT customer_id, product_name, coalesce(product_status, status_1)
FROM products p
RIGHT JOIN (
SELECT *
FROM (SELECT DISTINCT customer_id FROM products) pro
CROSS JOIN portfolio
) pt
USING (customer_id, product_name)
ORDER BY customer_id, product_name
Old answer: The idea is to include information of all product names for a customer_id into a list, and check whether the product in portfolio is in that list.
(SELECT customer_id, pt_product_name as product_name, first(status_1) as product_status
FROM (
SELECT
customer_id,
p.product_name as p_product_name,
pt.product_name as pt_product_name,
product_status,
status_1,
status_2,
collect_list(p.product_name) over (partition by customer_id) AS product_list
FROM products p
CROSS JOIN portfolio pt
)
WHERE NOT array_contains(product_list, pt_product_name)
GROUP BY customer_id, product_name)
UNION ALL
(SELECT customer_id, p_product_name as product_name, first(product_status) as product_status
FROM (
SELECT
customer_id,
p.product_name as p_product_name,
pt.product_name as pt_product_name,
product_status,
status_1,
status_2,
collect_list(p.product_name) over (partition by customer_id) AS product_list
FROM products p
CROSS JOIN portfolio pt)
WHERE array_contains(product_list, pt_product_name)
GROUP BY customer_id, product_name)
ORDER BY customer_id, product_name;
which gives
+-----------+------------+--------------+
|customer_id|product_name|product_status|
+-----------+------------+--------------+
| 1| A| Active|
| 1| B| Inelegible|
| 1| C| Ineligible|
| 1| D| Inelegible|
| 2| A| Inelegible|
| 2| B| Active|
| 2| C| Active|
| 2| D| Inelegible|
| 3| A| Cancelled|
| 3| B| Inelegible|
| 3| C| Ineligible|
| 3| D| Inelegible|
+-----------+------------+--------------+
FYI the chunk before UNION ALL
gives:
+-----------+------------+--------------+
|customer_id|product_name|product_status|
+-----------+------------+--------------+
| 1| B| Inelegible|
| 1| C| Ineligible|
| 1| D| Inelegible|
| 2| A| Inelegible|
| 2| D| Inelegible|
| 3| B| Inelegible|
| 3| C| Ineligible|
| 3| D| Inelegible|
+-----------+------------+--------------+
And the chunk after UNION ALL
gives:
+-----------+------------+--------------+
|customer_id|product_name|product_status|
+-----------+------------+--------------+
| 1| A| Active|
| 2| B| Active|
| 2| C| Active|
| 3| A| Cancelled|
+-----------+------------+--------------+
Hope that helps!