I have three tables:
Value v
+-------------------------------+
| Owner | Location | Value |
+-------------------------------+
|Bob | A1 | 0.25 |
|Bob | B4 | 0.10 |
|Dale | Z3 | 0.50 |
|Dale | A1 | 0.25 |
|Rick | B4 | 0.10 |
|Rick | Z3 | 0.50 |
+-------------------------------+
PurchasePercentage p
+-------------------------------+
| Buyer | Location | Percentage |
+-------------------------------+
|Bill | A1 | 0.10 |
|Bill | B4 | 0.20 |
|Kyle | Z3 | 0.30 |
|Kyle | A1 | 0.50 |
|Jan | B4 | 0.15 |
|Jan | Z3 | 0.25 |
+-------------------------------+
Buyout b
+------------------+
| Owner | Buyer |
+------------------+
|Bob | Bill |
|Bob | Kyle |
|Dale | Jan |
|Dale | Bill |
|Rick | Kyle |
|Rick | Jan |
+------------------+
What I'm looking for is a fourth table:
PossibleBuyouts
+--------------------------------+
| Owner | Buyer | BuyoutCost |
+--------------------------------+
based on the transactions laid out in the Buyout
table
where BuyoutCost
is the SUM of v.Value
* p.Percentage
for each distinct Location
the Buyer
and Owner
have in common.
So these examples would return a PossibleBuyouts
table of
+--------------------------------+
| Owner | Buyer | BuyoutCost |
+--------------------------------+
|Bob | Bill | 0.045 |
|Bob | Kyle | 0.125 |
|Dale | Jan | 0.125 |
|Dale | Bill | 0.025 |
|Rick | Kyle | 0.150 |
|Rick | Jan | 0.140 |
+--------------------------------+
Laying out the first row as an example, the math works out like this:
-Bob and Bill both have Locations A1 and B4 in common
-Bob's ownership value for A1 is 0.25 and Bill wants 0.10 percent
-the cost for A1 would be (0.25*0.10) = 0.025
-Bob's ownership value for B4 is 0.10 and Bill wants 0.20 percent
-the cost for B4 would be (0.10*0.20) = 0.020
-Sum(0.025 + 0.020) = the BuyoutCost of 0.045.
If you could help me out, I'm looking for the most efficient way to do this - whether it's multiple queries, one query with a bunch of subqueries, whatever should take the least amount of time.
There are ~1500 different owners, ~1000 different buyers and ~500 different locations, so the number of potential combinations leads to looooooong query times. What do you think will be the fastest way to do this? Everything is indexed and the data pared down to the smallest size I can get it to.
I recommend that you create a procedure that uses a cursor to accomplish your goal. Iterating over one of the three tables, it should be a pretty straightforward process.
MySQL Cursors: http://dev.mysql.com/doc/refman/5.0/en/cursors.html
Edit: The following stored MySQL script creates and calls a stored procedure that correctly reproduces the results you're looking for:
DELIMITER $$
DROP PROCEDURE IF EXISTS `computeBuyouts` $$
CREATE PROCEDURE `computeBuyouts` ()
BEGIN
DECLARE done INT DEFAULT FALSE;
DECLARE o, b VARCHAR(32);
DECLARE bc FLOAT;
DECLARE cur CURSOR FOR SELECT Owner,Buyer FROM Buyout;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;
OPEN cur;
read_loop: LOOP
FETCH cur INTO o, b;
IF done THEN
LEAVE read_loop;
END IF;
SET bc = 0;
-- compute the sum of products of each value/percentage pair for all locations
SELECT SUM(v.Value * pp.Percentage) INTO bc FROM Value v LEFT JOIN (PurchasePercentage pp) ON (v.Location = pp.Location) where v.Owner = o and pp.Buyer = b group by v.Owner;
INSERT INTO PossibleBuyouts VALUES (o, b, bc);
END LOOP;
CLOSE cur;
END $$
DELIMITER ;
CALL computeBuyouts();
Hope that helps. At the table sizes you mentioned, this should complete extremely quickly.