Search code examples
sqllanguage-agnosticprocedural

When do you give up set operations in SQL and go procedural?


I was once given this task to do in an RDBMS:

Given tables customer, order, orderlines and product. Everything done with the usual fields and relationships, with a comment memo field on the orderline table.

For one customer retrieve a list of all products that customer has ever ordered with product name, year of first purchase, dates of three last purchases, comment of the latest order, sum of total income for that product-customer combination last 12 months.

After a couple of days I gave up doing it as a Query and opted to just fetch every orderline for a customer, and every product and run through the data procedurally to build the required table clientside.

I regard this a symptom of one or more of the following:

  • I'm a lazy idiot and should have seen how to do it in SQL
  • Set operations are not as expressive as procedural operations
  • SQL is not as expressive as it should be

Did I do the right thing? Did I have other options?


Solution

  • You definitely should be able to do this exercise without doing the work equivalent to a JOIN in application code, i.e. by fetching all rows from both orderlines and products and iterating through them. You don't have to be an SQL wizard to do that one. JOIN is to SQL what a loop is to a procedural language -- in that both are fundamental language features that you should know how to use.

    One trap people fall into is thinking that the whole report has to be produced in a single SQL query. Not true! Most reports don't fit into a rectangle, as Tony Andrews points out. There are lots of rollups, summaries, special cases, etc. so it's both simpler and more efficient to fetch parts of the report in separate queries. Likewise, in a procedural language you wouldn't try do all your computation in a single line of code, or even in a single function (hopefully).

    Some reporting tools insist that a report is generated from a single query, and you have no opportunity to merge in multiple queries. If so, then you need to produce multiple reports (and if the boss wants it on one page, then you need to do some paste-up manually).

    To get a list of all products ordered (with product name), dates of last three purchases, and comment on latest order is straightforward:

    SELECT o.*, l.*, p.*
    FROM Orders o
     JOIN OrderLines l USING (order_id)
     JOIN Products p USING (product_id)
    WHERE o.customer_id = ?
    ORDER BY o.order_date;
    

    It's fine to iterate over the result row-by-row to extract the dates and comments on the latest orders, since you're fetching those rows anyway. But make it easy on yourself by asking the database to return the results sorted by date.

    Year of first purchase is available from the previous query, if you sort by the order_date and fetch the result row-by-row, you'll have access to the first order. Otherwise, you can do it this way:

    SELECT YEAR(MIN(o.order_date)) FROM Orders o WHERE o.customer_id = ?;
    

    Sum of product purchases for the last 12 months is best calculated by a separate query:

    SELECT SUM(l.quantity * p.price)
    FROM Orders o
     JOIN OrderLines l USING (order_id)
     JOIN Products p USING (product_id)
    WHERE o.customer_id = ?
     AND o.order_date > CURDATE() - INTERVAL 1 YEAR;
    

    edit: You said in another comment that you'd like to see how to get the dates of the last three purchases in standard SQL:

    SELECT o1.order_date
    FROM Orders o1
      LEFT OUTER JOIN Orders o2 
      ON (o1.customer_id = o2.customer_id AND (o1.order_date < o2.order_date 
          OR (o1.order_date = o2.order_date AND o1.order_id < o2.order_id)))
    WHERE o1.customer_id = ?
    GROUP BY o1.order_id
    HAVING COUNT(*) <= 3;
    

    If you can use a wee bit of vendor-specific SQL features, you can use Microsoft/Sybase TOP n, or MySQL/PostgreSQL LIMIT:

    SELECT TOP 3 order_date
    FROM Orders
    WHERE customer_id = ?
    ORDER BY order_date DESC;
    
    SELECT order_date
    FROM Orders
    WHERE customer_id = ?
    ORDER BY order_date DESC
    LIMIT 3;