Search code examples
mysqlviewgreatest-n-per-group

Timetravel view in MySQL


I need to implement a timetravel view for prices in mysql. The base price table is this:

CREATE TABLE product_price (
  product_id INT(11) NOT NULL,
  date_valid TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  price DECIMAL(15,4) NOT NULL,
  PRIMARY KEY(product_id,date_valid)
);

The idea is that as time passes I select the right valid prices I have entered in advance. Maybe the concept can be clearer later. I need to create a view so that for each product_id I get the latest price. After a while I have found the SELECT that does what I need:

SELECT * FROM  (
  SELECT product_id,price FROM product_pric
    WHERE date_valid <= CURRENT_TIMESTAMP
    ORDER BY date_valid DESC
) xx GROUP BY product_id;

In order to create the needed view I understood I cannot use the subselect and need to create one or more intermediate views. Like this:

CREATE VIEW v_product_price_time AS
  SELECT product_id,price FROM product_pric
    WHERE date_valid <= CURRENT_TIMESTAMP
    ORDER BY date_valid DESC
;


CREATE VIEW v_product_price AS
  SELECT * FROM v_product_price_time GROUP BY product_id;

What I then get is not the sameas the original query I've written. For example, I populate the table with just two rows:

INSERT INTO product_price (product_id,date_valid,price ) VALUES ( 1,'2013-01-01',41.40 );
INSERT INTO product_price (product_id,date_valid,price ) VALUES ( 1,'2013-01-03',42.0 );

The raw query returns the right data, (1,42.0), but querying the view doesn't. I always get (1,41.40).

Surely I am missing something as I don't know MySQL very well. With another opensource RDBMS I have already done similar stuff, but now I need to cope with MySQL v5.5 and have no way to change it. But the documentation and a few searches in the developers forums didn't lead me to a solution. Any idea on how to solve this? TIA.


Solution

  • Use this query, regardless of the fact that is in a view or not.

    SELECT p1.* FROM (
      SELECT * FROM product_price
      WHERE date_valid <= CURRENT_TIMESTAMP
    ) p1
    LEFT JOIN (
      SELECT product_id, date_valid FROM product_price
      WHERE date_valid <= CURRENT_TIMESTAMP
    ) p2
    ON p1.product_id = p2.product_id AND p1.date_valid < p2.date_valid
    WHERE p2.date_valid IS NULL
    

    The query creates 2 derived tables, which is not quite efficient and is also a bit harder to read. You can try creating another view for that:

    CREATE VIEW product_price_past_dates AS (
      SELECT * FROM product_price
      WHERE date_valid <= CURRENT_TIMESTAMP
    );
    

    And then rewriting the original query as:

    SELECT p1.* FROM product_price_past_dates p1
    LEFT JOIN product_price_past_dates p2
    ON p1.product_id = p2.product_id AND p1.date_valid < p2.date_valid
    WHERE p2.date_valid IS NULL
    

    Then you can create the view on the query that uses the previous view:

    CREATE VIEW v_product_price_time AS (
      SELECT p1.* FROM product_price_past_dates p1
      LEFT JOIN product_price_past_dates p2
      ON p1.product_id = p2.product_id AND p1.date_valid < p2.date_valid
      WHERE p2.date_valid IS NULL
    );
    

    And finish with the simplest query:

    SELECT * FROM v_product_price_time;
    

    Fiddle here.

    Why isn't the GROUP BY not working: The error basically resides in an inappropriate use of a GROUP BY clause. The rule of thumb (although not 100% of the time true) would be to always use the same fields in the select as in the GROUP BY. Otherwise, MySQL will choose any value from the fields that are in the select and not in the GROUP BY.

    For further information you should check the MySQL documentation. I think it is pretty clear.

    Highly detailed explanation:

    SELECT * FROM  (
      SELECT product_id,price FROM product_pric
        WHERE date_valid <= CURRENT_TIMESTAMP
        ORDER BY date_valid DESC
    ) xx GROUP BY product_id;
    
    0 syntactical errors
    1 semantical error
    1 warning
    
    1. Semantical error: Selecting product_id and price and only grouping by product_id will result in an unpredictable price being returned for each price. You don't want to have an unpredictable value in your result set, otherwise, you wouldn't be selecting it. So you are 100% trusting in a value that you can't predict. That is indeed an error
    2. Warning: You are ordering a result set and then getting rid of that order by wrapping it inside a GROUP BY. There is no sense in ordering something and then generating a different order for it. That slows performance.

    Previous query should be fixed using one of the 2 basic greatest-n-per-group solutions. I've provided the shortest one which is the left join one.

    CREATE VIEW v_product_price_time AS
      SELECT product_id,price FROM product_pric
        WHERE date_valid <= CURRENT_TIMESTAMP
        ORDER BY date_valid DESC
    ;
    
    0 syntactical errors
    0 semantical errors
    0 warnings
    

    Perfectly valid query. No comments at all.

    CREATE VIEW v_product_price AS
      SELECT * FROM v_product_price_time GROUP BY product_id;
    
    0 syntactical errors
    1 semantical errors
    0 warnings
    
    1. Semantical error: Again, selecting product_id and price and only grouping by product_id. This will, the same as above, result in unpredictable results.

    So, you're basically comparing 2 unpredictable results and expecting to be the same. The funny thing is that the fact of comparing 2 unpredictable results is more error prone than just 1 unpredictable result. So consider yourself lucky of having that increase of chance in finding this error in your code. The guy who asked this question: How can I update the value of one field to the most often used value of another field?

    will discover a funny surprise when he finds out that the query was not working as he expected to be. Furthermore, 3 out of 4 answers in there are not properly grouping by correctly and returning unpredictable results. Not to mention Bohemian's comment in which he states his code will always work. So, congratulations Enzo, you've just divided by 0 :)

    Hope this helps.