Search code examples
databasemariadb

Optimizing MariaDB Query When Re-Querying Same Table


I am trying to run a query that will get the last time a customer has placed an order. The catch is that the site allows people the ability to checkout both logged in and logged out. Additionally, I want to combine records for customers that have changed their email at some point.

I have a query that works (I think), but it involves three selects - one for the last order, one for logged in/out switches, and one for email switches. This is kind of slow on my database! What am I overlooking? How can I make this query more efficient and/or faster?

SELECT last, email, user_id 
FROM (
  SELECT last, email, user_id 
  FROM (
    SELECT time as last, email, 
      CASE WHEN user_id > 0 THEN CAST(user_id as CHAR(20))
      ELSE email
      END as user_id
    FROM   orders o1
    WHERE  time=(SELECT MAX(o2.time)
              FROM orders o2
              WHERE o1.email = o2.email)
    ORDER BY last DESC
  ) as o3
  GROUP BY email
) as o4
GROUP BY user_id
ORDER BY email

Edit: don't know how this will help, but here's the orders table, as requested:

create table orders(order_id int(10) NOT NULL AUTO_INCREMENT, 
   user_id int(10) NOT NULL, 
   time int(10) UNSIGNED NOT NULL, 
   email varchar(50) NOT NULL);
 PRIMARY KEY (`order_id`), KEY `userid` (`user_id`))
 ENGINE=MyISAM AUTO_INCREMENT=6207584 DEFAULT CHARSET=latin1 COLLATE=latin1_swedish_ci

Solution

  • Update1. In this example considered cases
    1.User has orders with logged in and logged out with the same email. 2. User has orders with logged in with several email.
    Index on ((email,user_id);) will be usefull for query.

    time column data type does not matter.

    Fiddle and

    select user_id,email,max_time
    from(
      select max(user_id) user_id,email ,max(time) max_time
        ,row_number()over(partition by max(user_id) order by max(time) desc) rn_email
      from orders
      group by email
     )t
    where rn_email=1 or user_id=0
    order by email
    

    Old part of answer:
    You can range orders by time desc with partition by composite value (user_id or email).
    To compare query performance, use 'EXPLAIN SELECT ...'
    See example:

    Test data

    order_id user_id time email
    1 100 2024-09-01 14:00:00 [email protected]
    2 100 2024-09-10 09:00:00 [email protected]
    3 200 2024-09-12 09:00:00 [email protected]
    4 300 2024-09-10 09:00:00 [email protected]
    5 300 2024-09-12 09:00:00 [email protected]
    6 400 2024-09-10 09:00:00 [email protected]
    7 400 2024-09-12 09:00:00 [email protected]
    8 400 2024-09-14 09:00:00 [email protected]
    9 0 2024-09-11 09:00:00 [email protected]
    10 0 2024-09-15 15:00:00 [email protected]
    11 0 2024-09-10 09:00:00 [email protected]

    Query example

    select *
    from(
      select *
        ,row_number()over(partition by case when user_id<>0 then CAST(user_id as varCHAR(50)) else email end 
                       order by time desc) rn
      from orders
    )ranged_orders
    where rn=1
    order by email
    

    Output

    order_id user_id time email rn
    2 100 2024-09-10 09:00:00 [email protected] 1
    3 200 2024-09-12 09:00:00 [email protected] 1
    5 300 2024-09-12 09:00:00 [email protected] 1
    8 400 2024-09-14 09:00:00 [email protected] 1
    10 0 2024-09-15 15:00:00 [email protected] 1
    11 0 2024-09-10 09:00:00 [email protected] 1

    fiddle

    I think it is necessary to bring user_id and email to the same data type - the email data type (varchar(50) in example).

    Another query

    select *
    from(
    select * ,row_number()over(partition by user_id order by time desc) rn
    from orders
    where user_id<>0
      union all
    select * ,row_number()over(partition by email order by time desc) rn
    from orders
    where user_id=0
    )ranged_orders
    where rn=1
    order by email
    

    possible (partially) covered by indexes. Example