Search code examples
sqlpostgresqljoinwhere-clause

Find Abandoned Searches from Customer Activities, SQL


In a trading platform, users have three different actions: LOGIN, SEARCH, ORDER.

Abandoned SEARCH action is defined as: customers LOGIN then do some SEARCH and do not ORDER before next LOGIN. Now I have a table: action_table, recording customer_id, action, request_time.

I want to know how to find all abandoned SEARCH actions via SQL?


Updated: Here is a brief example:

CREATE TABLE action_table(
   customer_id  VARCHAR(1) NOT NULL
  ,action       VARCHAR(6) NOT NULL
  ,request_time DATE  NOT NULL
);
INSERT INTO action_table(customer_id,action,request_time) 
VALUES 
('A','LOGIN','2023-05-01'),
('A','SEARCH','2023-05-02'),
('A','SEARCH','2023-05-03'),
('A','ORDER','2023-05-04'),
('B','LOGIN','2023-05-01'),
('B','SEARCH','2023-05-02'),
('B','SEARCH','2023-05-03'),
('B','LOGIN','2023-05-04'),
('B','SEARCH','2023-05-05')
customer_id action request_time
A LOGIN 2023-05-01
A SEARCH 2023-05-02
A SEARCH 2023-05-03
A ORDER 2023-05-04
B LOGIN 2023-05-01
B SEARCH 2023-05-02
B SEARCH 2023-05-03
B LOGIN 2023-05-04
B SEARCH 2023-05-05

In this case, NO abandoned SEARCH for A and 3 abandoned SEARCH for B.

Here is my code:

select customer_id, count(1)
from action_table c1
left join 
(
  select customer_id, action, request_time 
  from action_table
  where action = 'LOGIN'
) c2
  on c1.customer_id = c2.customer_id
  and c2.request_time > c1.request_time
left join 
(
  select customer_id, action, request_time 
  from action_table
  where action = 'ORDER'
) c3
  on c1.customer_id = c3.customer_id
  and c3.request_time > c1.request_time
  and c3.request_time < c2.request_time
where c1.action = 'SEARCH'
  and c2.customer_id IS NOT NULL
  and c3.customer_id IS NULL
group by 1

It is redundant and does not seem to work well :(


Solution

  • Here is my solution. This is not an optimal one but still works.

    The basic approach is to find the most recent LOGIN action after each SEARCH. Then check if ORDER exists between search time and most recent login time. Two subqueries are used :(

    SELECT *
    FROM action_table AS a
    WHERE a.action = 'SEARCH'
      AND NOT EXISTS (
        SELECT 1
        FROM action_table
        WHERE customer_id = a.customer_id
          AND action = 'ORDER'
          AND request_time > a.request_time
          AND request_time < (
            SELECT MIN(request_time)
            FROM action_table
            WHERE customer_id = a.customer_id
              AND action = 'LOGIN'
              AND request_time > a.request_time
          )
    );