I have a table like this:
CREATE TABLE test (
ID SERIAL PRIMARY KEY,
user_id INT,
createdAt DATE,
status_id INT
);
INSERT INTO test VALUES
(1, 12, '2020-01-01', 4),
(2, 12, '2020-01-03', 7),
(3, 12, '2020-01-06', 7),
(4, 13, '2020-01-02', 5),
(5, 13, '2020-01-03', 6),
(6, 14, '2020-03-03', 8),
(7, 13, '2020-03-04', 4),
(8, 15, '2020-04-04', 7),
(9, 14, '2020-03-02', 6),
(10, 14, '2020-03-10', 5),
(11, 13, '2020-04-10', 8);
In that table there's id
for the id of each transaction, user_id
was the user, createdAt
was the date of transaction happen, and status_id
was the status for every transaction (in this case status_id
4, 5, 6, 8 are approved transaction)
I want to find out max, min, avg different day for every each transaction on every users who doing transaction between '2020-02-01' and '2020-04-01' with >1 transaction approved on that period
This is my query:
SELECT MIN(diff) AS `MIN`, MAX(diff) AS `MAX`, SUM(diff) / COUNT(DISTINCT user_id) AS `AVG`
FROM (
SELECT ID, user_id, DATEDIFF((SELECT t2.createdAt FROM test t2 WHERE t2.user_id = t1.user_id AND t1.createdAt <= t2.createdAt AND t2.id <> t1.id LIMIT 1), t1.createdAt) AS diff
FROM test t1
where
status_id in (4, 5, 6, 8)
HAVING SUM(t1.user_id BETWEEN '2020-02-01' AND '2020-04-01')
AND SUM(t1.user_id >= '2020-02-01') > 1
) DiffTable
WHERE diff IS NOT NULL
but it's said:
In aggregated query without GROUP BY, expression #1 of SELECT list contains nonaggregated column 'fiddle_KDQIQDMUZEIOVXFHRZPY.t1.ID'; this is incompatible with sql_mode=only_full_group_by
what should i do?
expected result
+-----+-----+---------+
| MAX | MIN | AVERAGE |
+-----+-----+---------+
| 36 | 1 | 22 |
+-----+-----+---------+
explanation :
- the user_id who have approval transaction on 2020-02-01 until 2020-04-01 and user_id who have transaction more than 1 are user_id 13 & 14
- the maximum of different day on 2020-02-01 until 2020-04-01 are user_id 13 which the different day for each transaction happen in 2020-03-04 and doing next transaction again in 2020-04-10
- the minimum day of different day of each transaction are user_id 14 who doing transaction on 2020-03-02 and next transaction 2020-03-03
- average are 22 days (sum of different day on user_Id 13 & 14 / amount of user_id who fit on this condition)
You need to do the GROUP
ing outside of your subquery; the subquery should just be used to limit the selected transactions to those which have the desired status_id
value and dates within the required range. Then you can select users with more than one transaction in the period in the outer query:
SELECT user_id,
COUNT(*) AS transactions,
MIN(diff) AS `MIN`,
MAX(diff) AS `MAX`,
SUM(diff) / COUNT(diff) AS `AVG`
FROM (
SELECT user_id, DATEDIFF((SELECT MIN(t2.createdAt)
FROM test t2
WHERE t2.user_id = t1.user_id
AND t1.createdAt < t2.createdAt
AND t2.status_id in (4, 5, 6, 8)
), t1.createdAt) AS diff
FROM test t1
WHERE status_id in (4, 5, 6, 8)
AND createdAt BETWEEN '2020-02-01' AND '2020-04-01'
) DiffTable
WHERE diff IS NOT NULL
GROUP BY user_id
HAVING COUNT(*) > 1
Output (for your fiddle):
user_id transactions MIN MAX AVG
14 2 1 7 4.0000
If you want the values based on all transactions that occurred over that period, rather than by user_id
, you can simply remove the GROUP BY
and HAVING
clauses:
SELECT COUNT(*) AS transactions,
MIN(diff) AS `MIN`,
MAX(diff) AS `MAX`,
SUM(diff) / COUNT(diff) AS `AVG`
FROM (
SELECT user_id, DATEDIFF((SELECT MIN(t2.createdAt)
FROM test t2
WHERE t2.user_id = t1.user_id
AND t1.createdAt < t2.createdAt
AND t2.status_id in (4, 5, 6, 8)
), t1.createdAt) AS diff
FROM test t1
WHERE status_id in (4, 5, 6, 8)
AND createdAt BETWEEN '2020-02-01' AND '2020-04-01'
) DiffTable
WHERE diff IS NOT NULL
Output:
transactions MIN MAX AVG
3 1 37 15.0000
Note that there a couple of issues with the existing subquery in the DATEDIFF
computation: a LIMIT
without ORDER BY
is not guaranteed to give the expected results, and there is no conditioning on status_id
. I've fixed both those issues in my updated queries.