I have a table below and I am trying to determine the number of users that have spent money at the company for the first time each month.
What I would like is to have a resulting table with new users, month and year as columns.
Before people downvote this post, I have looked through various posts already and can't seem to find a similar way to solve this issue. The code that I have included below is based on what I've managed to piece together from relevant posts.
This is the original table:
+---------------------+-------------+-----------------+
| datetime | customer_id | amount |
+---------------------+-------------+-----------------+
| 2018-03-01 03:00:00 | 3786 | 14 |
+---------------------+-------------+-----------------+
| 2018-03-02 17:00:00 | 5678 | 25 |
+---------------------+-------------+-----------------+
| 2018-08-17 19:00:00 | 5267 | 45 |
+---------------------+-------------+-----------------+
| 2018-08-25 08:00:00 | 3456 | 78 |
+---------------------+-------------+-----------------+
| 2018-08-25 17:00:00 | 3456 | 25 |
+---------------------+-------------+-----------------+
| 2019-05-25 14:00:00 | 3456 | 15 |
+---------------------+-------------+-----------------+
| 2019-07-02 14:00:00 | 88889 | 45 |
+---------------------+-------------+-----------------+
| 2019-08-25 08:00:00 | 1234 | 88 |
+---------------------+-------------+-----------------+
| 2019-08-30 09:31:00 | 1234 | 30 |
+---------------------+-------------+-----------------+
| 2019-08-30 12:00:00 | 9876 | 55 |
+---------------------+-------------+-----------------+
| 2019-09-01 13:00:00 | 88889 | 23 |
+---------------------+-------------+-----------------+
This is the CREATE statement:
CREATE TABLE IF NOT EXISTS `spend` ( `datetime` datetime NOT NULL, `customer_id` int(11) NOT NULL, `amount` int(11) NOT NULL, PRIMARY KEY (`datetime`)) DEFAULT CHARSET=utf8mb4;
INSERT INTO `spend` (`datetime`, `customer_id`, `amount`) VALUES ('2018-03-01 03:00:00', 3786, 14);
INSERT INTO `spend` (`datetime`, `customer_id`, `amount`) VALUES ('2018-03-02 17:00:00', 5678, 25);
INSERT INTO `spend` (`datetime`, `customer_id`, `amount`) VALUES ('2018-08-17 19:00:00', 5267, 45);
INSERT INTO `spend` (`datetime`, `customer_id`, `amount`) VALUES ('2018-08-25 08:00:00', 3456, 78);
INSERT INTO `spend` (`datetime`, `customer_id`, `amount`) VALUES ('2018-08-25 17:00:00', 3456, 25);
INSERT INTO `spend` (`datetime`, `customer_id`, `amount`) VALUES ('2019-05-25 14:00:00', 3456, 15);
INSERT INTO `spend` (`datetime`, `customer_id`, `amount`) VALUES ('2019-07-02 14:00:00', 88889, 45);
INSERT INTO `spend` (`datetime`, `customer_id`, `amount`) VALUES ('2019-08-25 08:00:00', 1234, 88);
INSERT INTO `spend` (`datetime`, `customer_id`, `amount`) VALUES ('2019-08-30 09:31:00', 1234, 30);
INSERT INTO `spend` (`datetime`, `customer_id`, `amount`) VALUES ('2019-08-30 12:00:00', 9876, 55);
INSERT INTO `spend` (`datetime`, `customer_id`, `amount`) VALUES ('2019-09-01 13:00:00', 88889, 23);
This is the code that I have come up with:
SELECT S.datetime, S.customer_id, S.amount
FROM spend S
INNER JOIN
(SELECT customer_id, MIN(datetime) AS first_occurence
FROM spend
GROUP BY customer_id) X
ON S.customer_id = X.customer_id AND S.datetime = X.first_occurence
This is the resulting table:
+------------------+-------------+-------+
| datetime | customer_id |amount |
+------------------+-------------+-------+
| 01/03/2018 03:00 | 3786 | 14 |
+------------------+-------------+-------+
| 02/03/2018 17:00 | 5678 | 25 |
+------------------+-------------+-------+
| 17/08/2018 19:00 | 5267 | 45 |
+------------------+-------------+-------+
| 25/08/2018 08:00 | 3456 | 78 |
+------------------+-------------+-------+
| 02/07/2019 14:00 | 88889 | 45 |
+------------------+-------------+-------+
| 25/08/2019 08:00 | 1234 | 88 |
+------------------+-------------+-------+
| 30/08/2019 12:00 | 9876 | 55 |
+------------------+-------------+-------+
This is what an example of what the table should look like:
+-----------+-------+------+
| new_users | month | year |
+-----------+-------+------+
| 2 | 3 | 2018 |
+-----------+-------+------+
| 3 | 8 | 2018 |
+-----------+-------+------+
| 1 | 5 | 2019 |
+-----------+-------+------+
| 1 | 7 | 2019 |
+-----------+-------+------+
| 3 | 8 | 2019 |
+-----------+-------+------+
| 1 | 9 | 2019 |
+-----------+-------+------+
You started correctly. Now use that as a subquery to get the count by month.
SELECT COUNT(*) AS new_users, MONTH(datetime) AS month, YEAR(datetime) AS year
FROM (
SELECT S.datetime, S.customer_id, S.amount
FROM spend S
INNER JOIN
(SELECT customer_id, MIN(datetime) AS first_occurence
FROM spend
GROUP BY customer_id) X
ON S.customer_id = X.customer_id AND S.datetime = X.first_occurence
) AS x
GROUP BY month, year
ORDER BY year, month
Actually, you don't even need the join in the subquery, since you're not using the amount from the first purchase in the final result.
SELECT COUNT(*) AS new_users, MONTH(datetime) AS month, YEAR(datetime) AS year
FROM (
SELECT customer_id, MIN(datetime) AS datetime
FROM spend
GROUP BY customer_id
) AS x
GROUP BY month, year
ORDER BY year, month