Search code examples
sqlcountsummary

SQL - Identify if a user is present every month


I am performing some data analysis on users who have made transactions over the course of three months.

What I would like to do is identify customers who made specific transaction types (Credit) in every single month present in the data table over those two years. As you can see in the data table below, User A has performed a Credit transaction in months 1,2,3 and I would like a flag saying "Frequent" applied to the customer.

User B, however, has not performed a credit transaction every month (month 2 was Debit), and so I would like them to have a different flag name (e.g. "Infrequent").

How can I use SQL to identify if a user has made a specific transaction type each month?

| Date       | User |  Amount |  Transaction Type   |  **Flag **  |
| 2022-01-15 |   A  | $15.00  |       Credit        |  **Flag **  |
  ...
| 2022-02-15 |   A  | $15.00  |       Credit        |  **Flag **  |
  ...
| 2022-03-15 |   A  | $15.00  |       Credit        |  **Flag **  |
  ...
  ...
| 2022-01-15 |   B  | $15.00  |       Credit        |  **Flag **  |
  ...
| 2022-02-15 |   B  | $15.00  |       Debit         |  **Flag **  |
  ...
| 2022-03-15 |   B  | $15.00  |       Credit        |  **Flag **  |

I have tried the following - hoping there is a better or more simple way.

SELECT
   Date, User, Amount, Transaction_Type,
   CASE WHEN Count(present) = 3 THEN 'Frequent' ELSE 'Infrequent'

FROM Transactions

LEFT JOIN (
                SELECT 
                    User,Month(Date),Count(Transaction_Type) as present
                FROM 
                    Transactions
                WHERE 
                    Transaction_Type = 'Credit'
                GROUP BY 
                    User,Month(Date)
                Having 
                    Count(Transaction_Type) > 0
            ) subquery
                ON subquery.User = Transaction.User
GROUP BY    
        Date,User,Amount,Transaction_Type

Solution

  • That is the way I would approach it. Assuming you are using T-SQL I would make the following changes. Instead of having the LEFT JOIN be to a sub-query, I would make the sub-query a CTE and then joint to that. I find it easier to grok when the main query is not full of sub-queries and you can test the CTE on its own more easily, plus if performance becomes an issue is relatively trivial to convert the CTE to a temp table. without affecting the main query too much.

    You have a couple of problems I think. the first is that your subquery is going to return you the count of the credits in each month. If I make 3 credits in January this is going to flag me as frequent because the total is more than 3. You probably want to do a

    COUNT(DISTINCT Transaction_type) AS hasCredit
    

    to identify if there is AT LEAST ONE credit transaction, then have another aggregation that

    SUM(hasCredit)
    

    to get the number of months in which a credit appears.

    using nested sub-queries means your LEFT JOIN would now be two sub-queries deep and dissapearing off the right hand side of your screen. Writing them as CTEs keeps the main logic clean and script narrow.

    I think this does what you need, but can't test it because I don't have any sample data.

    WITH CTE_HasCredit AS
    (
        SELECT 
                User
                ,Month(Date) AS [TransactionMonth]
                ,Count(DISTINCT Transaction_Type) AS [hasCredit]
        FROM 
                Transactions
        WHERE 
                Transaction_Type = 'Credit'
        GROUP BY 
                User
                ,Month(Date)
        Having 
                Count(Transaction_Type) > 0
    ) 
    ,
    CTE_isFrequent AS 
    (
    
        SELECT 
                User
                ,SUM(hasCredit) AS [TotalCredits]
        FROM
                CTE_HasCredit
        GROUP BY
                User
    )
    
    SELECT
           TXN.Date
           , TXN.User
           , TXN.Amount
           , TXN.Transaction_Type
           ,CASE 
                WHEN FRQ.TotalCredits >= 3 THEN 'Frequent' 
                ELSE 'Infrequent'
            END AS [customerType]
    FROM 
            Transactions AS TXN
        LEFT JOIN 
            CTE_isFrequent AS FRQ ON FRQ.User = TXN.User
    GROUP BY    
            TXN.Date
            ,TXN.User
            ,TXN.Amount
            ,TXN.Transaction_Type
    

    I don't think you need the GROUP BY on the main query either; it would de-dupe transactions for the same day for the same amount. You might also want to look at the syntax for COUNT() OVER(). These would allow you to do the calculations in the main query and would look something like.

            ,CASE
                WHEN COUNT(DISTINCT TXN.Transaction_Type) OVER(PARTITION BY User, MONTH(TXN.Date),TXN.Transaction_Type) >=3 THEN 'Frequent'
                ELSE 'Infrequent' 
            END AS [customerType2]
    

    This second way would give you customer type for both the Debits and Credits. I am not aware of a way to filter the COUNT() OVER() to just Credits, for that you would need to use the CTE method.