Search code examples
sqlpostgresqlherokuheroku-postgres

RFM analysis with Postgresql


I am trying to create RFM analysis with Postgresql query. However, i haven't fully completed the query for the Recency dimension. The query inspired in this article "https://cooldata.wordpress.com/2014/03/25/an-all-sql-way-to-automate-rfm-scoring/ "
The criteria of the Recency dimension was

  1. Last order within 2 months = 5
  2. Last order within 4 months = 4
  3. Last order within 6 months = 3
  4. Last order within 8 months = 2
  5. Last order within 10 months = 1

and below is the query i've been trying to finish

WITH rfm AS

(SELECT email,
 SUM((total_incl_tax)) AS cash,
 MAX(decode(order_order.order_date, 2016-01-01, 5, 2016-02-01, 4, 2016-03-01, 3, 2016-04-01, 2, 201605-01, 1)) AS recency,
 COUNT(DISTINCT(order_date)) AS frequency
 FROM order_order

 GROUP BY email)


SELECT rfm.email,
CASE
WHEN rfm.cash >= 2000000 THEN 5
WHEN rfm.cash > 1500000 THEN 4
WHEN rfm.cash > 1000000 THEN 3
WHEN rfm.cash > 500000 THEN 2
WHEN rfm.frequency > 4 THEN 5
WHEN rfm.frequency = 4 THEN 4
WHEN rfm.frequency = 3 THEN 3
WHEN rfm.frequency = 2 THEN 2
WHEN rfm.frequency = 1 THEN 1
else 1

END  + rfm.frequency AS rfm_score
--+ Five_years.recency

FROM rfm
GROUP BY rfm.email, rfm.cash,rfm.frequency
ORDER BY rfm.email

The error was :

ERROR: function decode(timestamp with time zone, integer, integer, integer, integer, integer, integer, integer, integer, integer, integer) does not exist Hint: No function matches the given name and argument types. You might need to add explicit type casts. Position: 186

and i assume the error was in this line

MAX(decode(order_order.order_date, 2016-01-01, 5, 2016-02-01, 4, 2016-03-01, 3, 2016-04-01, 2, 2016-05-01, 1)) AS recency

Are there any suggestion in modifying the error line into the criteria stated for the recency dimension? thanks


Solution

  • There is no decode() function in Postgres. You can substitute this with another CASE statement:

    WITH rfm AS
    (
         SELECT email,
         SUM((total_incl_tax)) AS cash,
         MAX(
             CASE
              WHEN order_order.order_date = '2016-01-01' THEN 5
              WHEN order_order.order_date = '2016-02-01' THEN 4
              WHEN order_order.order_date = '2016-03-01' THEN 3
              WHEN order_order.order_date = '2016-04-01' THEN 2
              WHEN order_order.order_date = '2016-05-01' THEN 1
             END
            )   as recency,
         COUNT(DISTINCT(order_date)) AS frequency
         FROM order_order
         GROUP BY email
     )
    SELECT rfm.email,
    CASE
        WHEN rfm.cash >= 2000000 THEN 5
        WHEN rfm.cash > 1500000 THEN 4
        WHEN rfm.cash > 1000000 THEN 3
        WHEN rfm.cash > 500000 THEN 2
        WHEN rfm.frequency > 4 THEN 5
        WHEN rfm.frequency = 4 THEN 4
        WHEN rfm.frequency = 3 THEN 3
        WHEN rfm.frequency = 2 THEN 2
        WHEN rfm.frequency = 1 THEN 1
        else 1
    END  + rfm.frequency + rfm.recency AS rfm_score
    FROM rfm
    GROUP BY rfm.email, rfm.cash,rfm.frequency
    ORDER BY rfm.email
    

    Further reading here: Decode equivalent in postgres