Weighting Brand Entropy using Frequency of Purchases

I have a list of purchases for every customer and I am trying to determine brand loyalty. From this list I have calculated each customer's brand entropy, which I am using as a proxy for brand loyalty. For example, if a customer only purchases brand_a, then their entropy will be 0 and they are very brand loyal. However, if the customer purchases brand_a, brand_b and others, then their entropy will be high and they are not very brand loyal.

# Dummy Data
CUST_ID <- c("c_X","c_X","c_X","c_Y","c_Y","c_Z")
BRAND <- c("brand_a","brand_a","brand_a","brand_a","brand_b","brand_a")
PURCHASES <- data.frame(CUST_ID,BRAND)

# Counting purchases per customer and brand, then spreading to one column per brand
library(dplyr)
library(tidyr)
ENTROPY <- PURCHASES %>%
  count(CUST_ID, BRAND, name = "count") %>%
  # values_fill = 0 gives 0 (not NA) for brands a customer never bought
  pivot_wider(names_from = BRAND, values_from = count, values_fill = 0) %>%
  as.data.frame()

# Calculating Entropy (maximum-likelihood estimate, in nats) from each row of brand counts
library(entropy)
brand_cols <- c("brand_a", "brand_b")
ENTROPY$entropy <- apply(ENTROPY[, brand_cols], 1, entropy, method = "ML")

# Calculating Frequency
ENTROPY$frequency <- ENTROPY$brand_a + ENTROPY$brand_b
ENTROPY

However, my problem is that entropy does not account for the quantity of purchases of each customer. Consider the following cases:

1) Customer_X has made 3 purchases, each time it is brand_a. Their entropy is 0.

2) Customer_Z has made 1 purchase, it is brand_a. Their entropy is 0.

Naturally, we can be more confident that Customer_X is brand loyal than that Customer_Z is. Therefore, I would like to weight the entropy calculations by the frequency. However, simply dividing by the frequency does not help: Customer_X gives 0/3 = 0 and Customer_Z gives 0/1 = 0.

Essentially, I want a clever way to give Customer_X a low value on my brand loyalty measure (low meaning more loyal, as with entropy) and Customer_Z a higher value. One thought was to use a CART/Decision Tree/Random Forest model, but if it can be done using clever math, that would be ideal.

Solution

I think the index that you want is entropy normalised by some expectation for the entropy given the number of purchases. Essentially, fit a curve to the graph of entropy vs number of purchases, and then divide each entropy by the expectation given by the curve.
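As a rough sketch of that normalisation: the customers below are simulated, and the brand shares, the purchase counts and the helper name `shannon` are all assumptions made for illustration. The "curve" here is simply the mean entropy at each purchase count; a smoother such as `loess` could be swapped in.

```r
# Shannon entropy (nats) of a vector of brand counts; zero counts are dropped.
shannon <- function(counts) {
  p <- counts[counts > 0] / sum(counts)
  -sum(p * log(p))
}

set.seed(1)
shares <- c(brand_a = 0.6, brand_b = 0.3, brand_c = 0.1)  # assumed brand shares
n_purchases <- sample(1:10, 500, replace = TRUE)          # assumed purchase counts

# One row of brand counts per simulated customer
counts <- t(sapply(n_purchases, function(n) rmultinom(1, n, shares)[, 1]))
cust <- data.frame(frequency = n_purchases,
                   entropy   = apply(counts, 1, shannon))

# "Curve": mean entropy observed at each purchase count
cust$expected <- ave(cust$entropy, cust$frequency)

# Normalised entropy; left as NA where the expected entropy is 0
# (e.g. a single purchase always has entropy 0)
cust$normalised <- ifelse(cust$expected > 0, cust$entropy / cust$expected, NA)
head(cust)
```

Values above 1 then mean "less loyal than typical for that many purchases", and values below 1 mean "more loyal than typical".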

Now this doesn't solve your problem with super-loyal customers who have 0 entropy. But I think the question there is subtly different: is the apparent loyalty due to chance (low count) or is it real? This is distinct from the question of how loyal that customer is. Essentially, you want to know the probability of observing such a data point.

If the 0 entropy events are your only pain point, you could estimate from your data the probability of having bought only a single brand, given the number of purchases.
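One simple way to get such a probability, as a sketch only: assume every purchase is an independent draw from the overall brand shares. That independence is my assumption, not something your data guarantees. Under it, the chance that all n purchases hit the same brand is the sum over brands of share^n. The function name `p_single_brand` is made up for this example.

```r
# Dummy data from the question
BRAND <- c("brand_a", "brand_a", "brand_a", "brand_a", "brand_b", "brand_a")

# Overall brand shares across all purchases (5/6 and 1/6 here)
shares <- as.numeric(prop.table(table(BRAND)))

# Probability that n independent purchases all land on the same brand
p_single_brand <- function(n, shares) sum(shares^n)

p_single_brand(1, shares)  # Customer_Z, 1 purchase: 1
p_single_brand(3, shares)  # Customer_X, 3 purchases: about 0.583
```

A single purchase is always "single brand", so Customer_Z's zero entropy carries no information, while Customer_X's is at least somewhat less likely to be a fluke.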

Alternatively, you could determine the full joint probability distribution of entropy and number of purchases (instead of just the mean), e.g. by density estimation, and then compute the conditional probability of observing a given entropy given the number of purchases.
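To illustrate the idea of a conditional probability of entropy given purchase count, here is a sketch that swaps density estimation on real data for Monte Carlo simulation under a null model (independent purchases drawn from fixed brand shares). The null model and the names `shannon` and `p_entropy_at_most` are assumptions for this example; with enough real customers you would use their observed entropies at each purchase count instead of simulated ones.

```r
# Shannon entropy (nats) of a vector of brand counts; zero counts are dropped.
shannon <- function(counts) {
  p <- counts[counts > 0] / sum(counts)
  -sum(p * log(p))
}

# Estimate P(entropy <= h_obs | n purchases) by simulating customers who
# draw each purchase independently from the given brand shares.
p_entropy_at_most <- function(h_obs, n, shares, n_sim = 10000) {
  sims  <- rmultinom(n_sim, n, shares)   # one simulated customer per column
  h_sim <- apply(sims, 2, shannon)
  mean(h_sim <= h_obs + 1e-12)           # small tolerance for floating-point ties
}

set.seed(42)
shares <- c(brand_a = 5/6, brand_b = 1/6)  # shares in the question's dummy data

p_x <- p_entropy_at_most(0, n = 3, shares)  # Customer_X
p_z <- p_entropy_at_most(0, n = 1, shares)  # Customer_Z
c(Customer_X = p_x, Customer_Z = p_z)
```

The smaller this probability, the less likely it is that the observed (low) entropy arose by chance alone for that number of purchases.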