firebase, bayesian, ab-testing

Firebase Revenue AB testing algorithm


We ran an A/B test in Firebase, which produced the following results:

Firebase AB-Test

I am also building my own Bayesian A/B testing suite and would like to understand how Firebase arrived at these conclusions.

I queried the data of this test for the Control Group and Variant C:

  • Control Group: $11943 Revenue from 900 payers of 80491 users.
  • Variant C: $16487 Revenue from 894 payers of 80224 users.
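
For reference, the headline rates implied by these numbers can be computed directly. This is only a sketch of the arithmetic; the names `summarize`, `conversion`, `arppu` and `arpu` are illustrative and are not anything Firebase or the ARPU tool reports.

```python
# Raw counts for the two groups quoted above.
groups = {
    "control":   {"revenue": 11943.0, "payers": 900, "users": 80491},
    "variant_c": {"revenue": 16487.0, "payers": 894, "users": 80224},
}

def summarize(g):
    """Return payer rate, revenue per payer, and revenue per user."""
    conversion = g["payers"] / g["users"]
    arppu = g["revenue"] / g["payers"]
    arpu = g["revenue"] / g["users"]
    return conversion, arppu, arpu

for name, g in groups.items():
    conversion, arppu, arpu = summarize(g)
    print(f"{name}: conversion={conversion:.4%}, ARPPU=${arppu:.2f}, ARPU=${arpu:.4f}")
```

The payer rates are nearly identical (about 1.1% in both groups), so almost all of the difference between the groups comes from revenue per payer.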

I based my algorithm on this tool: https://vidogreg.shinyapps.io/bayes-arpu-test/. When I enter these inputs I get the following result:

Bayes ARPU tool results

This tool seems to be much more confident than Firebase that Variant C is better than the control group. The Firebase distributions for revenue per user also look skewed, while the Bayesian ARPU tool produces very symmetrical distributions.

The code for the Bayesian ARPU tool is available. They used conjugate priors to get to these conclusions based on this paper:

https://cdn2.hubspot.net/hubfs/310840/VWO_SmartStats_technical_whitepaper.pdf

Can anyone help me work out which of these results is more reliable?


Solution

  • I found out what my problem was.

    The first issue is that the analysis has to be split into two steps. Because this is a freemium app, most users do not pay, so these non-paying users add no information about the revenue distribution.

    So we first need the posterior distribution of the payer percentage. This can be done as explained in the paper I mentioned. In Python, a function that samples from this posterior looks like this:

    import numpy as np

    def binomial_rvar(successes, total, samples):
        # Posterior draws of the payer rate: Beta(1 + payers, 1 + non-payers).
        rvar = np.random.beta(1 + successes, 1 + (total - successes), samples)
        return rvar
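
    As a rough usage sketch, the same Beta posterior can be applied to the payer counts from the question to compare the two groups. The function name `payer_rate_posterior`, the fixed seed and the number of draws are assumptions made for this illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

def payer_rate_posterior(payers, users, samples):
    """Draw from Beta(1 + payers, 1 + non-payers), i.e. a uniform prior."""
    return rng.beta(1 + payers, 1 + (users - payers), samples)

# Payer and user counts taken from the question.
control = payer_rate_posterior(900, 80491, 100_000)
variant_c = payer_rate_posterior(894, 80224, 100_000)

# Fraction of paired draws in which Variant C has the higher payer rate.
prob_c_better = np.mean(variant_c > control)
print(f"P(payer rate C > control) = {prob_c_better:.3f}")
```

    Since the two payer rates are almost equal, this probability should come out close to one half, which suggests the payer rate alone does not separate the groups.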
    

    Second, for the payers, we want the revenue distribution. The paper also describes how to model revenue, but it assumes the revenue is exponentially distributed. This is not the case for our app: a few users spend extremely large amounts of money. If one of these users ends up in a group, the exponential model will immediately conclude that this group is the best.

    What we can do is take the log of the Pareto-distributed samples, which transforms a Pareto distribution into an exponential distribution (strictly, this holds for the log of the revenue relative to the Pareto minimum). We first take the log of each paying user's revenue, then sum these logs to get the "logsum" and count how many users contributed to it. We can then use the same approach as the paper. In Python this would be something like this:

    import numpy as np

    def get_exponential_rvars(total_sum, users, samples):
        # Gamma draws of the exponential rate, inverted to give draws of the mean.
        r_var = 1. / np.random.gamma(users, 1 / (1 + total_sum), samples)
        return r_var
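
    To make the log transform concrete, here is a small sketch on made-up data. The per-payer revenues are simulated, and `min_purchase`, `shape` and `mean_log_revenue_posterior` are hypothetical names and values chosen only for this illustration; real per-payer revenue would replace the simulated array.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical per-payer revenue: classical Pareto with minimum purchase 1.0
# and shape 1.5 (both values are made up for illustration).
min_purchase, shape = 1.0, 1.5
revenue = (rng.pareto(shape, 900) + 1.0) * min_purchase

# Log-transform: log(revenue / min_purchase) is exponential with rate `shape`.
log_revenue = np.log(revenue / min_purchase)
logsum, payers = log_revenue.sum(), log_revenue.size

def mean_log_revenue_posterior(total_sum, users, samples):
    """Posterior draws of the exponential mean, same form as the function above."""
    return 1.0 / rng.gamma(users, 1.0 / (1.0 + total_sum), samples)

draws = mean_log_revenue_posterior(logsum, payers, 100_000)
print(f"sample mean of log revenue: {log_revenue.mean():.3f}")
print(f"posterior mean of draws:    {draws.mean():.3f}")
```

    Note that these draws describe the mean of the log revenue, so they live on a log scale rather than in dollars.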
    

    We can now multiply the samples returned by these two functions element by element, giving the final distribution for the revenue per user.
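
    Putting the two steps together might look like the sketch below. It runs on simulated purchase amounts, because the per-payer data is not part of this post; the helper names, the Pareto shapes and the seed are all assumptions. The per-payer factor is the mean of the log revenue, so the product is a log-scale quantity.

```python
import numpy as np

rng = np.random.default_rng(2)
SAMPLES = 100_000

def payer_rate_draws(payers, users):
    """Step 1: Beta posterior draws of the payer rate."""
    return rng.beta(1 + payers, 1 + (users - payers), SAMPLES)

def log_revenue_mean_draws(logsum, payers):
    """Step 2: draws of the mean log revenue per payer."""
    return 1.0 / rng.gamma(payers, 1.0 / (1.0 + logsum), SAMPLES)

def combined_draws(purchases, users):
    """Multiply payer-rate draws by per-payer draws, element by element."""
    logs = np.log(purchases)
    return payer_rate_draws(logs.size, users) * log_revenue_mean_draws(logs.sum(), logs.size)

# Hypothetical purchase amounts (minimum 1.0); real per-payer data would go here.
control_purchases = rng.pareto(2.0, 900) + 1.0
variant_purchases = rng.pareto(1.2, 894) + 1.0

control = combined_draws(control_purchases, 80491)
variant = combined_draws(variant_purchases, 80224)

# Fraction of paired draws in which the variant comes out ahead.
prob_variant_better = np.mean(variant > control)
print(f"P(variant > control) = {prob_variant_better:.3f}")
```

    Comparing the two arrays draw by draw gives a probability that one group beats the other, which is the kind of figure both Firebase and the ARPU tool report.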