Lift curve is swapped

For the example for the lift curve I run

library(caret)
set.seed(1)
simulated <- data.frame(obs = factor(rep(letters[1:2], each = 100)),
                        perfect = sort(runif(200), decreasing = TRUE),
                        random = runif(200))
lift2 <- lift(obs ~ random + perfect, data = simulated)
xyplot(lift2, plot = "lift", auto.key = list(columns = 2))

and get

as result. I expected the image to be swapped horizontally - something along the lines of

What am I doing wrong?

Btw: This is a lift chart not a cumulative gains chart.

Update:

The plot that I expected, produced now by my own code

mylift <- caret::lift(Class ~ cforest_prob + perfect_prob + guess_prob, data = data_test)
ggplot(mylift$data) +
  geom_line(aes(CumTestedPct, lift, color = liftModelVar))

I noticed, that the data.frame mylift$data contains the following columns:

names(mylift$data)
 [1] "liftModelVar" "cuts"         "events"       "n"            "Sn"           "Sp"           "EventPct"    
 [8] "CumEventPct"  "lift"         "CumTestedPct"

So I printed the following plot

ggplot(mylift$data) +
  geom_line(aes(cuts, lift, color = liftModelVar))

So I guess that the different plots are just different ways of examining lift? I wasn't aware that there are different lift charts - I thought it was standardized across the industry.

Solution

Edit by the question author, for late readers: I accepted this answer for a large part because of the helpful discussion in the comments to this answer. Please consider reading the discussion!

Let's reproduce the graph and find the baseline. Let

cutoffs <- seq(0, 1, length = 1000)

be our cutoffs. Now the main computations are done by

aux <- sapply(cutoffs, function(ct) {
  perf <- simulated$obs[simulated$perfect > ct]
  rand <- simulated$obs[simulated$random > ct]
  c(mean(perf == "a"), mean(rand == "a"))
})

where we go over the vector of cutoffs and do the following. Take the perfect case. We say that whenever perfect > ct, we are going to predict "a". Then simulated$obs[simulated$perfect > ct] are the true values, while mean(perf == "a") is our accuracy with a given cutoff. The same happens with random.

As for the baseline, it is just a constant defined by the share of "a" in the sample:

baseline <- mean(simulated$obs == "a")

When plotting the lifts, we divide our accuracy by that of the baseline method and get the same graph along with the baseline curve:

plot(x = cutoffs, y = aux[1, ] / baseline, type = 'l', ylim = c(0, 2), xlab = "Cutoff", ylab = "Lift")
lines(x = cutoffs, y = aux[2, ] / baseline, col = 'blue')
abline(a = baseline / baseline, b = 0, col = 'magenta')

Update:

Here's an illustration that, at least when plotted manually, the lift curve of the "expected" type can be manipulated and gives non-unique results.

Your example graph is from here, which also has this data:

#   contacted response
# 1      10000     6000
# 2      20000    10000
# 3      30000    13000
# 4      40000    15800
# 5      50000    17000
# 6      60000    18000
# 7      70000    18800
# 8      80000    19400
# 9      90000    19800
# 10    100000    20000

Now suppose that we know not this evolution but 10 individual blocks:

#    contacted response
# 1      10000     6000
# 2      10000     4000
# 3      10000     3000
# 4      10000     2800
# 5      10000     1200
# 6      10000     1000
# 7      10000      800
# 8      10000      600
# 9      10000      400
# 10     10000      200

In that case it depends on how we order the observations when putting "% Contacted" in the x-axis:

set.seed(1)
baseline <- sum(df$response) / sum(df$contacted) * cumsum(df$contacted)
lift1 <- cumsum(df$response)
lift2 <- cumsum(sample(df$response))
x <- 1:10 * 10

plot(x = x, y = lift1 / baseline, col = 'red', type = 'l', ylim = c(0, 3), xlab = "% Customers contacted", ylab = "Lift")
lines(x = x, y = lift2 / baseline, col = 'blue')
abline(a = baseline / baseline, b = 0, col = 'magenta')