Search code examples
rtime-seriescurve-fittingforecastingauc

Using the integrate function to find area under a continuous probability curve -finding % completion of a marketing campaign?


I'm trying to build a model to forecast direct mail marketing campaign responses. In the code below I was able to use responses from a previous campaign to create a smooth curve (i.e. continuous probability). Now, I need to find the total area under this curve for each day so I know what percent complete the campaign is on a given day. Theoretically using the integrate function then finding the difference between areas using the diff function should work. For example, I will be able to find the area under the curve after day 2 and subtract the area under the curve after day 1. Knowing the addional area under the curve for each day will help me tell the % completion with each day. The problem is that I can't figure out a way to integrate this 64 day curve so that the total density sums to 1.

#vector of direct mail marketing responses over 63 days  
responses <- c(24.16093706,
41.59607507,
68.20083052,
85.19109064,
100.0704403,
58.6600221,
86.08475816,
88.97439581,
65.58341418,
49.25588053,
53.63602085,
47.03620672,
29.71552264,
32.85862747,
31.29118096,
23.67961069,
19.81261675,
18.69300933,
17.25738435,
12.01161679,
12.36734071,
14.32360673,
11.02390849,
9.108021409,
9.647965622,
8.815576548,
5.67225654,
5.739220185,
6.233999138,
5.527376627,
5.024065761,
5.565266355,
4.626749364,
3.480761716,
4.621902301,
4.518554271,
4.075985188,
3.204946787,
3.174020873,
2.966915873,
2.129178828,
2.673009031,
2.410429043,
2.331287075,
2.509300578,
2.13820695,
2.53433787,
1.603934405,
1.555813592,
1.834605068,
1.842905685,
1.454045577,
2.08684322,
1.318276487,
0.807666643,
1.333167088,
1.004526525,
1.180110123,
1.078079735,
1.151394678,
1.426747942,
0.699119833,
0.583347236)


set.seed(2) 
## install.packages("MASS") 
library("MASS")

shape_and_scale <- fitdistr(responses,'weibull')

shape_and_scale

#now use the curve() function, dweibull, and the shape and scale parameters to create a smooth curve 
curve results <- curve(dweibull(x,0.70730466,13.79467490),from=0, to=63)

Now I need a way to integrate this curve to find the area under the curve after day 1, day 2, day 3, ...etc. After that I should be able to use diff to find the difference between day 2 and day 1, etc. which I can use to find the % of campaign completion after each day. In my code above I truncated the curve from 0 to 63. Is there a way to use this? For example, if I just do: diff(pweibull(0:63,0.70730466,13.79467490)) I do not leverage the fact that I already truncated the curve from 0 to 63 so the density doesn't add to 1.

For example:

sum(diff(pweibull(0:63,0.70730466,13.79467490))) equals .94, which is the same thing as: integrate(dweibull, 0, 63, shape = 0.70730466,scale = 13.79467490)

...but these don't leverage the fact that in the first block of code I've already truncated the curve to 63 days. I just like to integrate that so that the sum of the area under the curve is 1?

Thanks


Solution

  • I think you're looking for

    d0 <- diff(pweibull(0:63,0.70730466,13.79467490))
    

    If you just want to normalize this to sum to 1, then divide it either by pweibull(63,...) or by sum(d0) (which are the same).

    If you want a final category that includes everything beyond 63, use

    d1 <- diff(pweibull(c(0:63,Inf),0.70730466,13.79467490))
    sum(d1)   ## 1
    

    The latter is equivalent to c(d0,1-sum(d0)).