Search code examples
rregressionpredictiondata-fitting

How can I find the date for which a model from family = poisson(link = "log") predicts a specified value?


I have data that consists of values on dates, here represented as timestamps:

json <- structure(list(creation_date = c(1689356840L, 1689366017L, 1689376446L, 
1689504809L, 1690366268L, 1690472012L, 1690478723L, 1690701972L, 
1690701973L, 1690746389L, 1690746392L, 1690831235L, 1691022236L, 
1692097787L, 1692625463L, 1692699453L, 1692828527L, 1692828527L, 
1692828550L, 1692829734L, 1692878775L, 1692879861L, 1693058927L, 
1693082719L, 1693106370L, 1693115869L, 1693120632L, 1693121481L, 
1693137142L, 1693164782L, 1693172783L, 1693175185L, 1693208767L, 
1693909005L, 1693914013L, 1693921364L, 1693928549L, 1693930560L, 
1693999563L, 1694001298L, 1694003421L, 1694003758L, 1694005624L, 
1694015620L, 1694015743L, 1694074163L, 1694153769L, 1694181062L, 
1694194729L, 1694194740L, 1694194740L, 1694194744L, 1694209208L, 
1694594207L, 1694798845L, 1694798846L, 1694809497L, 1694883734L, 
1694901062L, 1695124304L, 1695221997L, 1695222830L, 1695259750L, 
1695259766L, 1695260229L, 1695283945L, 1695329985L, 1695330735L, 
1695363048L, 1695399875L, 1695403486L, 1695409526L, 1695458043L, 
1695464399L, 1695469820L, 1695472988L, 1695479007L, 1695483407L, 
1695485270L, 1695485763L, 1695485763L, 1695493324L, 1695505234L, 
1695505235L, 1695509011L, 1695517370L, 1695527643L, 1695556326L, 
1695567795L, 1695567798L, 1695575998L, 1695576929L, 1695589633L, 
1695596154L, 1695607375L, 1695633830L, 1695657647L, 1695658242L, 
1695710762L, 1695719101L), cumulative_reputation = c(100L, 110L, 
120L, 130L, 140L, 150L, 160L, 170L, 185L, 200L, 185L, 195L, 205L, 
215L, 213L, 228L, 230L, 240L, 230L, 240L, 250L, 252L, 262L, 272L, 
282L, 292L, 302L, 312L, 322L, 332L, 342L, 352L, 362L, 372L, 382L, 
392L, 402L, 412L, 422L, 432L, 442L, 452L, 462L, 472L, 482L, 492L, 
502L, 512L, 522L, 512L, 510L, 512L, 522L, 532L, 547L, 557L, 567L, 
577L, 587L, 597L, 607L, 617L, 632L, 617L, 627L, 637L, 647L, 657L, 
672L, 682L, 692L, 707L, 717L, 727L, 737L, 747L, 745L, 755L, 757L, 
747L, 745L, 755L, 753L, 755L, 765L, 775L, 785L, 795L, 805L, 795L, 
805L, 815L, 825L, 835L, 845L, 847L, 857L, 867L, 877L, 887L)), row.names = c(670L, 
669L, 668L, 667L, 666L, 665L, 664L, 663L, 662L, 661L, 660L, 659L, 
658L, 657L, 656L, 655L, 653L, 654L, 652L, 651L, 650L, 649L, 648L, 
647L, 646L, 645L, 644L, 643L, 642L, 641L, 640L, 639L, 638L, 637L, 
636L, 635L, 634L, 633L, 632L, 631L, 630L, 629L, 628L, 627L, 626L, 
625L, 624L, 623L, 622L, 620L, 621L, 619L, 618L, 617L, 616L, 615L, 
614L, 613L, 612L, 611L, 610L, 609L, 608L, 607L, 606L, 605L, 604L, 
603L, 602L, 601L, 600L, 599L, 598L, 597L, 596L, 595L, 594L, 593L, 
592L, 590L, 591L, 589L, 588L, 587L, 586L, 585L, 584L, 583L, 582L, 
581L, 580L, 579L, 578L, 577L, 576L, 575L, 574L, 573L, 572L, 571L
), class = "data.frame")

I have fit a model to that data and can now calculate the predicted values:

model <- glm(cumulative_reputation ~ creation_date, data = json, family = poisson(link = "log"))
prediction <- predict(model, json)

plot of the measured and the predicted values

How can I find the date (or timestamp) for which the model predicts a specific value?

To find that date, I have calculated values from the last date of measurement (1695719101) to a year into the future (1695719101 + 365 * 24 * 60 * 60) and then searched the resulting values for the one closest to the one I want (cumulative_reputation = 10000).

Is there a way to calculate this date (or timestamp) directly?


Note:

This question is a follow up to this question.


Solution

  • If you have the information in the format that @PBulls provided, i.e.

    COEF <- coef(model)
    pred <- function(t) exp(COEF[1] + t*COEF[2])
    

    Then you don't need uniroot() to get the answer, you can get the answer analytically:

    target = exp(c1 + c2*t)
    log(target) = c1 + c2*t
    t = (log(target) - c1)/c2
    

    or in R code,

    (log(target) - COEF[1])/COEF[2]
    

    The uniroot() approach is still handy for more complicated models (e.g. splines) where the math is hard or impossible.