
Compute interval score for interval predictions in R


The Stack Exchange thread "forecast-accuracy-metric-that-involves-prediction-intervals" presents a quality measure for prediction intervals; see that thread for details.

I would like to compute this quality measure in R:

library(quantreg)

## Split data 
smp_size <- floor(0.75 * nrow(iris))
set.seed(123)

train_ind <- sample(seq_len(nrow(iris)), size = smp_size)
train <- iris[train_ind, ]
test <- iris[-train_ind, ]

# Train quantile regression models for the interval bounds: lw (lower) and up (upper)
model_lw <- rq(Sepal.Length ~ Petal.Length + Petal.Width, data = train, tau = 0.1)
model_up <- rq(Sepal.Length ~ Petal.Length + Petal.Width, data = train, tau = 0.9)

# Interval predictions on the test set: lw (lower) and up (upper) bounds
pred_lw <- predict(model_lw, test)
pred_up <- predict(model_up, test)

Using the resulting objects:

pred_lw, pred_up and test$Sepal.Length

Goal

  • I would like to compute an interval quality measure, ideally with a library that implements prediction interval evaluation.

  • An alternative would be to compute the coverage and length of the prediction intervals, or any other evaluation metric.

Any help with this implementation?


Solution

  • For evaluating prediction intervals from quantile regression, two packages implement the interval score along with other metrics: scoringutils and greybox.


    library(scoringutils)
    # Interval score, a scoring rule for interval predictions (Gneiting & Raftery, 2007)
    mean(interval_score(true_values = test$Sepal.Length,
                        lower = pred_lw,
                        upper = pred_up,
                        interval_range = 80))
    
    library(greybox)
    # Mean Interval Score (Gneiting & Raftery, 2007)
    MIS(actual = test$Sepal.Length, 
        lower = pred_lw, 
        upper = pred_up, 
        level = 0.80)
    
    # The interval range (level) follows from the two quantiles:
    # 0.9 - 0.1 = 0.8, i.e. interval_range = 80 and level = 0.80
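To make it clearer what both calls compute, here is a minimal base R sketch of the interval score from Gneiting & Raftery (2007): the interval width plus a penalty of 2/alpha times the distance by which the observation falls outside the interval, with alpha = 1 - level. The helper name `interval_score_manual` and the toy vectors are assumptions made for illustration; they are not part of either package.

```r
# Interval score for a central (1 - alpha) prediction interval
# (Gneiting & Raftery, 2007). Lower scores are better.
# NOTE: `interval_score_manual` is a hypothetical helper, not a package function.
interval_score_manual <- function(actual, lower, upper, level = 0.80) {
  stopifnot(length(actual) == length(lower),
            length(lower) == length(upper),
            level > 0, level < 1)
  alpha <- 1 - level
  width <- upper - lower                           # sharpness term
  below <- (2 / alpha) * pmax(lower - actual, 0)   # penalty: observation under lower bound
  above <- (2 / alpha) * pmax(actual - upper, 0)   # penalty: observation over upper bound
  width + below + above                            # one score per observation
}

# Toy values, for illustration only
actual <- c(5.0, 6.2, 4.1, 7.0)
lower  <- c(4.5, 5.0, 4.5, 6.0)
upper  <- c(5.5, 6.0, 5.5, 6.8)

scores <- interval_score_manual(actual, lower, upper, level = 0.80)
mean(scores)
```

With the objects from the question, the equivalent call would be `mean(interval_score_manual(test$Sepal.Length, pred_lw, pred_up, level = 0.80))`. Depending on the installed version, `scoringutils::interval_score` may by default weight the score by alpha/2 (its `weigh` argument), in which case its mean differs from the unweighted value by that factor; the package documentation for your version is the place to confirm this.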
    

    The second package, greybox, also provides a scaled (sMIS) and a relative (rMIS) version of the score. Further study is needed to understand the bias and applications of these metrics alongside other statistics.

    Hope this helps the community.
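Among the other statistics, the empirical coverage and average width of the intervals, which the question raises as an alternative, need no package. The following is a minimal base R sketch; the helper names `interval_coverage` and `mean_interval_width` and the toy vectors are assumptions made for illustration.

```r
# Share of observations that fall inside their prediction interval.
# NOTE: hypothetical helper names, not taken from any package.
interval_coverage <- function(actual, lower, upper) {
  stopifnot(length(actual) == length(lower),
            length(lower) == length(upper))
  mean(actual >= lower & actual <= upper)
}

# Average interval width (sharpness); narrower is better at equal coverage.
mean_interval_width <- function(lower, upper) {
  stopifnot(length(lower) == length(upper))
  mean(upper - lower)
}

# Toy values, for illustration only
actual <- c(5.0, 6.2, 4.1, 7.0, 5.8)
lower  <- c(4.5, 5.0, 4.5, 6.0, 5.5)
upper  <- c(5.5, 6.0, 5.5, 6.8, 6.5)

coverage  <- interval_coverage(actual, lower, upper)
avg_width <- mean_interval_width(lower, upper)
c(coverage = coverage, avg_width = avg_width)
```

With the objects from the question, `interval_coverage(test$Sepal.Length, pred_lw, pred_up)` can be compared against the nominal level of 0.80: coverage well below that suggests the intervals are too narrow, and coverage well above it suggests they are wider than necessary.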