Search code examples
rforecasting

Sum absolute difference in dataframe


I have a small dataframe with values for 10 periods. I want to sum the absolute difference (absolute error) between each value and the predicted value.

Column labels: P1, P2, P3, .....P10

Values: 3, 4, 3 ......7 (see data below)

Predicted value = 5 (it is not always 5)

"error" formula = |3-5|+|4-5|+|3-5|+....+|7-5|

> data
   cust P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 predict error
1     A  1  1  1  1  1  1  1  1  1   1       5     ?
2     B  3  3  3  3  3  3  3  3  3   3       5     ?
3     C  1  1  1  1  1  3  3  3  3   3       5     ?
4     D  1  0  1  0  1  0  1  0  1   0       5     ?
5     E  1  0  0  1  0  0  1  0  0   1       5     ?
6     F  1  3  1  3  1  3  1  3  1   3       5     ?
7     G  5  5  5  5  5  5  5  5  5   5       5     ?
8     H  8  8  8  8  8  8  8  8  8   8       5     ?
9     I  5  5  5  5  5  8  8  8  8   8       5     ?
10    J  5  0  5  0  5  0  5  0  5   0       5     ?
11    K  5  0  0  5  0  0  5  0  0   5       5     ?
12    L  5  8  5  8  5  8  5  8  5   8       5     ?

I can do the calculations in a long format, but I don't want to redo long messy formulas for different size data. The eventual dataset will have many more periods and customers, so I need a formula / function that will work for different sizes of data frames. I would appreciate some help.

I know that this can be done using forecasting packages, but I need to build it from the bottom so that I can do other things with the results.


Solution

  • I think that a lot of the reason for your aversion to answers involving melting long are because of code like the other two answers that have appeared here. They do the job -- but are practically unreadable.

    Using dplyr & tidyr, produces general as well as readable code:

    library(dplyr)
    library(tidyr)
    library(ggplot2)
    
    # read data in
    dfX = as_data_frame(read.table(textConnection("
                    cust P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 predict error
    1     A  1  1  1  1  1  1  1  1  1   1       5     ?
                   2     B  3  3  3  3  3  3  3  3  3   3       5     ?
                   3     C  1  1  1  1  1  3  3  3  3   3       5     ?
                   4     D  1  0  1  0  1  0  1  0  1   0       5     ?
                   5     E  1  0  0  1  0  0  1  0  0   1       5     ?
                   6     F  1  3  1  3  1  3  1  3  1   3       5     ?
                   7     G  5  5  5  5  5  5  5  5  5   5       5     ?
                   8     H  8  8  8  8  8  8  8  8  8   8       5     ?
                   9     I  5  5  5  5  5  8  8  8  8   8       5     ?
                   10    J  5  0  5  0  5  0  5  0  5   0       5     ?
                   11    K  5  0  0  5  0  0  5  0  0   5       5     ?
                   12    L  5  8  5  8  5  8  5  8  5   8       5     ?"),
                     header = TRUE, stringsAsFactors = FALSE))
    
    # melt & compute error
    dfXErr = dfX %>%
      select(-error) %>%                    
      gather(period, actual, -cust, -predict) %>%
      group_by(cust) %>%
      summarize(mape = mean(abs(actual - predict)))
    
    # join back to original data (if required)
    inner_join(dfX, dfXErr, by = "cust")