I have a small dataframe with values for 10 periods. I want to sum the absolute difference (absolute error) between each value and the predicted value.
Column labels: P1, P2, P3, .....P10
Values: 3, 4, 3 ......7 (see data below)
Predicted value = 5 (it is not always 5)
"error" formula = |3-5|+|4-5|+|3-5|+....+|7-5|
> data
cust P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 predict error
1 A 1 1 1 1 1 1 1 1 1 1 5 ?
2 B 3 3 3 3 3 3 3 3 3 3 5 ?
3 C 1 1 1 1 1 3 3 3 3 3 5 ?
4 D 1 0 1 0 1 0 1 0 1 0 5 ?
5 E 1 0 0 1 0 0 1 0 0 1 5 ?
6 F 1 3 1 3 1 3 1 3 1 3 5 ?
7 G 5 5 5 5 5 5 5 5 5 5 5 ?
8 H 8 8 8 8 8 8 8 8 8 8 5 ?
9 I 5 5 5 5 5 8 8 8 8 8 5 ?
10 J 5 0 5 0 5 0 5 0 5 0 5 ?
11 K 5 0 0 5 0 0 5 0 0 5 5 ?
12 L 5 8 5 8 5 8 5 8 5 8 5 ?
I can do the calculations in a long format, but I don't want to redo long messy formulas for different size data. The eventual dataset will have many more periods and customers, so I need a formula / function that will work for different sizes of data frames. I would appreciate some help.
I know that this can be done using forecasting packages, but I need to build it from the bottom so that I can do other things with the results.
I think that a lot of the reason for your aversion to answers involving melt
ing long are because of code like the other two answers that have appeared here. They do the job -- but are practically unreadable.
Using dplyr
& tidyr
, produces general as well as readable code:
library(dplyr)
library(tidyr)
library(ggplot2)
# read data in
dfX = as_data_frame(read.table(textConnection("
cust P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 predict error
1 A 1 1 1 1 1 1 1 1 1 1 5 ?
2 B 3 3 3 3 3 3 3 3 3 3 5 ?
3 C 1 1 1 1 1 3 3 3 3 3 5 ?
4 D 1 0 1 0 1 0 1 0 1 0 5 ?
5 E 1 0 0 1 0 0 1 0 0 1 5 ?
6 F 1 3 1 3 1 3 1 3 1 3 5 ?
7 G 5 5 5 5 5 5 5 5 5 5 5 ?
8 H 8 8 8 8 8 8 8 8 8 8 5 ?
9 I 5 5 5 5 5 8 8 8 8 8 5 ?
10 J 5 0 5 0 5 0 5 0 5 0 5 ?
11 K 5 0 0 5 0 0 5 0 0 5 5 ?
12 L 5 8 5 8 5 8 5 8 5 8 5 ?"),
header = TRUE, stringsAsFactors = FALSE))
# melt & compute error
dfXErr = dfX %>%
select(-error) %>%
gather(period, actual, -cust, -predict) %>%
group_by(cust) %>%
summarize(mape = mean(abs(actual - predict)))
# join back to original data (if required)
inner_join(dfX, dfXErr, by = "cust")