Given an ordered vector vec <- c(1, 4, 6, 3, 2, 7), I want to compute, for each element i of vec, the weighted average of the previous elements, where the weight is inversely related to the distance from element i (the nearest previous element gets the largest weight).
The function should proceed as follows:

- For element 1, it should return NA (there is no previous element).
- For element 4, it should return 1.
- For element 6, it should return weighted.mean(x = c(1, 4), w = c(1, 2)).
- For element 3, it should return weighted.mean(x = c(1, 4, 6), w = c(1, 2, 3)).

The resulting vector result should satisfy length(result) == length(vec) and be c(NA, 1, 3, 4.5, 3.9, 3.266667).
UPDATE: I clearly mean without using a loop. Here's my naive implementation:

result <- numeric()
for (i in 1:length(vec)) {
    if (i == 1) {
        ## no previous elements for the first position
        result <- c(result, NA)
    } else {
        previous_elements <- vec[1:(i - 1)]
        result <- c(result,
                    weighted.mean(x = previous_elements,
                                  w = 1:length(previous_elements)))
    }
}
Create a function that does what you say; the only 'clever' thing is to use seq_len() instead of 1:i to generate the indexes (for i == 1, seq_len(i - 1) returns a zero-length vector, whereas 1:(i - 1) counts down and gives c(1, 0)):
fun <- function(i, vec)
    weighted.mean(head(vec, i - 1), w = seq_len(i - 1))
and then use it in sapply():

sapply(seq_along(vec), fun, vec)
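On the example vector this should produce (values follow from the definitions above):

vec <- c(1, 4, 6, 3, 2, 7)
sapply(seq_along(vec), fun, vec)
## [1]      NaN 1.000000 3.000000 4.500000 3.900000 3.266667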
This is good enough: NaN as the first element rather than NA, but that's easily corrected after the fact (or conceptually accepted as the right answer). It's also better than your solution, though still 'using a loop'; the management of the result vector is done by sapply() rather than in your loop, where you have to manage it yourself. In particular, your 'copy and append' approach is very bad performance-wise, making a copy of the existing result each time through the loop. It's better to pre-allocate a result vector of the appropriate length with result <- numeric(length(vec)) and then fill it with result[[i]] <- ..., and better still to just let sapply() do the right thing for you!
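For reference, a minimal sketch of the pre-allocate-and-fill version of the original loop (same result, no repeated copying; assumes length(vec) >= 2):

vec <- c(1, 4, 6, 3, 2, 7)
result <- numeric(length(vec))  # allocate once, at the final length
result[[1]] <- NA               # no previous elements for the first position
for (i in 2:length(vec)) {
    ## fill in place instead of copy-and-append
    result[[i]] <- weighted.mean(vec[seq_len(i - 1)], w = seq_len(i - 1))
}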
The problem is that the naive implementation scales quadratically: you make a pass along vec to process each element, and then for each element you make a second pass to calculate the weighted mean, so there are n * (n - 1) / 2 calculations. So...
Take a look at weighted.mean:

> stats:::weighted.mean.default
function (x, w, ..., na.rm = FALSE)
{
    ## SNIP -- edited for brevity
    w <- as.double(w)
    if (na.rm) {
        i <- !is.na(x)
        w <- w[i]
        x <- x[i]
    }
    sum((x * w)[w != 0]) / sum(w)
}
and use cumsum() instead of sum() to get the cumulative weights rather than the individual weights, i.e., return a vector as long as x, where the ith element is the weighted mean up to that point:
cumweighted.mean <- function(x, w) {
    ## handle NA values?
    w <- as.numeric(w)  # to avoid integer overflow
    ## (no need for weighted.mean()'s zero-weight subsetting here;
    ## subsetting the cumulative sum would misalign it with cumsum(w))
    cumsum(x * w) / cumsum(w)
}
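Applied to the example as-is, this returns the running weighted mean including the current element, which is not quite what was asked for:

cumweighted.mean(c(1, 4, 6, 3, 2, 7), 1:6)
## [1] 1.000000 3.000000 4.500000 3.900000 3.266667 4.333333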
You'd like something a little different:

myweighted.mean <- function(x)
    c(NA, cumweighted.mean(head(x, -1), head(seq_along(x), -1)))
This makes a single pass through the data, so it scales linearly (at least in theory).
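A quick check against the expected output from the question:

vec <- c(1, 4, 6, 3, 2, 7)
myweighted.mean(vec)
## [1]       NA 1.000000 3.000000 4.500000 3.900000 3.266667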