I have two variables called x and y (please see the R code below the picture). When I call plot(x, y), I obtain the top-row plot (see below): the y values are stacked on top of each x value. I then try to sample from these y values and draw a second plot below the first one.
I am wondering why, when I set predict.range (see the R code below) to 10:0, my sampling procedure goes completely in the wrong direction. The problem doesn't happen when I use 0:10. (Please compare the top-row plot to the bottom-row plot.)
############# Input Values ################
each.sub.pop.n = 150;
sub.pop.means = 20:10;
predict.range = 10:0;
sub.pop.sd = .75;
n.sample = 2;
#############################################
par( mar = c(2, 4.1, 2.1, 2.1) )
m = matrix( c(1, 2), nrow = 2, ncol = 1 ); layout(m)
Vec.rnorm <- Vectorize(function(n, mean, sd) rnorm(n, mean, sd), 'mean')
y <- c( Vec.rnorm(each.sub.pop.n, sub.pop.means, sub.pop.sd) )
x <- rep(predict.range, each = each.sub.pop.n)
plot(x, y)
## Unsuccessful Sampling ## The problem must be lying in here:
sampled <- lapply(split(y, x), function(z) sample(z, n.sample, replace = TRUE))
sampled <- data.frame(y = unlist(sampled),
x = rep(predict.range, each = n.sample))
plot(sampled$x, sampled$y)
This is sufficient to illustrate why.
x <- 10:0; y <- 10:0
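Did you notice how split(y, x) sorts the list? split() converts x to a factor, and factor levels are sorted in increasing order by default, so the groups come back ordered from 0 to 10 no matter how x is ordered. Your code then pairs those groups with rep(predict.range, each = n.sample), which runs from 10 down to 0, so every sample is attached to the wrong x. To get your desired ordering, control the factor levels: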
split(y, factor(x, levels = unique(x)))
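To see the difference directly, a small check like the following may help. It reuses the toy x and y from above; the variable names by_default and by_levels are only illustrative.

```r
x <- 10:0; y <- 10:0

# Default behaviour: x is coerced to a factor with sorted levels
by_default <- split(y, x)
names(by_default)
# "0" "1" "2" "3" "4" "5" "6" "7" "8" "9" "10"

# Explicit levels: groups follow the order in which values appear in x
by_levels <- split(y, factor(x, levels = unique(x)))
names(by_levels)
# "10" "9" "8" "7" "6" "5" "4" "3" "2" "1" "0"
```

The list names show which x value each group belongs to, so inspecting them is a quick way to confirm the group order before pairing the groups with another vector.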
In your context, you can skip unique and pass predict.range directly, which is more efficient:
split(y, factor(x, levels = predict.range))
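Put together, the sampling step from the question could look roughly like this. It is a sketch that reuses the question's input values and drops the plotting calls; the set.seed() call and the groups variable are additions for illustration, not part of the original code.

```r
set.seed(1)  # assumption: fixed seed only so the sketch is reproducible

each.sub.pop.n <- 150
sub.pop.means  <- 20:10
predict.range  <- 10:0
sub.pop.sd     <- 0.75
n.sample       <- 2

Vec.rnorm <- Vectorize(function(n, mean, sd) rnorm(n, mean, sd), "mean")
y <- c(Vec.rnorm(each.sub.pop.n, sub.pop.means, sub.pop.sd))
x <- rep(predict.range, each = each.sub.pop.n)

# Keep the groups in the same order as predict.range (10 down to 0)
groups  <- split(y, factor(x, levels = predict.range))
sampled <- lapply(groups, function(z) sample(z, n.sample, replace = TRUE))

# The groups and rep(predict.range, ...) now run in the same order
sampled <- data.frame(y = unlist(sampled, use.names = FALSE),
                      x = rep(predict.range, each = n.sample))
```

With this change, the samples drawn at x = 10 come from the sub-population with mean 20, as in the top-row plot, and plot(sampled$x, sampled$y) should show the same trend as plot(x, y).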