I have two variables called x and y (please see the R code below the picture). When I call plot(x, y), I obtain the top-row plot (see below): the y values are stacked above each x value.
I am wondering why, when I sample from the y values that are stacked separately above each x value (e.g., the y values stacked above the x value 0), some of the sampled y values fall outside the range of their mother sample (please see the bottom-row table to see this).
############# Input Values ###################
each.sub.pop.n = 150
sub.pop.means = 20:10
predict.range = 0:10
sub.pop.sd = .75
n.sample = 2
#############################################
par( mar = c(2, 4.1, 2.1, 2.1) )
m = matrix( c(1, 2), nrow = 2, ncol = 1 ); layout(m)
Vec.rnorm <- Vectorize(function(n, mean, sd) rnorm(n, mean, sd), 'mean')
y <- c( Vec.rnorm(each.sub.pop.n, sub.pop.means, sub.pop.sd) )
x <- rep(predict.range, each = each.sub.pop.n)
plot(x, y)
## Unsuccessful Sampling ##
x <- rep(predict.range, each = n.sample)
y <- sample(y , length(x), replace = TRUE)
plot(x, y)
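It seems to me that the sampling in your unsuccessful sampling piece is not conditional on x: sample(y, length(x), replace = TRUE) draws from all of the y values pooled together, so a value that ends up paired with x = 0 can come from any x group. Below, I split the original y data by the original x (that is, x and y as they were before the unsuccessful sampling step overwrote them) and then sample n.sample cases (here, two) from each group. The result seems to work.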
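# Draw n.sample values within each x group; the result is named 'sampled'
# so that it does not mask the base function sample()
sampled <- lapply(split(y, x), function(z) sample(z, n.sample, replace = TRUE))
sampled <- data.frame(y = unlist(sampled),
                      x = as.numeric(rep(names(sampled), each = n.sample)))
plot(sampled$x, sampled$y)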
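To check the difference between the two approaches numerically rather than by eye, something like the following sketch may help. It rebuilds the data from the question's input values, records the range of y within each x group, and then tests whether each sampled y falls inside the range of the group it is paired with. The seed, the helper in.own.range, and the names group.range, x.new, y.pooled and y.cond are assumptions introduced only for this illustration; they are not part of the original code.

```r
set.seed(1)  # arbitrary seed, assumed here only to make the sketch reproducible

each.sub.pop.n <- 150
sub.pop.means  <- 20:10
predict.range  <- 0:10
sub.pop.sd     <- 0.75
n.sample       <- 2

# Rebuild the original data: 150 y values for each x in 0:10
y <- unlist(lapply(sub.pop.means,
                   function(m) rnorm(each.sub.pop.n, m, sub.pop.sd)))
x <- rep(predict.range, each = each.sub.pop.n)

# Range of the "mother sample" at each x value
# (2 x 11 matrix: row 1 = min, row 2 = max, columns named "0" ... "10")
group.range <- sapply(split(y, x), range)

# Hypothetical helper: is each sampled y inside the range of its own x group?
in.own.range <- function(xs, ys) {
  lo <- group.range[1, as.character(xs)]
  hi <- group.range[2, as.character(xs)]
  ys >= lo & ys <= hi
}

# Pooled sampling, as in the question: y is drawn without regard to x
x.new    <- rep(predict.range, each = n.sample)
y.pooled <- sample(y, length(x.new), replace = TRUE)

# Conditional sampling: draw n.sample values within each x group
y.cond <- unlist(lapply(split(y, x),
                        function(z) sample(z, n.sample, replace = TRUE)))

all(in.own.range(x.new, y.cond))     # TRUE by construction
mean(in.own.range(x.new, y.pooled))  # share of pooled draws inside their group's range
```

The conditional draws always pass the check because each one is taken from the group it is paired with. The pooled draws usually fail for most positions, since the group means are one unit apart while the standard deviation is only 0.75.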