I'm trying to simulate a dataset from a mixture of 3 distributions and then plot it with the groups colored differently. The outcome has a mean that is a linear function of time, and the errors follow a standard normal distribution. I want 100 trajectories with 5 time points each and the following characteristics.
Dist. 1: intercept term of 0.1, slope of 2
Dist. 2: intercept term of 2, slope of 5
Dist. 3: intercept term of 3, slope of 7
I want to generate the mixture of 100 trajectories assuming that, on average, 60% of the data come from dist. 1, 25% from dist. 2, and 15% from dist. 3.
The problem I'm having is that I don't know how to change the "h" variable so that it allows three mixture components instead of two.
This is what I've done so far.
set.seed(10)
x <- 1:5                          # time points
mu.0 <- c(0.1, 2, 3)              # intercept for each dist.
mu <- c(2, 5, 7)                  # slope for each dist.
n.subj <- 100
h <- 1 + rbinom(n.subj, 1, 0.6)   # group label; currently only 1 or 2
alpha <- rnorm(n.subj, 0, 1)      # subject-level random intercept
y.data <- c(); ind <- c()
for (i in 1:n.subj) {
  ind <- rbind(ind, rep(i, 5))
  y.data <- rbind(y.data,
                  c(mu.0[h[i]] + mu[h[i]] * x + alpha[i] + rnorm(5, 0, 1)))
}
Thanks in advance!
If you want h to be a vector of indices, you can draw it with the sample.int function, like so:
h <- sample.int(3, 100, replace=TRUE, prob=c(0.6, 0.25, 0.15))
h
## [1] 1 3 1 2 1 1 1 1 2 2 1 1 1 3 1 3 1 2 1 2 1 1 1 1 1 3 1 1 1 1 1 1
## [33] 2 1 1 1 1 2 1 1 1 1 2 2 1 1 2 2 1 1 1 1 1 3 1 2 2 1 2 1 1 3 3 1
## [65] 2 3 2 3 1 1 1 1 3 3 2 1 2 2 1 1 2 3 2 2 3 3 1 1 1 1 1 1 1 3 1 2
## [97] 1 2 3 1
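
To tie this back to the rest of the question, here is a minimal sketch of the whole simulation with the three-group h plugged in, followed by a plot colored by group. It reuses the variable names from the question; the sapply call in place of the loop, the use of matplot, and the particular colors are my own choices rather than anything the question requires. Because the random draws happen in a different order here, the h you get will not necessarily match the printout above.

```r
set.seed(10)
x      <- 1:5                  # time points
mu.0   <- c(0.1, 2, 3)         # intercept for each dist.
mu     <- c(2, 5, 7)           # slope for each dist.
n.subj <- 100

# group label for each trajectory: 1, 2 or 3 with the requested probabilities
h     <- sample.int(3, n.subj, replace = TRUE, prob = c(0.6, 0.25, 0.15))
alpha <- rnorm(n.subj, 0, 1)   # subject-level random intercept

# one row per trajectory, one column per time point
y.data <- t(sapply(seq_len(n.subj), function(i) {
  mu.0[h[i]] + mu[h[i]] * x + alpha[i] + rnorm(length(x), 0, 1)
}))

# realized group proportions (close to, but not exactly, 0.60 / 0.25 / 0.15)
print(prop.table(table(h)))

# matplot draws one line per column, so transpose to get one line per trajectory
group.cols <- c("black", "red", "blue")   # arbitrary color choice
matplot(x, t(y.data), type = "l", lty = 1, col = group.cols[h],
        xlab = "time", ylab = "y")
legend("topleft", legend = paste("Dist.", 1:3), col = group.cols, lty = 1)
```

Since sample.int draws each label independently, the group sizes vary from run to run and only match 60/25/15 on average, which is what the question asks for.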