
Creating and Visualizing samples in R


I want to draw 100 samples from each of two normal distributions. For the first class, the mean is (0,0) and the covariance matrix is [(1,0),(0,1)]. For the second class, the mean is (5,0) and the covariance matrix is the same as for the first class. Finally, I would like to visualize all 200 instances in a single plot, with a different color for each class.

My problem: when I generate the plot, I am unsure whether it actually contains all 200 samples.

My approach:

a1 <- c(1,0)
a2 <- c(0,1)

M <- cbind(a1, a2)
x <- cov(M)
dev <- sd(x, na.rm = FALSE)

C0 <- sample(rnorm(100, mean=0, sd=dev), size=100, replace=T)
C1 <-  sample(rnorm(100, mean=5, sd=dev), size=100, replace=T)

plot(C0,C1, col=c("red","blue"), main = '200 samples, with mean 0 and 5 and S.D=0.5')
legend("topright", 95, legend=c("C0", "C1"),
       col=c("red", "blue"), lty=1:2, cex=0.8)

What should I correct in my code?

[Image: plot produced by the code above]


Solution

  • Aside from the plotting issue mentioned in the other answer, your description suggests that you want to sample from two 2D multivariate normal distributions with different means.

    If so, you can use rmvnorm from the mvtnorm package, which draws samples from a multivariate normal distribution.

    library(mvtnorm)
    C0 <- rmvnorm(100, c(0,0), M) # 100 samples, means (0, 0), covariance mtx M
    C1 <- rmvnorm(100, c(5,0), M)
    

    Right now, you take the covariance of the covariance matrix you have by typing x <- cov(M). This doesn't make much sense unless I'm misunderstanding what you're trying to accomplish.
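
    To see why, it may help to print the intermediate values from the question's code. This is a small sketch in base R that reuses the question's own names (M, x, dev). cov() treats the two rows of M as two observations, and sd() then flattens the resulting matrix into four numbers, so dev ends up as a single scalar unrelated to the identity covariance that was intended.

    ```r
    a1 <- c(1, 0)
    a2 <- c(0, 1)
    M <- cbind(a1, a2)   # the intended covariance matrix (2x2 identity)

    x <- cov(M)          # covariance between the columns of M, using its 2 rows as observations
    print(x)             # 0.5 on the diagonal, -0.5 off the diagonal

    dev <- sd(x)         # sd() of the 4 flattened entries 0.5, -0.5, -0.5, 0.5
    print(dev)           # sqrt(1/3), roughly 0.577
    ```

    So the rnorm() calls in the question draw one-dimensional values with a standard deviation of about 0.577, not 2D points with unit variance in each coordinate.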


    EDIT: This is the full code for what I think you're trying to accomplish:

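    library(mvtnorm)  # provides rmvnorm()

    # Covariance matrix shared by both classes (2x2 identity)
    a1 <- c(1, 0)
    a2 <- c(0, 1)
    M <- cbind(a1, a2)
    
    # 100 samples per class; each result is a 100 x 2 matrix
    C0 <- rmvnorm(100, c(0, 0), M)
    C1 <- rmvnorm(100, c(5, 0), M)
    
    # Draw the first class, then add the second class to the same plot
    plot(C0, col = "red", xlim = c(-5, 10), ylim = c(-5, 5), xlab = "X", ylab = "Y")
    points(C1, col = "blue")
    legend("topright", inset = .05, c("Class 1", "Class 2"), fill = c("red", "blue"))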
    

    which outputs the following plot:

    [Image: class plot]
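
    On the question of whether the plot really holds 200 points: in the question's code, plot(C0, C1) uses C0 as the x coordinates and C1 as the y coordinates, so it draws 100 points. One way to check the count is to stack both classes into a single matrix and inspect its dimensions before plotting. The sketch below uses only base R, which works here because an identity covariance matrix makes the two coordinates independent standard normals. The names n, pts and cls and the seed are illustrative choices, not part of the original code.

    ```r
    set.seed(42)  # arbitrary seed, only for reproducibility
    n <- 100      # samples per class

    # Identity covariance: each coordinate is an independent normal with sd = 1
    C0 <- cbind(rnorm(n, mean = 0), rnorm(n, mean = 0))  # class 1, mean (0, 0)
    C1 <- cbind(rnorm(n, mean = 5), rnorm(n, mean = 0))  # class 2, mean (5, 0)

    pts <- rbind(C0, C1)                                  # all points, one row each
    cls <- factor(rep(c("Class 1", "Class 2"), each = n)) # class label per row

    print(dim(pts))    # 200 rows, 2 columns
    print(table(cls))  # 100 points in each class

    # One plot call for all points, colored by class
    plot(pts, col = c("red", "blue")[cls], xlab = "X", ylab = "Y")
    legend("topright", inset = .05, legend = levels(cls), fill = c("red", "blue"))
    ```

    For a covariance matrix with non-zero off-diagonal entries this shortcut no longer applies, and rmvnorm as shown above is the simpler route.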