Tags: r, logistic-regression, sample

Creating a synthetic dataset for a logistic regression model in R


I am trying to create a synthetic dataset of 200 randomly generated points. The issue is that I am getting duplicate column names, but I want only one target column, y.

Here is my approach:

#For samples
library(mvtnorm)
library(fontawesome)
a1 <- c(1, 0)
a2 <- c(0, 1)
M <- cbind(a1, a2)

C0 <- rmvnorm(100, c(0, 0), M)
C1 <- rmvnorm(100, c(5, 0), M)

#Creating synthetic dataset
dat <- rbind(C0, C1)
y <- sign(-1 - 2 * x1 + 4 * x2 )
y[y == -1] <- 0
df1 <- cbind.data.frame(y, C)
df1

I would like to know what is wrong with my process.

Output of df1: [image not included]


Solution

  • If 'y' needs to be created from 'dat', index its columns directly, since 'x1' and 'x2' are not defined in the code shown:

     y <- sign(-1 - 2 * dat[,1] + 4 * dat[,2] )
    

    Now, after recoding -1 to 0 and binding 'y' to 'dat' with cbind.data.frame(y, dat), 'df1' would be

    head(df1)
    #   y         X1         X2
    #1 0 -0.7846368  0.2959261
    #2 0  1.6764476  0.8565073
    #3 0 -0.9609016 -0.2585588
    #4 0  0.5455316  0.2600099
    #5 1 -1.5251354  0.2887918
    #6 0 -0.1563197  0.2524742
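
For reference, here is a minimal end-to-end sketch of the corrected script. It is one possible way to put the pieces together, not necessarily what the asker ran. The fixed seed, the use of diag(2) for the identity covariance, and the column names "x1" and "x2" are assumptions added for illustration, and library(fontawesome) is left out because nothing in this snippet uses it.

```r
library(mvtnorm)

set.seed(42)  # assumption: fixed seed, only so the run is reproducible

# Identity covariance matrix, same values as cbind(a1, a2) in the question
M <- diag(2)

# Two clusters of 100 points each, centred at (0, 0) and (5, 0)
C0 <- rmvnorm(100, mean = c(0, 0), sigma = M)
C1 <- rmvnorm(100, mean = c(5, 0), sigma = M)

# Stack the clusters and name the feature columns explicitly
# (the names "x1" and "x2" are an illustrative choice)
dat <- rbind(C0, C1)
colnames(dat) <- c("x1", "x2")

# Build the label from the columns of 'dat', then recode -1 to 0
y <- sign(-1 - 2 * dat[, "x1"] + 4 * dat[, "x2"])
y[y == -1] <- 0

# One target column followed by the two features
df1 <- data.frame(y = y, dat)
head(df1)
```

Naming the matrix columns before building the data frame keeps the feature names under your control, and binding 'y' to 'dat' (rather than to some other object left in the workspace) gives exactly one target column.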