
Correlation of one variable with all the others in R


I want to calculate the correlation between my dependent variable y and each of my x variables. I use the code below:

   cor(loan_data_10v[sapply(loan_data_10v, is.numeric)],use="complete.obs")

The result is a full correlation matrix. How can I get just the single column of correlations with my variable y?


Solution

  • If we are looking for the cor between 'x' and 'y', each argument can be either a vector or a matrix (or data frame). Using a reproducible example, say mtcars, suppose 'y' is 'mpg' and 'x' is all the other variables ('mpg' is the first column, so we use mtcars[-1] for 'x'):

    cor(mtcars[-1], mtcars$mpg) 
    #          [,1]
    #cyl  -0.8521620
    #disp -0.8475514
    #hp   -0.7761684
    #drat  0.6811719
    #wt   -0.8676594
    #qsec  0.4186840
    #vs    0.6640389
    #am    0.5998324
    #gear  0.4802848
    #carb -0.5509251
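The call above returns a one-column matrix. If a named vector is more convenient, here is a minimal base-R sketch on the same built-in mtcars data showing two equivalent ways to get the same values: calling cor(x, y) once per column with sapply, or computing the full matrix (as the question's code does) and keeping only the 'mpg' column. The object names m, v1 and v2 are just illustrative.

```r
# Correlation of every other mtcars column with mpg, as a one-column matrix
m <- cor(mtcars[-1], mtcars$mpg)

# Same values as a named vector: cor(x, y) is called once per column,
# with each column passed as x and mpg fixed as y
v1 <- sapply(mtcars[-1], cor, y = mtcars$mpg)

# Same values again: compute the full correlation matrix, then keep the
# 'mpg' column and drop its first row (mpg with itself, which is 1)
v2 <- cor(mtcars)[-1, "mpg"]

print(round(v1, 3))
```

Indexing the full matrix is the quickest fix for the original code, but it computes every pairwise correlation; the sapply and cor(x, y) forms only compute the ones with y.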
    

    If the data has a mix of numeric and non-numeric columns, create a logical index of the numeric columns ('i1'), use it to get the names of the 'x' variables (every numeric column except 'y'), and then apply cor:

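    i1 <- sapply(loan_data_10v, is.numeric)       # TRUE for numeric columns
    y1 <- "dep_column"                            # change it to the actual column name
    x1 <- setdiff(names(loan_data_10v)[i1], y1)   # numeric columns other than y
    # use = "complete.obs" drops rows with missing values, as in the question
    cor(loan_data_10v[x1], loan_data_10v[[y1]], use = "complete.obs")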
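Since loan_data_10v is not available here, the sketch below runs the same steps on a small made-up data frame. The name toy, its columns and its values are hypothetical; it mixes numeric and character columns and contains one NA, so use = "complete.obs" drops that row before computing the correlations.

```r
# Hypothetical stand-in for loan_data_10v: numeric columns, a character
# column, and one missing value
toy <- data.frame(
  income = c(40, 55, 62, 70, NA, 85),
  age    = c(25, 32, 41, 38, 50, 45),
  grade  = c("A", "B", "B", "C", "A", "C"),  # non-numeric, gets skipped
  amount = c(10, 14, 15, 18, 20, 22)         # plays the role of y
)

i1 <- sapply(toy, is.numeric)       # logical index of numeric columns
y1 <- "amount"                      # the dependent variable
x1 <- setdiff(names(toy)[i1], y1)   # numeric predictors: income, age

# One-column matrix of correlations with y, computed on complete rows only
res <- cor(toy[x1], toy[[y1]], use = "complete.obs")
print(res)
```

The character column never reaches cor, so there is no "'x' must be numeric" error, and the y column is excluded from x1, so it does not appear as a row correlated with itself.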