
Can't convert data.frame into numeric in R


I'm working with a data frame called creditcard in R and wanted to compute the correlation between its variables Debt and Limit. However, for reasons I don't understand, I get an error message saying Debt has to be numeric:

Error in cor(Debt,Limit) : 'x' must be numeric

I tried converting it to a numeric variable with this code:

Debt=as.numeric(as.character(Debt))

It still doesn't work. Debt becomes numeric, but it loses most of its 400 observations: only 13 remain.

 > sapply(creditcard,class)
       ID    Income     Limit 
"integer" "numeric" "integer" 
   Rating     Cards       Age 
"integer" "integer" "integer" 
Education    Gender   Student 
"integer"  "factor"  "factor" 
  Married Ethnicity      Debt 
 "factor"  "factor" "integer"

Sample data:

  > dput(head(Debt))
c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_)

I've been working with the creditcard data frame for 3 months now and hadn't had this problem until recently, when Debt mysteriously started behaving like a data.frame object instead of a numeric vector. Any ideas how I might get my old numeric Debt back with all 400 observations? Thanks in advance.


Solution

  • The cor function accepts different kinds of input. If you pass it a matrix or data.frame as its only argument, it returns a correlation matrix for all variables, but in that case every variable has to be numeric.

    To get all correlations between the numeric variables of the creditcard data.frame, you could do cor(creditcard[,sapply(creditcard, is.numeric)]).

    Otherwise you can get the correlation between single columns of your data.frame by giving it two arguments, either as cor(creditcard$Debt, creditcard$Limit) or as with(creditcard, cor(Debt, Limit)).

    The errors you are currently getting come from the fact that the variables you are calling are either inaccessible or the wrong type. If you call cor(A, B), both A and B have to be in your current environment. If A and B are columns of a data.frame, they are not directly accessible, so you can either expose the columns of the data.frame using with(creditcard, ...) as shown above, or directly access the columns within the data.frame (creditcard$A or creditcard[,"A"] or creditcard[["A"]]).
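
The approaches above can be sketched as follows. This is a minimal illustration, not the asker's actual data: the small creditcard data.frame and its values are made up, and the stray global Debt vector is an assumption about how the workspace ended up in the state shown by dput(head(Debt)).

```r
# Hypothetical stand-in for the creditcard data.frame (values are invented).
creditcard <- data.frame(
  Income = c(14.9, 106.0, 104.6, 148.9, 55.9),
  Limit  = c(3606L, 6645L, 7075L, 9504L, 4897L),
  Gender = factor(c("Male", "Female", "Male", "Female", "Male")),
  Debt   = c(333L, 903L, 580L, 964L, 331L)
)

# Assumption: a stray object named Debt in the global environment,
# similar to what dput(head(Debt)) showed in the question.
Debt <- rep(NA_real_, 6)

# Correlation matrix over the numeric (including integer) columns only;
# the factor column Gender is dropped by the is.numeric filter.
num_cols   <- sapply(creditcard, is.numeric)
cor_matrix <- cor(creditcard[, num_cols])

# Correlation between two columns, addressed through the data.frame.
# All three forms read the columns, not the stray global Debt.
r1 <- cor(creditcard$Debt, creditcard$Limit)
r2 <- with(creditcard, cor(Debt, Limit))
r3 <- cor(creditcard[["Debt"]], creditcard[["Limit"]])

# Removing the stray global object leaves creditcard$Debt untouched.
rm(Debt)
```

Because with() looks up names in the data.frame first, with(creditcard, cor(Debt, Limit)) uses the column even while a global object called Debt exists. If the column inside creditcard is still an integer vector, as the sapply(creditcard, class) output suggests, the original 400 observations should still be available as creditcard$Debt.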