I'm working with a data frame called creditcard in R and wanted to compute the correlation between the variables Debt and Limit from that data frame. However, for reasons I don't understand, I get an error message saying Debt has to be numeric:
Error in cor(Debt,Limit) : 'x' must be numeric
I tried converting it to a numeric variable with this code:
Debt=as.numeric(as.character(Debt))
It still doesn't work. Debt becomes numeric, but it has lost most of its previous 400 observations, which were reduced to just 13...
> sapply(creditcard,class)
       ID    Income     Limit
"integer" "numeric" "integer"
   Rating     Cards       Age
"integer" "integer" "integer"
Education    Gender   Student
"integer"  "factor"  "factor"
  Married Ethnicity      Debt
 "factor"  "factor" "integer"
Sample data:
> dput(head(Debt))
c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_)
I've been working with the creditcard dataframe for 3 months now, and haven't had this problem until recently, as Debt mysteriously started behaving like a data.frame object instead of numeric. Any ideas how I might get my old numeric Debt back with all 400 observations? Thanks in advance.
The cor function can take different inputs. If you provide it with a matrix or data.frame as a single argument, it will give you a correlation matrix for all variables; however, all variables then have to be numeric.
To get all correlations of the numeric variables of the creditcard data.frame, you could do cor(creditcard[,sapply(creditcard, is.numeric)]).
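As a rough sketch of that column selection, here is the same idea on a small made-up data.frame. The name toy and all its values are hypothetical stand-ins, not the real creditcard data. Note that is.numeric() is also TRUE for integer columns, so columns like Limit and Debt are kept while factors are dropped.

```r
# Hypothetical stand-in for creditcard; the values are invented for illustration.
toy <- data.frame(
  Limit  = c(3606L, 6645L, 7075L, 9504L, 4897L),
  Debt   = c(333L, 903L, 580L, 964L, 331L),
  Gender = factor(c("Male", "Female", "Male", "Female", "Male"))
)

# Logical vector: TRUE for integer and double columns, FALSE for factors.
num_cols <- sapply(toy, is.numeric)

# Correlation matrix over the numeric columns only.
cor_mat <- cor(toy[, num_cols])
print(cor_mat)
```

The result is a square matrix with one row and one column per numeric variable.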
Otherwise, you can get the correlation between single columns of your data.frame by giving it two arguments, either as cor(creditcard$Debt, creditcard$Limit) or as with(creditcard, cor(Debt, Limit)).
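A minimal sketch showing that the two forms give the same number, again on a hypothetical data.frame called toy with invented values:

```r
# Hypothetical stand-in for creditcard; the values are invented for illustration.
toy <- data.frame(
  Debt  = c(333L, 903L, 580L, 964L, 331L),
  Limit = c(3606L, 6645L, 7075L, 9504L, 4897L)
)

# Form 1: pull each column out explicitly with $.
r_dollar <- cor(toy$Debt, toy$Limit)

# Form 2: evaluate the call with the columns of toy visible by name.
r_with <- with(toy, cor(Debt, Limit))

# Both forms operate on the same two vectors.
print(identical(r_dollar, r_with))
```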
The errors you are currently getting come from the fact that the variables you are calling are either inaccessible or of the wrong type. If you call cor(A, B), both A and B have to exist in your current environment. If A and B are columns of a data.frame, they are not directly accessible by name, so you can either expose the columns of the data.frame using with(creditcard, ...) as shown above, or directly access the columns within the data.frame (creditcard$A or creditcard[,"A"] or creditcard[["A"]]).
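The dput output in the question suggests there may be a free-standing Debt object in the workspace that differs from the column inside creditcard. That is an assumption about the session, not something the post confirms. The sketch below reproduces such a situation with a hypothetical data.frame toy and a stale all-NA vector, and shows how the different access styles behave.

```r
# Hypothetical stand-in for creditcard; the values are invented for illustration.
toy <- data.frame(
  Debt  = c(333L, 903L, 580L, 964L),
  Limit = c(3606L, 6645L, 7075L, 9504L)
)

# A stale free-standing Debt, mimicking the all-NA vector shown by dput().
Debt <- rep(NA_real_, 4)

# The bare name finds the stale vector, so the result is NA.
r_stale <- cor(Debt, toy$Limit)

# Explicit column access ignores the stale vector.
r_col <- cor(toy[["Debt"]], toy$Limit)

# with() looks inside toy first, so it also finds the column.
r_with <- with(toy, cor(Debt, Limit))

# Remove the stale copy so it cannot be picked up by accident.
rm(Debt)
```

If something similar happened here, the Debt column inside creditcard would still hold all 400 observations, and removing the stray object would leave the column untouched.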