I'm trying to compute a correlation matrix.
Here is a sample of the dataset:
> head(matrix)
# A tibble: 6 x 16
# Groups: nquest, nord [6]
nquest nord sex anasc ireg eta staciv studio asnonoc2 nace2 nesplav etalav dislav acontrib occnow tpens
<int> <int> <dbl> <int> <int> <int> <int> <fct> <int> <int> <fct> <fct> <fct> <int> <int> <int>
1 173 1 1 1948 18 72 3 2 2 19 1 2 0 35 2 1800
2 2886 1 1 1949 13 71 1 2 2 16 1 2 0 35 2 1211
3 2886 2 0 1952 13 68 1 3 2 17 1 2 0 42 2 2100
4 5416 1 0 1958 8 62 3 1 1 19 2 1 0 30 2 700
5 7886 1 1 1950 9 70 1 2 2 11 1 2 0 35 2 2000
6 20297 1 1 1960 5 60 1 1 1 19 2 1 0 39 2 1200
Actually, nquest and nord are identification codes: the first identifies the family, the second the member of that family. Even if I try to remove them (because I think they are useless in a correlation matrix), dplyr adds them back automatically:
matrix <- final %>%
  select("sex", "anasc", "ireg", "eta", "staciv", "studio", "asnonoc2",
         "nace2", "nesplav", "etalav", "dislav", "acontrib", "occnow",
         "tpens")
dplyr replies:
Adding missing grouping variables: `nquest`, `nord`
However, I don't think it is a problem if they remain in the dataset.
My goal is to compute the correlation matrix, but this dataset seems to have some NA values
> sum(is.na(matrix))
[1] 109
I've tried the following, but none of it works.
The first
cor(matrix, use = "pairwise.complete.obs")
R replies
Error in cor(matrix, use = "pairwise.complete.obs") :
'x' must be numeric
The second
cor(na.omit(matrix))
R answers
Error in cor(na.omit(matrix)) : 'x' must be numeric
I've also tried
matrix <- as.numeric(matrix)
But I get another kind of error
Error: 'list' object cannot be coerced to type 'double'
How can I solve this? What am I doing wrong?
The problem is most likely the type of some of your columns. In your example, several columns are factors (shown as <fct> in the tibble header, studio for example). Their values look numeric, but they are stored as factor in your dataset. cor() accepts only numeric input, so any factor or character column triggers the 'x' must be numeric error. You therefore need to convert those columns to numeric before computing correlations. One option is type.convert(). Columns that hold genuine text (character) cannot be converted and must be dropped before the correlation analysis. Also, as the commenters suggested, it is better not to name your objects after base R functions such as matrix. Here is my advice:
# copy your data into dft
dft <- matrix
# convert each column back to its natural type (factors with numeric labels become numbers)
dft <- type.convert(dft, as.is = TRUE)
# compute correlations, excluding the first two columns (the ID codes, which are not informative)
cor(dft[, -c(1:2)], use = "pairwise.complete.obs")
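If you prefer to do the conversion by hand, here is a minimal base R sketch. The toy data frame and all object names in it (toy, codes, values, num_only, cor_mat) are made up for illustration and only mimic the structure of your data. It also shows two things to watch for: as.numeric() cannot be applied to a whole data frame (a data frame is a list of columns, hence the 'list' object cannot be coerced error), and calling as.numeric() directly on a factor returns its internal level codes rather than the numbers you see printed.

```r
# Hypothetical toy data mimicking the question: numeric values stored as a factor
toy <- data.frame(
  eta    = c(72L, 71L, 68L, 62L, 70L, 60L),
  studio = factor(c("2", "2", "3", "1", "2", "1")),
  tpens  = c(1800, 1211, 2100, 700, NA, 1200)
)

# Pitfall: as.numeric() on a factor gives the internal level codes, not the labels
f <- factor(c("10", "20", "10"))
codes  <- as.numeric(f)                 # level codes
values <- as.numeric(as.character(f))   # the numbers spelled out by the labels

# Convert every factor column through its labels, column by column
is_fct <- vapply(toy, is.factor, logical(1))
toy[is_fct] <- lapply(toy[is_fct], function(col) as.numeric(as.character(col)))

# Keep only numeric columns, then correlate on pairwise-complete observations
num_only <- toy[vapply(toy, is.numeric, logical(1))]
cor_mat <- cor(num_only, use = "pairwise.complete.obs")
cor_mat
```

Note that as.numeric(as.character(col)) turns any label that is not a number into NA (with a warning), so it is worth checking the factor levels first if a column might contain real text.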
Hope this helps.
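A side note on the "Adding missing grouping variables" message: your tibble is grouped by nquest and nord (see the Groups line in the printout), and select() always keeps grouping columns. A small sketch, assuming dplyr is loaded and using a made-up toy tibble in place of your final object, shows that calling ungroup() first lets you drop them:

```r
library(dplyr)

# Hypothetical toy data standing in for `final`, grouped like the original
toy <- tibble(
  nquest = c(173L, 2886L, 2886L, 5416L),
  nord   = c(1L, 1L, 2L, 1L),
  eta    = c(72L, 71L, 68L, 62L),
  tpens  = c(1800L, 1211L, 2100L, 700L)
) %>%
  group_by(nquest, nord)

# While the data is grouped, select() adds the grouping columns back
kept <- toy %>% select(eta, tpens)
names(kept)

# After ungroup(), only the requested columns remain
dropped <- toy %>% ungroup() %>% select(eta, tpens)
names(dropped)
```

With the ID columns gone, there is no need to exclude the first two columns by position before calling cor().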