I am quite new to using R and am struggling to find a way to actually compute a Pearson correlation coefficient from a set of data. I am attempting to analyze whether there is a correlation between the scores received for an assignment and the topic area chosen (Algebra, Calculus, Geometry, etc.). This is my data frame:
sc.ar <- structure(list(area = structure(c(1L, 5L, 5L, 2L, 4L, 4L, 1L,
6L, 1L, 2L, 1L, 3L, 3L, 5L, 2L, 2L, 2L, 3L, 4L, 4L, 5L, 1L, 2L,
3L, 4L, 5L, 5L, 2L, 5L, 5L, 5L, 1L, 2L, 2L, 3L, 4L, 4L, 2L, 3L,
4L, 4L, 5L, 5L, 2L, 3L, 4L, 4L, 4L, 5L), levels = c("Algebra",
"Calculus", "Geometry", "Modelling", "Probability", "Other"), class = "factor"),
score = c(10, 10, 10, 11, 11, 11, 12, 12, 13, 13, 14, 14,
14, 14, 15, 15, 15, 15, 15, 15, 15, 16, 16, 16, 16, 16, 16,
17, 17, 17, 17, 17, 18, 18, 18, 18, 18, 19, 19, 19, 19, 19,
19, 20, 20, 20, 7, 9, 9)), class = "data.frame", row.names = c(NA,
-49L))
Sorry if this isn't enough information; it's my first time on here as well.
I am able to get results from summary(lm(formula = score ~ area, data = sc.ar)), but I honestly do not know what to do with them. My goal is to find a way to use cor by inputting the dummy variables manually.
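A minimal sketch of "inputting the dummy variables manually", assuming that means one 0/1 indicator column per area. model.matrix() builds the indicators, cor() then gives the Pearson (point-biserial) correlation of score with each one, and the multiple R of the lm() fit, sqrt(R²), equals the Pearson correlation between score and the fitted values. The names X, r_each, fit and multiple_R are illustrative only.

```r
# Data from the question
sc.ar <- structure(list(area = structure(c(1L, 5L, 5L, 2L, 4L, 4L, 1L,
  6L, 1L, 2L, 1L, 3L, 3L, 5L, 2L, 2L, 2L, 3L, 4L, 4L, 5L, 1L, 2L,
  3L, 4L, 5L, 5L, 2L, 5L, 5L, 5L, 1L, 2L, 2L, 3L, 4L, 4L, 2L, 3L,
  4L, 4L, 5L, 5L, 2L, 3L, 4L, 4L, 4L, 5L), levels = c("Algebra",
  "Calculus", "Geometry", "Modelling", "Probability", "Other"), class = "factor"),
  score = c(10, 10, 10, 11, 11, 11, 12, 12, 13, 13, 14, 14,
  14, 14, 15, 15, 15, 15, 15, 15, 15, 16, 16, 16, 16, 16, 16,
  17, 17, 17, 17, 17, 18, 18, 18, 18, 18, 19, 19, 19, 19, 19,
  19, 20, 20, 20, 7, 9, 9)), class = "data.frame", row.names = c(NA,
  -49L))

# One 0/1 indicator column per area (no intercept, so no reference level is dropped)
X <- model.matrix(~ area - 1, data = sc.ar)

# Pearson correlation of score with each indicator (point-biserial correlations)
r_each <- cor(sc.ar$score, X)
print(round(r_each, 3))

# Overall association: multiple R of the dummy-variable regression
fit <- lm(score ~ area, data = sc.ar)
multiple_R <- sqrt(summary(fit)$r.squared)

# For a least-squares fit with an intercept, multiple R equals cor(score, fitted values);
# this should be TRUE up to floating-point tolerance
all.equal(multiple_R, cor(sc.ar$score, fitted(fit)))
```

Each entry of r_each compares one area with all the other areas combined, while multiple_R summarizes how strongly the area factor as a whole is associated with score; the F-statistic at the bottom of summary(fit) tests whether that association is zero.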
Maybe you want to split the scores by area:
> (df_s <- split(sc.ar$score, sc.ar$area))
$Algebra
[1] 10 12 13 14 16 17
$Calculus
[1] 11 13 15 15 15 16 17 18 18 19 20
$Geometry
[1] 14 14 15 16 18 19 20
$Modelling
[1] 11 11 15 15 16 18 18 19 19 20 7 9
$Probability
[1] 10 10 14 15 16 16 17 17 17 19 19 9
$Other
[1] 12
But the areas have different lengths. Maybe that's just because of your toy data; in any case, you could pad each group with NA up to the maximum length:
> (m <- sapply(df_s, `length<-`, max(lengths(df_s))))
Algebra Calculus Geometry Modelling Probability Other
[1,] 10 11 14 11 10 12
[2,] 12 13 14 11 10 NA
[3,] 13 15 15 15 14 NA
[4,] 14 15 16 15 15 NA
[5,] 16 15 18 16 16 NA
[6,] 17 16 19 18 16 NA
[7,] NA 17 20 18 17 NA
[8,] NA 18 NA 19 17 NA
[9,] NA 18 NA 19 17 NA
[10,] NA 19 NA 20 19 NA
[11,] NA 20 NA 7 19 NA
[12,] NA NA NA 9 9 NA
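The padding step above relies on the replacement function `length<-`, which extends a vector with NA when the new length is larger. A small sketch on two made-up vectors (x, a, b and m_toy are hypothetical, not the question's data):

```r
# Two hypothetical groups of unequal size
x <- list(a = c(1, 2, 3), b = c(4, 5))

# Setting a longer length pads with NA; sapply() then binds the
# equal-length results into a matrix with one column per group
m_toy <- sapply(x, `length<-`, max(lengths(x)))
print(m_toy)
```

The shorter group b ends up with an NA in its last row, which is exactly what happens to Algebra, Geometry and Other in the matrix above.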
Anyway, finally just apply cor on the resulting matrix; use = "pairwise.complete.obs" makes it drop the NA padding separately for each pair of columns.
> cor(m, use="pairwise.complete.obs")
Algebra Calculus Geometry Modelling Probability Other
Algebra 1.0000000 0.9006049 0.9601136 0.9297804 0.9094441 NA
Calculus 0.9006049 1.0000000 0.8492236 0.2967773 0.9461672 NA
Geometry 0.9601136 0.8492236 1.0000000 0.9285061 0.8992441 NA
Modelling 0.9297804 0.2967773 0.9285061 1.0000000 0.5407100 NA
Probability 0.9094441 0.9461672 0.8992441 0.5407100 1.0000000 NA
Other NA NA NA NA NA NA
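To see what pairwise deletion does, here is a sketch with just the Algebra and Calculus columns, typed in from the padded matrix printed earlier (m2, both, r_pair and r_manual are illustrative names). Only the rows where both columns are observed enter the estimate, so the result matches cor() on those rows alone.

```r
# Algebra and Calculus columns of the padded matrix m shown above
m2 <- cbind(
  Algebra  = c(10, 12, 13, 14, 16, 17, NA, NA, NA, NA, NA, NA),
  Calculus = c(11, 13, 15, 15, 15, 16, 17, 18, 18, 19, 20, NA)
)

# Pairwise deletion: only rows where both columns are observed (rows 1 to 6) are used
r_pair <- cor(m2, use = "pairwise.complete.obs")["Algebra", "Calculus"]

# Same value computed by hand on the complete rows
both <- complete.cases(m2)
r_manual <- cor(m2[both, "Algebra"], m2[both, "Calculus"])

print(c(pairwise = r_pair, manual = r_manual))
```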
If you need test statistics as well (the number of observations n and the P-values), you could use Hmisc::rcorr.
> Hmisc::rcorr(m)
Algebra Calculus Geometry Modelling Probability Other
Algebra 1.00 0.90 0.96 0.93 0.91 NA
Calculus 0.90 1.00 0.85 0.30 0.95 NA
Geometry 0.96 0.85 1.00 0.93 0.90 NA
Modelling 0.93 0.30 0.93 1.00 0.54 NA
Probability 0.91 0.95 0.90 0.54 1.00 NA
Other NA NA NA NA NA 1
n
Algebra Calculus Geometry Modelling Probability Other
Algebra 6 6 6 6 6 1
Calculus 6 11 7 11 11 1
Geometry 6 7 7 7 7 1
Modelling 6 11 7 12 12 1
Probability 6 11 7 12 12 1
Other 1 1 1 1 1 1
P
            Algebra Calculus Geometry Modelling Probability Other
Algebra             0.0143   0.0024   0.0072    0.0119
Calculus    0.0143           0.0156   0.3755    0.0000
Geometry    0.0024  0.0156            0.0025    0.0059
Modelling   0.0072  0.3755   0.0025             0.0695
Probability 0.0119  0.0000   0.0059   0.0695
Other
Warning message:
In sqrt(npair - 2) : NaNs produced
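Pearson is the default method in both. The NaN warning comes from the Other column: it has only one observation, so npair - 2 is negative for every pair involving it and no P-value can be computed.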
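If you would rather stay in base R, cor.test() reports the same Pearson estimate and P-value for one pair at a time, dropping rows that are missing in either vector. A sketch for the Algebra/Calculus pair, with the two columns typed in from the padded matrix printed earlier (algebra, calculus and ct are illustrative names):

```r
# Algebra and Calculus columns of the padded matrix m shown above
algebra  <- c(10, 12, 13, 14, 16, 17, NA, NA, NA, NA, NA, NA)
calculus <- c(11, 13, 15, 15, 15, 16, 17, 18, 18, 19, 20, NA)

# Pearson is the default method; incomplete pairs are removed, leaving n = 6.
# The test statistic is t = r * sqrt(n - 2) / sqrt(1 - r^2) on n - 2 degrees of freedom.
ct <- cor.test(algebra, calculus)
print(ct)
```

Its P-value agrees with the 0.0143 that rcorr reports for this pair.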