Correlation Matrix Between Variables in R


I have been trying to determine the correlation between variables in panel data. My data has the following form (the full data set has more dates, and some values of PM10 are NA):

structure(list(NetC = c("Cosenza Provincia", "Cosenza Provincia", 
"Cosenza Provincia", "Cosenza Provincia", "Cosenza Provincia", 
"Cosenza Provincia", "Cosenza Provincia", "Cosenza Provincia", 
"Cosenza Provincia", "Reti Private", "Reti Private", "Reti Private", 
"Reti Private", "Reti Private", "Reti Private"), ID = c("IT1938A", 
"IT1938A", "IT1938A", "IT2086A", "IT2086A", "IT2086A", "IT2110A", 
"IT2110A", "IT2110A", "IT1766A", "IT1766A", "IT1766A", "IT2090A", 
"IT2090A", "IT2090A"), Stat = c("Citta dei Ragazzi", "Citta dei Ragazzi", 
"Citta dei Ragazzi", "Rende", "Rende", "Rende", "Acri", "Acri", 
"Acri", "Firmo", "Firmo", "Firmo", "Schiavonea", "Schiavonea", 
"Schiavonea"), Data = c("1/1/2022", "1/2/2022", "1/3/2022", "1/1/2022", 
"1/2/2022", "1/3/2022", "1/1/2022", "1/2/2022", "1/3/2022", "1/1/2022", 
"1/2/2022", "1/3/2022", "1/1/2022", "1/2/2022", "1/3/2022"), 
    PM10 = c(13.29, 11.14, 9.08, 16.62, 12.98, 10.4, 16.2, 19.4, 
    15.7, 10.82, 12.29, 9.54, 24.54, 22.88, 27.33)), class = "data.frame", row.names = c(NA, 
-15L))

I have tried using plm::cortab, but it doesn't calculate the correlation.

library(plm)
cortab(data$PM10, grouping = Stat, groupnames = c("Citta dei Ragazzi", "Rende", 
                                                  "Acri", "Firmo", "Schiavonea"))

The output should look like:

                  Citta dei Ragazzi  Rende  Acri
Citta dei Ragazzi                 1
Rende                             x      1
Acri                              x      x     1

Solution

  • This has pretty much already been asked (see "How can I complete a correlation in R of one variable across it's factor levels, matching by date"), but for convenience I have adapted that answer to your data:

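    library(dplyr)  # select(), bind_rows(), %>%
    library(tidyr)  # pivot_wider()
    library(broom)  # tidy()

    # Simple correlation matrix: reshape to one column per station.
    data.wider <- data %>%
      select(-ID, -NetC) %>%  # drop variables not needed for the reshape
      pivot_wider(names_from = 'Stat', values_from = 'PM10')

    # Drop the date column; pairwise deletion copes with NA values in PM10.
    cor(data.wider[, -1], use = 'pairwise.complete.obs')

    # A few more lines are needed to set up correlation testing.
    pw <- combn(unique(data$Stat), 2)  # all station pairs, one pair per column
    pw

    # Run cor.test() on each pair and tidy each result into a one-row data frame.
    pairwise_c <- apply(pw, 2, function(i) {
      tidy(cor.test(data.wider[[i[1]]], data.wider[[i[2]]]))
    })

    # Put the station names next to their test results.
    results <- cbind(data.frame(t(pw)), bind_rows(pairwise_c))

    results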
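
To get the lower-triangular layout shown in the question, one option is to blank out the upper triangle of the matrix returned by cor(). The following is a minimal base-R sketch, not part of the original answer. It rebuilds a three-station subset of the sample data so that it runs on its own, and the names wide and cm are only illustrative.

```r
# Three-station subset of the question's sample data, in the same long format.
data <- data.frame(
  Stat = rep(c("Citta dei Ragazzi", "Rende", "Acri"), each = 3),
  Data = rep(c("1/1/2022", "1/2/2022", "1/3/2022"), times = 3),
  PM10 = c(13.29, 11.14, 9.08, 16.62, 12.98, 10.4, 16.2, 19.4, 15.7)
)

# Reshape to wide with base R: one row per date, one column per station.
wide <- reshape(data, idvar = "Data", timevar = "Stat", direction = "wide")
names(wide) <- sub("^PM10\\.", "", names(wide))  # strip the "PM10." prefix

# Correlate the station columns only, using pairwise-complete observations.
cm <- cor(wide[, setdiff(names(wide), "Data")], use = "pairwise.complete.obs")

# Keep the diagonal and lower triangle; print NA entries as blanks.
cm[upper.tri(cm)] <- NA
print(round(cm, 2), na.print = "")
```

With the full data set, the same two final steps can be applied to the matrix returned by cor(data.wider[, -1], ...) above.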