r correlation ranking survey

Code to analyze relationships between responses to different ranking questions on a survey


I am looking for simpler, more general code that shows the relationships between responses to two survey questions. In the MWE, one question asked respondents to rank eight marketing selections from 1 to 8 and the other asked them to rank nine attribute selections from 1 to 9. A higher rank means the respondent favored the selection more. Here is the data frame.

dataframe <- structure(list(Email = c("a", "b", "c", "d", "e", "f", "g", "h", 
"i"), Ads = c(2, 1, 1, 1, 1, 2, 1, 1, 1), Alumni = c(3, 2, 2, 
3, 2, 3, 2, 2, 2), Articles = c(6, 4, 3, 2, 3, 4, 3, 3, 3), Referrals = c(4, 
3, 4, 8, 7, 8, 8, 6, 4), Speeches = c(7, 7, 6, 7, 4, 7, 4, 5, 
5), Updates = c(8, 6, 6, 5, 5, 5, 5, 7, 6), Visits = c(5, 8, 
7, 6, 6, 6, 6, 4, 8), `Business Savvy` = c(10, 6, 10, 10, 4, 
4, 6, 8, 9), Communication = c(4, 3, 8, 3, 3, 9, 7, 6, 7), Experience = c(7, 
7, 7, 9, 2, 8, 5, 9, 5), Innovation = c(2, 1, 4, 2, 1, 2, 2, 
1, 1), Nearby = c(3, 2, 2, 1, 5, 3, 3, 2, 2), Personal = c(8, 
10, 6, 8, 6, 10, 4, 3, 3), Rates = c(9, 5, 9, 6, 9, 7, 10, 5, 
4), `Staffing Model` = c(6, 8, 5, 5, 7, 5, 8, 7, 8), `Total Cost` = c(5, 
4, 3, 7, 8, 6, 9, 4, 6)), row.names = c(NA, -9L), class = c("tbl_df", 
"tbl", "data.frame"))

If numeric rankings are not suitable for calculating correlations, please correct me.

Assuming they can be used, I arrived at the following plodding code, which I believe calculates the correlation of each marketing method against each attribute.

library(psych)

# corr.test()[[1]] is the correlation matrix r. Its first 10 elements are the
# first column: the method's correlation with itself (always 1), followed by
# its correlations with the nine attributes.
dataframe2 <- psych::corr.test(dataframe[, c(2, 9:17)])[[1]][1:10]  # the first method vs all attributes
dataframe3 <- psych::corr.test(dataframe[, c(3, 9:17)])[[1]][1:10]  # the second method vs all attributes, and so on
dataframe4 <- psych::corr.test(dataframe[, c(4, 9:17)])[[1]][1:10]
dataframe5 <- psych::corr.test(dataframe[, c(5, 9:17)])[[1]][1:10]
dataframe6 <- psych::corr.test(dataframe[, c(6, 9:17)])[[1]][1:10]
dataframe7 <- psych::corr.test(dataframe[, c(7, 9:17)])[[1]][1:10]
dataframe8 <- psych::corr.test(dataframe[, c(8, 9:17)])[[1]][1:10]

# Bind the seven vectors into a data frame, one row per method
bind <- data.frame(rbind(dataframe2, dataframe3, dataframe4, dataframe5, dataframe6, dataframe7, dataframe8))

Rename rows and columns:

colnames(bind) <- c("Sel", colnames(dataframe[9:17]))
rownames(bind) <- colnames(dataframe[2:8])

How can I accomplish the above more efficiently?

By the way, the bind data frame also allows one to produce a heat map with the DataExplorer package.

library(DataExplorer)

DataExplorer::plot_correlation(bind)

[Figure: heat map of ranking correlations]


Solution

  • [Summary]

    For this problem, there are two ways to get the correlation matrix.

    1. Use stats::cor, i.e., cor(subset(dataframe, select = -Email))
    2. Use psych::corr.test, i.e., corr.test(subset(dataframe, select = -Email))[[1]]

    Then you may subset the correlation matrix with the desired rows and columns.

    To use DataExplorer::plot_correlation, simply call plot_correlation(dataframe, type = "c") on the raw data. Note that the resulting heat map includes correlations for all continuous columns, so you can ignore the ones that are not of interest.
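
    As a minimal sketch of the first option: stats::cor also accepts two arguments, cor(x, y), and then returns only the x-by-y block, so there is no full matrix to subset afterwards. The names method_cols and attribute_cols are my own, and the data is rebuilt as a plain data.frame only to keep the sketch free of extra packages. The Spearman variant is an assumption on my part that a rank-based coefficient may suit ranking data better; the question does not require it.

```r
# Same values as the question's data, rebuilt as a plain data.frame
# (an assumption made only so the sketch runs without tibble).
dataframe <- data.frame(
  Email = letters[1:9],
  Ads = c(2, 1, 1, 1, 1, 2, 1, 1, 1),
  Alumni = c(3, 2, 2, 3, 2, 3, 2, 2, 2),
  Articles = c(6, 4, 3, 2, 3, 4, 3, 3, 3),
  Referrals = c(4, 3, 4, 8, 7, 8, 8, 6, 4),
  Speeches = c(7, 7, 6, 7, 4, 7, 4, 5, 5),
  Updates = c(8, 6, 6, 5, 5, 5, 5, 7, 6),
  Visits = c(5, 8, 7, 6, 6, 6, 6, 4, 8),
  `Business Savvy` = c(10, 6, 10, 10, 4, 4, 6, 8, 9),
  Communication = c(4, 3, 8, 3, 3, 9, 7, 6, 7),
  Experience = c(7, 7, 7, 9, 2, 8, 5, 9, 5),
  Innovation = c(2, 1, 4, 2, 1, 2, 2, 1, 1),
  Nearby = c(3, 2, 2, 1, 5, 3, 3, 2, 2),
  Personal = c(8, 10, 6, 8, 6, 10, 4, 3, 3),
  Rates = c(9, 5, 9, 6, 9, 7, 10, 5, 4),
  `Staffing Model` = c(6, 8, 5, 5, 7, 5, 8, 7, 8),
  `Total Cost` = c(5, 4, 3, 7, 8, 6, 9, 4, 6),
  check.names = FALSE  # keep the column names that contain spaces
)

# Hypothetical helper names: columns 2-8 are the marketing methods,
# columns 9-17 are the attributes.
method_cols    <- names(dataframe)[2:8]
attribute_cols <- names(dataframe)[9:17]

# cor(x, y) correlates every column of x with every column of y,
# giving a 7 x 9 matrix (methods in rows, attributes in columns).
cross_cor <- cor(dataframe[method_cols], dataframe[attribute_cols])

# Rank-based alternative (Spearman), shown as an option only.
cross_cor_spearman <- cor(dataframe[method_cols], dataframe[attribute_cols],
                          method = "spearman")

print(round(cross_cor, 2))
```

    cross_cor has one row per marketing method and one column per attribute, so it plays the role of bind without the self-correlation column.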


    [Original Answer]

    ## Create data
    dataframe <- structure(
      list(
        Email = c("a", "b", "c", "d", "e", "f", "g", "h",  "i"),
        Ads = c(2, 1, 1, 1, 1, 2, 1, 1, 1),
        Alumni = c(3, 2, 2, 3, 2, 3, 2, 2, 2),
        Articles = c(6, 4, 3, 2, 3, 4, 3, 3, 3),
        Referrals = c(4, 3, 4, 8, 7, 8, 8, 6, 4),
        Speeches = c(7, 7, 6, 7, 4, 7, 4, 5, 5),
        Updates = c(8, 6, 6, 5, 5, 5, 5, 7, 6),
        Visits = c(5, 8, 7, 6, 6, 6, 6, 4, 8),
        `Business Savvy` = c(10, 6, 10, 10, 4, 4, 6, 8, 9),
        Communication = c(4, 3, 8, 3, 3, 9, 7, 6, 7),
        Experience = c(7, 7, 7, 9, 2, 8, 5, 9, 5),
        Innovation = c(2, 1, 4, 2, 1, 2, 2, 1, 1),
        Nearby = c(3, 2, 2, 1, 5, 3, 3, 2, 2),
        Personal = c(8, 10, 6, 8, 6, 10, 4, 3, 3),
        Rates = c(9, 5, 9, 6, 9, 7, 10, 5, 4),
        `Staffing Model` = c(6, 8, 5, 5, 7, 5, 8, 7, 8),
        `Total Cost` = c(5, 4, 3, 7, 8, 6, 9, 4, 6)
      ),
      row.names = c(NA, -9L),
      class = c("tbl_df", "tbl", "data.frame")
    )
    

    Following your example strictly, we can do the following:

    ## Calculate correlation
    df2 <- subset(dataframe, select = -Email)
    marketing_selections <- names(df2)[1:7]
    attribute_selections <- names(df2)[8:16]
    corr_matrix <- psych::corr.test(df2)[[1]]
    bind <- subset(corr_matrix,
                   subset = rownames(corr_matrix) %in% marketing_selections,
                   select = attribute_selections)
    DataExplorer::plot_correlation(bind)
    

    [Figure: correlation heatmap]

    WARNING

    However, is this what you really want? psych::corr.test already generates the correlation matrix, and DataExplorer::plot_correlation then calculates correlations again on that matrix. The plot therefore shows a correlation of correlations, not the original correlations.
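
    To make the warning concrete, here is a hedged sketch in base R only. It assumes, as described above, that the plotting step recomputes correlations on whatever it receives; corr_of_corr imitates that step with stats::cor rather than calling DataExplorer itself. Drawing the already computed matrix with base heatmap() is my own choice, not part of the original answer, and the names cross_cor, corr_of_corr and long are illustrative.

```r
# Same values as the question's data, rebuilt as a plain data.frame
# (an assumption made only so the sketch runs without tibble).
dataframe <- data.frame(
  Email = letters[1:9],
  Ads = c(2, 1, 1, 1, 1, 2, 1, 1, 1),
  Alumni = c(3, 2, 2, 3, 2, 3, 2, 2, 2),
  Articles = c(6, 4, 3, 2, 3, 4, 3, 3, 3),
  Referrals = c(4, 3, 4, 8, 7, 8, 8, 6, 4),
  Speeches = c(7, 7, 6, 7, 4, 7, 4, 5, 5),
  Updates = c(8, 6, 6, 5, 5, 5, 5, 7, 6),
  Visits = c(5, 8, 7, 6, 6, 6, 6, 4, 8),
  `Business Savvy` = c(10, 6, 10, 10, 4, 4, 6, 8, 9),
  Communication = c(4, 3, 8, 3, 3, 9, 7, 6, 7),
  Experience = c(7, 7, 7, 9, 2, 8, 5, 9, 5),
  Innovation = c(2, 1, 4, 2, 1, 2, 2, 1, 1),
  Nearby = c(3, 2, 2, 1, 5, 3, 3, 2, 2),
  Personal = c(8, 10, 6, 8, 6, 10, 4, 3, 3),
  Rates = c(9, 5, 9, 6, 9, 7, 10, 5, 4),
  `Staffing Model` = c(6, 8, 5, 5, 7, 5, 8, 7, 8),
  `Total Cost` = c(5, 4, 3, 7, 8, 6, 9, 4, 6),
  check.names = FALSE
)

# The correlations of interest: 7 methods (rows) x 9 attributes (columns).
cross_cor <- cor(dataframe[2:8], dataframe[9:17])

# What a second correlation step would see: the 7 rows are treated as
# observations, yielding a 9 x 9 attribute-by-attribute matrix.
corr_of_corr <- cor(cross_cor)

# Long format (one row per method/attribute pair), handy for other plotting tools.
long <- as.data.frame(as.table(cross_cor), responseName = "r")
names(long)[1:2] <- c("method", "attribute")

# Plot the matrix as computed: no reordering, no rescaling, no recomputation.
heatmap(cross_cor, Rowv = NA, Colv = NA, scale = "none", margins = c(8, 6))
```

    The differing shapes (7 x 9 for cross_cor, 9 x 9 for corr_of_corr) show that the second step answers a different question from the one asked.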