
How to calculate a coefficient of correlation, by year, in R and put results in a dataframe?


I want to calculate the correlation coefficient by year in R and store the results in a data frame, then repeat the process for the coefficient of determination. The following code returns a single value which, I assume, is for all years combined. The value is printed to the console but does not appear in the data frame.

xmasCount_Amt_Coef_Correlation <- xmasCount_Amt_df_ByCheckDate %>% 
  group_by(Year.x, YTD_Range.x)

cor(xmasCount_Amt_df_ByCheckDate$n, xmasCount_Amt_df_ByCheckDate$Amount)

A sample screenshot of my source table, xmasCount_Amt_df_ByCheckDate, is shown below. The complete data frame contains data for 2020-2022. The output table looks identical to the source table, which is not what I want. I'm obviously missing a step or two, but I don't know which. Any suggestions would be appreciated.

Source table xmasCount_Amt_df_ByCheckDate


Solution

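  • group_by() on its own only marks the groups and computes nothing, and cor() called on the $ columns of the original data frame ignores the grouping, so you get a single value for all years. Pass the grouped data to summarise() instead. Adapt the code below to your project as needed and let me know what you get: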

    library(dplyr)
    
    # Group by year, then calculate corr coeff for each group
    xmasCount_Amt_Coef_Correlation <- xmasCount_Amt_df_ByCheckDate %>%
      group_by(Year.x) %>%
      summarise(correlation = cor(n, Amount))
    
    # and the result:
    xmasCount_Amt_Coef_Correlation
    

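    For the coefficient of determination (R^2), one option is to square the correlation coefficient; for a simple linear regression with a single predictor, R^2 equals the square of Pearson's r: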

    # add a new column with the coefficient of determination
    xmasCount_Amt_Coef_Correlation <- xmasCount_Amt_Coef_Correlation %>%
      mutate(determination = correlation^2)
    
    # View the resulting data frame
    xmasCount_Amt_Coef_Correlation
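
Since the original data is only available as a screenshot, here is a minimal self-contained sketch of the same two steps on a made-up data frame. The name `toy_df` and all values are invented for illustration; only the column names `Year.x`, `n` and `Amount` come from the question. The extra `determination_lm` column is an assumption-free cross-check: it fits `Amount ~ n` separately within each year and should match the squared correlation.

```r
library(dplyr)

# Hypothetical data: column names mirror the question, values are invented
toy_df <- data.frame(
  Year.x = rep(c(2020, 2021), each = 5),
  n      = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5),
  Amount = c(12, 19, 33, 38, 52, 50, 41, 35, 18, 11)
)

toy_result <- toy_df %>%
  group_by(Year.x) %>%
  summarise(
    # Pearson correlation of n and Amount within each year
    correlation      = cor(n, Amount),
    # Coefficient of determination as the squared correlation
    determination    = correlation^2,
    # Cross-check: R^2 from a per-year simple linear regression
    determination_lm = summary(lm(Amount ~ n))$r.squared,
    .groups = "drop"
  )

# One row per year, with both coefficients as columns
toy_result
```

If `n` or `Amount` contains missing values, `cor()` returns NA for that year by default; `cor(n, Amount, use = "complete.obs")` drops incomplete pairs instead, which may or may not be what you want for your data.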