Search code examples
rlinear-regressionpearson-correlation

Multiple Linear Regression to table using Pearsons


I need to produce a table which correlates the below data based on "Base_State" using R. How can I do this? I wrote the following R script which works on just one Base_State; this outputs it to a table.

But one of the problems is that this new table contains one additional column which is nonnumeric: Base_State.

Will homals work for linear regression? I tried using this but kept getting the error: "x must be atomic for sort.list". I also tried using the "lm" function but I couldn't figure out how to add this nonnumeric column.

Base_State  M   PP  SP  PA  P
AL  40.00   4949.04 0.00    1262.51 6211.55
AL  41.00   544.00  0.00    0.00    544.00
AL  51.00   0.00    0.00    0.00    0.00
AL  92.00   6341.68 1617.77 0.00    7959.45
AL  112.00  4584.58 1169.54 0.00    5754.12
AR  52.90   32865.35    0.00    0.00    32865.35
AR  57.00   3223.67 0.00    0.00    3223.67
AR  26.80   0.00    4556.46 0.00    4556.46
AR  40.40   4920.69 1255.27 0.00    6175.96
AR  98.00   4194.39 0.00    0.00    4194.39

Table definition (in SQL)

create table #example
(
Base_State varchar(2)
,M decimal(13,2)
,PP decimal(13,2)
,SP decimal(13,2)
,PA decimal(13,2)
,P decimal(13,2)
)

insert #example values('AL',    40.00,  4949.04,    0.00,       1262.51,    6211.55)
insert #example values('AL',    41.00,  544.00,     0.00,       0.00,       544.00)
insert #example values('AL',    51.00,  0.00,       0.00,       0.00,       0.00)
insert #example values('AL',    92.00,  6341.68,    1617.77,    0.00,       7959.45)
insert #example values('AL',    112.00, 4584.58,    1169.54,    0.00,       5754.12)

insert #example values('AR',    52.90,  32865.35    ,0.00       ,0.00   ,32865.35)
insert #example values('AR',    57.00,  3223.67     ,0.00       ,0.00   ,3223.67 )
insert #example values('AR',    26.80,  0.00        ,4556.46    ,0.00   ,4556.46 )
insert #example values('AR',    40.40,  4920.69     ,1255.27    ,0.00   ,6175.96 )
insert #example values('AR',    98.00,  4194.39     ,0.00       ,0.00   ,4194.39 )

select * from #example

Note that I ran the one line of R script below before adding the first column "Base_State" and it produced the final table below. Also note that my Table definition above does not include all of the rows I used to calculate the final table below. In other words, executing "cor" on 5 rows of data won't produce these same results obviously.

M<-cor(#example)

Now I want this final table to look the same except that there would be one additional column at the beginning for Base_State. (Also, I only need the first row from this table since I want everything based on "M", but I can figure this out later).

enter image description here


Solution

  • Using the dplyr package, considering you have a dataframe df, you can calculate the Pearson's correlation between variable M and each one of the variables PP, SP, PA and P, grouping by Base_State:

    df %>% group_by(Base_State) %>% summarise(cor_M_PP=cor(M,PP),cor_M_SP=cor(M,SP),cor_M_PA=cor(M,PA),cor_M_P=cor(M,P))