I need to produce a table which correlates the below data based on "Base_State" using R. How can I do this? I wrote the following R script which works on just one Base_State; this outputs it to a table.
But one of the problems is that this new table contains one additional column which is nonnumeric: Base_State.
Will homals work for linear regression? I tried using this but kept getting the error: "x must be atomic for sort.list". I also tried using the "lm" function but I couldn't figure out how to add this nonnumeric column.
Base_State M PP SP PA P
AL 40.00 4949.04 0.00 1262.51 6211.55
AL 41.00 544.00 0.00 0.00 544.00
AL 51.00 0.00 0.00 0.00 0.00
AL 92.00 6341.68 1617.77 0.00 7959.45
AL 112.00 4584.58 1169.54 0.00 5754.12
AR 52.90 32865.35 0.00 0.00 32865.35
AR 57.00 3223.67 0.00 0.00 3223.67
AR 26.80 0.00 4556.46 0.00 4556.46
AR 40.40 4920.69 1255.27 0.00 6175.96
AR 98.00 4194.39 0.00 0.00 4194.39
Table definition (in SQL)
create table #example
(
Base_State varchar(2)
,M decimal(13,2)
,PP decimal(13,2)
,SP decimal(13,2)
,PA decimal(13,2)
,P decimal(13,2)
)
insert #example values('AL', 40.00, 4949.04, 0.00, 1262.51, 6211.55)
insert #example values('AL', 41.00, 544.00, 0.00, 0.00, 544.00)
insert #example values('AL', 51.00, 0.00, 0.00, 0.00, 0.00)
insert #example values('AL', 92.00, 6341.68, 1617.77, 0.00, 7959.45)
insert #example values('AL', 112.00, 4584.58, 1169.54, 0.00, 5754.12)
insert #example values('AR', 52.90, 32865.35 ,0.00 ,0.00 ,32865.35)
insert #example values('AR', 57.00, 3223.67 ,0.00 ,0.00 ,3223.67 )
insert #example values('AR', 26.80, 0.00 ,4556.46 ,0.00 ,4556.46 )
insert #example values('AR', 40.40, 4920.69 ,1255.27 ,0.00 ,6175.96 )
insert #example values('AR', 98.00, 4194.39 ,0.00 ,0.00 ,4194.39 )
select * from #example
Note that I ran the one line of R script below before adding the first column "Base_State" and it produced the final table below. Also note that my Table definition above does not include all of the rows I used to calculate the final table below. In other words, executing "cor" on 5 rows of data won't produce these same results obviously.
M<-cor(#example)
Now I want this final table to look the same except that there would be one additional column at the beginning for Base_State. (Also, I only need the first row from this table since I want everything based on "M", but I can figure this out later).
Using the dplyr
package,
considering you have a dataframe df, you can calculate the Pearson's correlation between variable M and each one of the variables PP, SP, PA and P, grouping by Base_State:
df %>% group_by(Base_State) %>% summarise(cor_M_PP=cor(M,PP),cor_M_SP=cor(M,SP),cor_M_PA=cor(M,PA),cor_M_P=cor(M,P))