I have a dataframe of many variables (soil properties) for 11 legumes in 2 different locations. first few columns of the data is shown below.
SPECIES LOCATION pH NO3 NH4 P Organic_C K Cu Mn Zn BD X.Sand X.Silt X.Clay
1 C. comosa Gauteng 5.40 8.24 1.35 1.10 0.95 94.40 3.36 84.40 4.72 1.45 68.0 12 9
2 C. comosa Gauteng 5.25 8.36 1.37 1.20 0.99 94.87 3.39 84.87 4.77 1.36 76.0 16 13
3 C. comosa Gauteng 5.55 8.19 1.32 1.11 0.94 94.01 3.35 84.01 4.68 1.54 78.0 14 14
4 C. comosa Mpumalanga 5.84 4.05 3.46 3.04 1.55 130.40 0.28 25.43 2.00 1.66 73.6 9 10
5 C. comosa Mpumalanga 5.49 4.45 3.48 3.09 1.53 131.36 0.27 25.35 2.12 1.45 76.5 11 16
6 C. comosa Mpumalanga 6.19 4.43 3.44 3.04 1.58 129.95 0.29 25.45 2.14 1.87 74.9 13 16
7 C. distans Gauteng 5.48 8.88 1.96 3.33 0.99 130.24 0.99 40.01 3.94 1.55 70.0 8 11
8 C. distans Gauteng 5.29 8.54 1.99 3.28 0.99 130.28 0.95 40.25 3.89 1.48 79.0 12 15
9 C. distans Gauteng 5.67 8.63 1.93 3.39 1.02 130.30 0.98 40.12 3.97 1.62 79.0 10 16
10 C. distans Mpumalanga 5.61 6.02 2.65 4.45 2.58 163.25 1.79 53.11 6.11 1.68 72.0 8 10
11 C. distans Mpumalanga 5.43 6.58 2.55 4.49 2.59 163.55 1.78 52.89 6.04 1.63 78.0 15 14
12 C. distans Mpumalanga 5.79 6.24 2.59 4.41 2.59 163.27 1.75 53.03 6.19 1.73 75.0 16 12
13 E. cordatum Gauteng 4.38 16.29 5.76 4.77 3.25 175.38 1.11 35.87 8.54 1.53 33.0 9 40
14 E. cordatum Gauteng 4.05 16.15 5.63 4.73 3.29 175.90 1.23 34.34 8.61 1.42 45.0 13 50
15 E. cordatum Gauteng 4.71 15.89 5.99 4.80 3.25 174.54 1.19 36.44 8.58 1.64 42.0 14 54
for each location, i want to summarize the data such that the soil properties are in the first column while the legumes are the other columns and each value reported as the mean ± SD. something like the table below.
i am thinking dcast
but i am not sure how to get the values as mean ± SD
In Base R this gives you the exact output, But you lose the ability to (easily) extract the numbers afterwards. There are better ways of storing the data if you want to do more with it afterwards.
as.data.frame(t(aggregate(. ~ LOCATION, df[,-1], function(x) paste(round(mean(x),2),"±", round(sd(x),2)))))
V1 V2
LOCATION Gauteng Mpumalanga
pH 5.09 ± 0.57 5.72 ± 0.28
NO3 11.02 ± 3.83 5.29 ± 1.1
NH4 3.03 ± 2.09 3.03 ± 0.47
P 3.08 ± 1.58 3.75 ± 0.76
Organic_C 1.74 ± 1.14 2.07 ± 0.57
K 133.32 ± 35.08 146.96 ± 17.96
Cu 1.84 ± 1.15 1.03 ± 0.82
Mn 53.37 ± 23.39 39.21 ± 15.12
Zn 5.74 ± 2.15 4.1 ± 2.21
BD 1.51 ± 0.09 1.67 ± 0.14
X.Sand 63.33 ± 18.18 75 ± 2.11
X.Silt 12 ± 2.6 12 ± 3.22
X.Clay 24.67 ± 17.99 13 ± 2.76
Edit:
If you want the results for all species/locations you can use
as.data.frame(t(aggregate(. ~ SPECIES + LOCATION, df, function(x) paste(round(mean(x),2),"±", round(sd(x),2)))))
V1 V2 V3 V4 V5
SPECIES C. comosa C. distans E. cordatum C. comosa C. distans
LOCATION Gauteng Gauteng Gauteng Mpumalanga Mpumalanga
pH 5.4 ± 0.15 5.48 ± 0.19 4.38 ± 0.33 5.84 ± 0.35 5.61 ± 0.18
NO3 8.26 ± 0.09 8.68 ± 0.18 16.11 ± 0.2 4.31 ± 0.23 6.28 ± 0.28
NH4 1.35 ± 0.03 1.96 ± 0.03 5.79 ± 0.18 3.46 ± 0.02 2.6 ± 0.05
P 1.14 ± 0.06 3.33 ± 0.06 4.77 ± 0.04 3.06 ± 0.03 4.45 ± 0.04
Organic_C 0.96 ± 0.03 1 ± 0.02 3.26 ± 0.02 1.55 ± 0.03 2.59 ± 0.01
K 94.43 ± 0.43 130.27 ± 0.03 175.27 ± 0.69 130.57 ± 0.72 163.36 ± 0.17
Cu 3.37 ± 0.02 0.97 ± 0.02 1.18 ± 0.06 0.28 ± 0.01 1.77 ± 0.02
Mn 84.43 ± 0.43 40.13 ± 0.12 35.55 ± 1.09 25.41 ± 0.05 53.01 ± 0.11
Zn 4.72 ± 0.05 3.93 ± 0.04 8.58 ± 0.04 2.09 ± 0.08 6.11 ± 0.08
BD 1.45 ± 0.09 1.55 ± 0.07 1.53 ± 0.11 1.66 ± 0.21 1.68 ± 0.05
X.Sand 74 ± 5.29 76 ± 5.2 40 ± 6.24 75 ± 1.45 75 ± 3
X.Silt 14 ± 2 10 ± 2 12 ± 2.65 11 ± 2 13 ± 4.36
X.Clay 12 ± 2.65 14 ± 2.65 48 ± 7.21 14 ± 3.46 12 ± 2
And if you really only want one location you can use.
as.data.frame(t(aggregate(. ~ SPECIES,df[df$LOCATION == "Gauteng",-2], function(x) paste(round(mean(x),2),"±", round(sd(x),2)))))
V1 V2 V3
SPECIES C. comosa C. distans E. cordatum
pH 5.4 ± 0.15 5.48 ± 0.19 4.38 ± 0.33
NO3 8.26 ± 0.09 8.68 ± 0.18 16.11 ± 0.2
NH4 1.35 ± 0.03 1.96 ± 0.03 5.79 ± 0.18
P 1.14 ± 0.06 3.33 ± 0.06 4.77 ± 0.04
Organic_C 0.96 ± 0.03 1 ± 0.02 3.26 ± 0.02
K 94.43 ± 0.43 130.27 ± 0.03 175.27 ± 0.69
Cu 3.37 ± 0.02 0.97 ± 0.02 1.18 ± 0.06
Mn 84.43 ± 0.43 40.13 ± 0.12 35.55 ± 1.09
Zn 4.72 ± 0.05 3.93 ± 0.04 8.58 ± 0.04
BD 1.45 ± 0.09 1.55 ± 0.07 1.53 ± 0.11
X.Sand 74 ± 5.29 76 ± 5.2 40 ± 6.24
X.Silt 14 ± 2 10 ± 2 12 ± 2.65
X.Clay 12 ± 2.65 14 ± 2.65 48 ± 7.21