I have a dataframe df
in which I have data about percentages (df$Percentage
) of different habitats(df$Habitat
) for specific tracks (df$Replica
) in different sites (df$Site
). As an example:
df <- data.frame(Site = c("A","A","A","A","A","B","B","B","B","B","B"),
Replica = c(1,1,1,2,2,1,1,1,2,2,2),
Habitat = c("X","Y","Z","X","Y","X","Y","M","X","M","Z"),
Porcentage = c(46,38,16,40,60,20,60,20,35,55,10))
df
Site Replica Habitat Porcentage
1 A 1 X 46
2 A 1 Y 38
3 A 1 Z 16
4 A 2 X 40
5 A 2 Y 60
6 B 1 X 20
7 B 1 Y 60
8 B 1 M 20
9 B 2 X 35
10 B 2 M 55
11 B 2 Z 10
Here, for example, the percentage of habitat X
at site A
is 46, while there is no habitat M
(which is present in site B
). The sum of all values for a specific Replica
within a site is 100
. For example, in site A
, for Replica == 2
, habitat X
and Y
sum 100, which means there is no other habitat present in this replica/track.
I want to calculate both the mean percentage (Mean_Percentage
) of each habitat per Site
and the standard error (SE) of the mean. The mean is calculated using Replica
, since for each Site
I have repeated measures (= Replica
).
df %>% group_by(Site,Habitat) %>% summarise(Mean_Porcentage = mean(Porcentage),SE = sd(Porcentage)/sqrt(length(Porcentage)))
# A tibble: 7 × 4
# Groups: Site [2]
Site Habitat Mean_Porcentage SE
<chr> <chr> <dbl> <dbl>
1 A X 43 3
2 A Y 49 11
3 A Z 16 NA
4 B M 37.5 17.5
5 B X 27.5 7.5
6 B Y 60 NA
7 B Z 10 NA
The problem is that Mean_Porcentage
and thus, SE
are not calculated properly. For example, habitat Z
is present in Replica == 1
(=16%) but not in Replica == 2
(=0%) in site A
. Thus, the mean percentage (Mean_Porcentage
) of Z
in site A
should be 8
(=[16%+0%]/2) and SE should be 8
. Also, habitat M
is not present in site A
but it is in site B
, so I want habitat M
to appear for site A
with a Mean_Percentage
of 0
.
My desired result should be a dataframe as this:
df2
Site Habitat Mean_Porcentage SE
1 A M 0.0 0
2 A X 43.0 3
3 A Y 49.0 11
4 A Z 8.0 8
5 B M 37.5 17.5
6 B X 27.5 7.5
7 B Y 30.0 30
8 B Z 5.0 5
Does anyone know how to do it?
Thanks in advance!
You can try xtabs
and proportions
.
proportions(xtabs(Porcentage ~ Site + Habitat, df), 1) * 100
# Habitat
#Site M X Y Z
# A 0.0 43.0 49.0 8.0
# B 37.5 27.5 30.0 5.0
as.data.frame(proportions(xtabs(Porcentage ~ Site + Habitat, df), 1) * 100)
# Site Habitat Freq
#1 A M 0.0
#2 B M 37.5
#3 A X 43.0
#4 B X 27.5
#5 A Y 49.0
#6 B Y 30.0
#7 A Z 8.0
#8 B Z 5.0
Adding the se as described in the question.
x <- as.data.frame(proportions(xtabs(Porcentage ~ Site + Habitat, df), 1) * 100)
x <- merge(x, aggregate(cbind(SE = Porcentage) ~ Site + Habitat, df, \(x) sd(x)/sqrt(length(x))), all.x=TRUE)
i <- is.na(x$SE)
x$SE[i] <- x$Freq[i]
x
# Site Habitat Freq SE
#1 A M 0.0 0.0
#2 A X 43.0 3.0
#3 A Y 49.0 11.0
#4 A Z 8.0 8.0
#5 B M 37.5 17.5
#6 B X 27.5 7.5
#7 B Y 30.0 30.0
#8 B Z 5.0 5.0