I have a grouped data structure (different households answering a weekly opinion poll), and I observe the households over 52 weeks (four weeks in the example below). Now I want to indicate the value of a household at a given point in time using the Gini coefficient. In this case, the value of a household participating in the poll should be higher if the household didn't participate in the preceding weeks. So a household that always answers the poll should have a lower Gini coefficient in a given week than a household that answers only every four weeks.
The data structure is as follows:
da_poll <- data.frame(household = c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4), week = c(1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4), participation = c(1,1,1,1,0,0,0,1,0,1,0,1,1,1,1,0))
da_poll
household week participation
1 1 1 1
2 1 2 1
3 1 3 1
4 1 4 1
5 2 1 0
6 2 2 0
7 2 3 0
8 2 4 1
9 3 1 0
10 3 2 1
11 3 3 0
12 3 4 1
13 4 1 1
14 4 2 1
15 4 3 1
16 4 4 0
1 indicates participation, 0 no participation.
Here are three ways. They all use the function Gini from the package DescTools.
library(DescTools)
Base R
tapply(da_poll$participation, da_poll$household, Gini)
# 1 2 3 4
#0.0000000 1.0000000 0.6666667 0.3333333
Or, another base R way.
aggregate(participation ~ household, da_poll, Gini)
# household participation
#1 1 0.0000000
#2 2 1.0000000
#3 3 0.6666667
#4 4 0.3333333
dplyr
library(dplyr)
da_poll %>%
group_by(household) %>%
summarise(gini = Gini(participation))
## A tibble: 4 x 2
# household gini
# <dbl> <dbl>
#1 1 0
#2 2 1
#3 3 0.667
#4 4 0.333
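To see where these numbers come from, here is a minimal base R sketch that computes the Gini coefficient by hand with the n / (n - 1) small-sample correction. The helper name gini_manual is hypothetical and not part of DescTools. I am assuming that Gini applies this correction by default (unbiased = TRUE), which is consistent with the value 1 for household 2. Returning NA for fewer than two observations or an all-zero vector is my own choice, not necessarily what Gini does.

```r
# Hypothetical helper (not part of DescTools): Gini coefficient with the
# n / (n - 1) small-sample correction.
gini_manual <- function(x) {
  n <- length(x)
  # Assumption: treat these cases as undefined (the mean difference is
  # meaningless for one value, and the ratio is 0/0 when everything is zero).
  if (n < 2 || sum(x) == 0) return(NA_real_)
  x <- sort(x)
  # Uncorrected Gini from the rank-weighted sum of the sorted values
  g <- 2 * sum(seq_len(n) * x) / (n * sum(x)) - (n + 1) / n
  # Small-sample correction
  g * n / (n - 1)
}

da_poll <- data.frame(
  household = rep(1:4, each = 4),
  week = rep(1:4, times = 4),
  participation = c(1,1,1,1, 0,0,0,1, 0,1,0,1, 1,1,1,0)
)

# One value per household, to compare with the results above
gini_by_household <- tapply(da_poll$participation, da_poll$household, gini_manual)
gini_by_household
```

For household 2, for example, the sorted values are 0, 0, 0, 1, so the uncorrected coefficient is 2 * 4 / (4 * 1) - 5/4 = 0.75, and the correction 4/3 brings it to 1.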
Edit.
To have one Gini coefficient value per row of the original data set, not an aggregate, use ave instead of tapply, and mutate instead of summarise.
With base R
da_poll$gini <- ave(da_poll$participation, da_poll$household, FUN = Gini)
dplyr solution
da_poll %>%
group_by(household) %>%
mutate(gini = Gini(participation))
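Note that both versions repeat the same household-level value on every row. If the value should instead depend only on the weeks observed up to a given week, one possible reading of the question is a running Gini coefficient. The following is only a sketch under that assumption: gini_manual and running_gini are hypothetical helpers (not part of DescTools), the rows are assumed to be sorted by week within each household, and the coefficient is set to NA when fewer than two weeks are available or when there has been no participation so far.

```r
# Hypothetical helper: Gini coefficient with the n / (n - 1) correction,
# NA when it is undefined (assumption, see above).
gini_manual <- function(x) {
  n <- length(x)
  if (n < 2 || sum(x) == 0) return(NA_real_)
  x <- sort(x)
  g <- 2 * sum(seq_len(n) * x) / (n * sum(x)) - (n + 1) / n
  g * n / (n - 1)
}

# Hypothetical helper: Gini of the first k values, for k = 1, ..., length(x)
running_gini <- function(x) {
  sapply(seq_along(x), function(k) gini_manual(x[seq_len(k)]))
}

da_poll <- data.frame(
  household = rep(1:4, each = 4),
  week = rep(1:4, times = 4),
  participation = c(1,1,1,1, 0,0,0,1, 0,1,0,1, 1,1,1,0)
)

# Assumes rows are ordered by week within each household
da_poll <- da_poll[order(da_poll$household, da_poll$week), ]
da_poll$gini_to_date <- ave(da_poll$participation, da_poll$household,
                            FUN = running_gini)
da_poll
```

In the last week of each household the running value coincides with the per-household value computed above, since by then all four weeks are included.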