Gini coefficient in panel data


I have panel data: several households answer a weekly opinion poll, and I observe them over 52 weeks (four weeks in the example below). I want to express the value of a household's response at a given point in time using the Gini coefficient. A household's participation should be worth more if it did not participate in the previous weeks, so a household that always answers the poll should have a lower Gini coefficient in a given week than a household that answers only every four weeks.

The data structure is as follows:


    da_poll <- data.frame(
      household     = c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4),
      week          = c(1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4),
      participation = c(1,1,1,1,0,0,0,1,0,1,0,1,1,1,1,0)
    )
    da_poll
       household week participation
    1          1    1             1
    2          1    2             1
    3          1    3             1
    4          1    4             1
    5          2    1             0
    6          2    2             0
    7          2    3             0
    8          2    4             1
    9          3    1             0
    10         3    2             1
    11         3    3             0
    12         3    4             1
    13         4    1             1
    14         4    2             1
    15         4    3             1
    16         4    4             0

1 indicates participation, 0 no participation.


Solution

  • Here are three ways. All of them use the function Gini from the package DescTools.

    library(DescTools)
    

    Base R

    tapply(da_poll$participation, da_poll$household, Gini)
    #        1         2         3         4 
    #0.0000000 1.0000000 0.6666667 0.3333333 
    

    Or, another base R way.

    aggregate(participation ~ household, da_poll, Gini)
    #  household participation
    #1         1     0.0000000
    #2         2     1.0000000
    #3         3     0.6666667
    #4         4     0.3333333
    

    dplyr

    library(dplyr)
    
    da_poll %>% 
      group_by(household) %>%
      summarise(gini = Gini(participation))
    ## A tibble: 4 x 2
    #  household  gini
    #      <dbl> <dbl>
    #1         1 0    
    #2         2 1    
    #3         3 0.667
    #4         4 0.333
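
To see where these numbers come from, here is a minimal sketch that recomputes them in base R without DescTools. The helper `gini_manual` is a hypothetical name introduced for illustration. It assumes that `Gini` applies its small-sample correction by default (its `unbiased = TRUE` argument), which is consistent with household 2 getting exactly 1 rather than 0.75.

```r
# Hypothetical helper (not part of DescTools): Gini coefficient computed from
# the mean absolute difference, with the small-sample correction n / (n - 1).
gini_manual <- function(x, unbiased = TRUE) {
  n <- length(x)
  # Sum of |x_i - x_j| over all ordered pairs (i, j)
  abs_diff_sum <- sum(abs(outer(x, x, "-")))
  g <- abs_diff_sum / (2 * n^2 * mean(x))
  if (unbiased) g <- g * n / (n - 1)
  g
}

# Same example data as in the question
da_poll <- data.frame(
  household     = rep(1:4, each = 4),
  week          = rep(1:4, times = 4),
  participation = c(1,1,1,1, 0,0,0,1, 0,1,0,1, 1,1,1,0)
)

manual <- tapply(da_poll$participation, da_poll$household, gini_manual)
round(manual, 7)
#         1         2         3         4 
# 0.0000000 1.0000000 0.6666667 0.3333333 
```

Without the correction (`unbiased = FALSE` in this sketch) the same households would get 0, 0.75, 0.5 and 0.25, so the ranking is unchanged and only the scale differs.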
    

    Edit.

    To get one Gini coefficient per row of the original data set rather than one per household, use ave instead of tapply and mutate instead of summarise. The household's coefficient is then repeated on each of its rows.

    With base R

    da_poll$gini <- ave(da_poll$participation, da_poll$household, FUN = Gini)
    

    dplyr solution

    da_poll %>% 
      group_by(household) %>%
      mutate(gini = Gini(participation)) %>%
      ungroup()
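
The per-row versions above give the same value in every week of a household. If the coefficient should instead reflect only the participation observed up to a given week, one possible reading of the question is a running Gini per household. The following is only a sketch under that assumption: `running_gini` and the column `gini_to_date` are hypothetical names, the formula is a base R reimplementation rather than a call to DescTools, and the choice to return NA where the coefficient is undefined (a single observation, or no participation so far) is mine.

```r
# Hypothetical helper: for each position t, the Gini coefficient of x[1:t],
# using the mean-absolute-difference formula with the n / (n - 1) correction.
# Returns NA where the coefficient is undefined (fewer than two observations,
# or a zero mean because the household has not participated yet).
running_gini <- function(x) {
  sapply(seq_along(x), function(t) {
    past <- x[seq_len(t)]
    n <- length(past)
    if (n < 2 || sum(past) == 0) return(NA_real_)
    sum(abs(outer(past, past, "-"))) / (2 * n^2 * mean(past)) * n / (n - 1)
  })
}

# Same example data as in the question
da_poll <- data.frame(
  household     = rep(1:4, each = 4),
  week          = rep(1:4, times = 4),
  participation = c(1,1,1,1, 0,0,0,1, 0,1,0,1, 1,1,1,0)
)

# The running calculation depends on row order, so sort by week within household
da_poll <- da_poll[order(da_poll$household, da_poll$week), ]

da_poll$gini_to_date <- ave(da_poll$participation, da_poll$household,
                            FUN = running_gini)
```

In week 4 this reproduces the per-household values from the answer, while earlier weeks use only the history available at that point. Whether that matches the intended notion of "value" is a modelling decision that the original post leaves open.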