I have a data frame called income.df that looks something like this:
ID region income
1 rot 3700
2 ams 2500
3 utr 3300
4 utr 5300
5 utr 4400
6 ams 3100
8 ams 3000
9 rot 4000
10 rot 4400
12 rot 2000
I want to use the Gini function to compute the Gini coefficient for each region. If I wanted to compute it for the whole dataframe, without taking region into account, I would do the following:
library(DescTools)
Gini(income.df$income, n = rep(1, length(income.df$income)), unbiased = TRUE, conf.level = NA, R = 1000, type = "bca", na.rm = TRUE)
Is there a way to do this for each region within the data frame, in this case for "rot", "utr", and "ams"? Note that the Gini function also takes the argument n, a vector whose length must match the data (4, 3, and 3 for the three regions, respectively). I suspect something like lapply could do this, but I couldn't figure out how to pass those lengths automatically within the function (my actual data frame is a lot larger, so doing it manually is not an option).
Using base R (df here is the question's income.df; it is defined at the end of this answer):
library(DescTools)
# split() yields one data frame per region, so length(x$income) is that region's row count
lapply(split(df, df$region),
       function(x) Gini(x$income, n = rep(1, length(x$income)), unbiased = TRUE,
                        conf.level = NA, R = 1000, type = "bca", na.rm = TRUE))
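As a cross-check that does not need DescTools, here is a minimal base R sketch of the same split-and-apply idea. The helper gini_unbiased is hypothetical (it is not part of DescTools); it assumes the unbiased estimate is the mean-absolute-difference Gini scaled by n / (n - 1), which reproduces the values shown in this answer for the sample data. Because tapply hands each region's incomes to the function, the group length is computed inside the function and never has to be supplied by hand.

```r
# Sample data from the question
df <- data.frame(
  ID     = c(1, 2, 3, 4, 5, 6, 8, 9, 10, 12),
  region = c("rot", "ams", "utr", "utr", "utr", "ams", "ams", "rot", "rot", "rot"),
  income = c(3700, 2500, 3300, 5300, 4400, 3100, 3000, 4000, 4400, 2000)
)

# Hypothetical helper (not from DescTools): Gini coefficient from the mean
# absolute difference, with the n / (n - 1) small-sample correction.
gini_unbiased <- function(x) {
  x <- x[!is.na(x)]                                       # drop missing incomes
  n <- length(x)                                          # group size, derived from the data
  g <- sum(abs(outer(x, x, "-"))) / (2 * n^2 * mean(x))   # uncorrected Gini
  g * n / (n - 1)
}

# One coefficient per region, returned as a named vector
gini_by_region <- tapply(df$income, df$region, gini_unbiased)
round(gini_by_region, 4)
# ams 0.0698, rot 0.1773, utr 0.1538
```

sapply(split(df$income, df$region), gini_unbiased) gives the same named vector if you prefer to stay close to the lapply call.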
Using tidyverse:
library(tidyverse)
library(DescTools)
df %>% group_by(region) %>% nest() %>%
  mutate(gini_coef = map(data, ~ Gini(.x$income, n = rep(1, length(.x$income)),
                                      unbiased = TRUE, conf.level = NA, R = 1000,
                                      type = "bca", na.rm = TRUE))) %>%
  select(-data) %>% unnest(gini_coef) %>% left_join(df)
Joining, by = "region"
# A tibble: 10 x 4
   region gini_coef    ID income
   <fct>      <dbl> <int>  <int>
 1 rot       0.177      1   3700
 2 rot       0.177      9   4000
 3 rot       0.177     10   4400
 4 rot       0.177     12   2000
 5 ams       0.0698     2   2500
 6 ams       0.0698     6   3100
 7 ams       0.0698     8   3000
 8 utr       0.154      3   3300
 9 utr       0.154      4   5300
10 utr       0.154      5   4400
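If all you need is the per-row column, a similar table can be sketched in base R with ave(), which applies a function within each group and recycles the result to every row of that group. As an assumption for illustration, gini_unbiased below is a hypothetical stand-in for Gini with unbiased = TRUE (it is not a DescTools function), written with the sorted-rank form of the Gini coefficient and the n / (n - 1) correction.

```r
# Sample data from the question
df <- data.frame(
  ID     = c(1, 2, 3, 4, 5, 6, 8, 9, 10, 12),
  region = c("rot", "ams", "utr", "utr", "utr", "ams", "ams", "rot", "rot", "rot"),
  income = c(3700, 2500, 3300, 5300, 4400, 3100, 3000, 4000, 4400, 2000)
)

# Hypothetical helper (not from DescTools): sorted-rank form of the Gini
# coefficient, sum((2i - n - 1) * x_(i)) / (n * sum(x)), scaled by n / (n - 1).
gini_unbiased <- function(x) {
  x <- sort(x[!is.na(x)])
  n <- length(x)
  g <- sum((2 * seq_len(n) - n - 1) * x) / (n * sum(x))
  g * n / (n - 1)
}

# ave() evaluates the function once per region and repeats the value on each row
df$gini_coef <- ave(df$income, df$region, FUN = gini_unbiased)
df[order(df$region), ]
```

This keeps the original rows and columns and avoids the nest/unnest/join round trip, at the cost of not using the DescTools options (confidence intervals, bootstrap type).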
Data:
df <- read.table(text="
ID region income
1 rot 3700
2 ams 2500
3 utr 3300
4 utr 5300
5 utr 4400
6 ams 3100
8 ams 3000
9 rot 4000
10 rot 4400
12 rot 2000
", header = TRUE)