Doing quantiles per group


I'm a complete beginner with R. For one of our final assignments, we want to compute quantiles per country for each column of this data.

We have tried the apply function and a loop, but we have not been able to crack it yet:

    Ano                                           Paises      Males.total
1 2011                                          Belgium        19.5
2 2011                                         Bulgaria        46.4
3 2011                                          Czechia        11.9
4 2011                                          Denmark        17.5
5 2011 Germany (until 1990 former territory of the FRG)        18.5
6 2011                                          Estonia        22.9
  Females.total Malessinterminar Females.sin.terminar malespostsecundaria
1            21               33                 34.3                16.7
2          49.7             72.1                 75.1                42.1
3          16.4             32.3                 28.6                11.2
4          17.9             24.6                 24.4                16.8
5          21.3             38.5                 34.7                21.5
6          22.5               34                 35.4                24.3
  Femalespostsecundaria Males.universidad Femalesuniversidad
1                    19              10.6               10.1
2                  45.4              17.1               24.9
3                  15.7               4.1                5.4
4                  17.8              11.9               12.1
5                  21.5              10.3               13.4
6                    27              10.5               10.7

We tried the loop below, which we would like to run for each column of data by country. The problem is that each expression returns more than one value, so it cannot be assigned to a single cell of the result matrix:

estadosunicos<-unique(paises)
resultados<-matrix(0,length(estadosunicos),ncol = 3)
for (i in 1:length(estadosunicos)){
  selec<-estadosunicos[i]
  resultados[i,1]<-males.sin.terminar[paises==estadosunicos][females.sin.terminar<quantile(females.sin.terminar, 0.25)]
  resultados[i,2]<-males.sin.terminar[paises==estadosunicos][males.sin.terminar>quantile(males.sin.terminar,0.25)& males.sin.terminar<quantile(males.sin.terminar,0.75)]
  resultados[i,3]<-males.sin.terminar[paises==estadosunicos][males.sin.terminar>quantile(males.sin.terminar,0.75)]
}
rownames(resultados)<-estadosunicos

So we don't know how to do this. We would like to get the 25%, 50% and 75% quantiles of these data by country, but we have more than 300 rows, so each country is repeated several times across the different years. How can we do it? Thank you so much for your help!


Solution

  • We can group by country and compute the quantiles of each numeric column by looping across the columns. Each result is returned as a list, which can then be converted to columns with unnest_wider etc.

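    library(dplyr)
    df1 %>%
      select(-Ano) %>%
      # group on the country column, named Paises in the printed data
      group_by(Paises) %>%
      # one list of 25%, 50% and 75% quantiles per numeric column and country
      summarise(across(where(is.numeric), ~
         list(as.list(quantile(.x, probs = c(0.25, 0.5, 0.75))))))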
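
The unnest_wider step mentioned above could look roughly like the sketch below. The data frame here is hypothetical toy data that only borrows three column names from the question's printout, and passing several columns to unnest_wider at once assumes a reasonably recent tidyr (1.2.0 or later, as far as I know). names_sep is used so that the "25%", "50%" and "75%" names from different columns do not clash.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data; only the column names come from the question
df1 <- data.frame(
  Ano           = rep(2011:2014, times = 2),
  Paises        = rep(c("Belgium", "Bulgaria"), each = 4),
  Males.total   = c(19.5, 20.5, 21.5, 22.5, 46.4, 47.4, 48.4, 49.4),
  Females.total = c(21, 22, 23, 24, 49.7, 50.7, 51.7, 52.7)
)

quartiles <- df1 %>%
  select(-Ano) %>%
  group_by(Paises) %>%
  # each cell becomes a named list holding the three quantiles
  summarise(across(where(is.numeric),
                   ~ list(as.list(quantile(.x, probs = c(0.25, 0.5, 0.75)))))) %>%
  # spread every list-column into three columns, e.g. "Males.total_25%"
  unnest_wider(c(Males.total, Females.total), names_sep = "_")

print(quartiles)
```

The result has one row per country and three columns per measurement. Note that where(is.numeric) silently skips any column that was imported as text, so such columns would need converting with as.numeric first.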