Calculating the academic "g-index" (a variant of the h-index) in R?


This dataframe shows two researchers and their citation counts per publication:

   researcher citations
   <chr>          <dbl>
 1 Berger             8
 2 Berger            11
 3 Berger            26
 4 Berger            25
 5 Berger            10
 6 Meyer             45
 7 Meyer             12
 8 Meyer             12
 9 Meyer              8
10 Meyer             21

How can I calculate the "g-index" of each researcher in R?

This is the Wikipedia definition of the g-index:

The index is calculated based on the distribution of citations received by a given researcher's publications, such that given a set of articles ranked in decreasing order of the number of citations that they received, the g-index is the unique largest number such that the top g articles received together at least g^2 citations. Hence, a g-index of 10 indicates that the top 10 publications of an author have been cited at least 100 times (10^2), a g-index of 20 indicates that the top 20 publications of an author have been cited at least 400 times (20^2).

The dataframe:

structure(list(researcher = c("Berger", "Berger", "Berger", "Berger", 
"Berger", "Meyer", "Meyer", "Meyer", "Meyer", "Meyer"), citations = c(8, 
11, 26, 25, 10, 45, 12, 12, 8, 21)), row.names = c(NA, -10L), groups = structure(list(
    researcher = c("Berger", "Meyer"), .rows = structure(list(
        1:5, 6:10), ptype = integer(0), class = c("vctrs_list_of", 
    "vctrs_vctr", "list"))), row.names = c(NA, -2L), class = c("tbl_df", 
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"))

Solution

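  • With data.table (the result is capped at the number of publications; note that the comparison must be against the squared rank, g^2, not the rank itself):

    library(data.table)
    dt <- as.data.table(df)  # df: the data frame from the question (name assumed)
    setorder(dt, researcher, -citations)  # most-cited papers first within each researcher
    # keep the largest rank g whose top g papers total at least g^2 citations
    dtg <- dt[, .(gscore = max((1:.N) * (cumsum(citations) >= (1:.N)^2))), by = "researcher"]
    dtg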
    #>    researcher gscore
    #> 1:     Berger      5
    #> 2:      Meyer      5
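
  • With base R, as a cross-check: the following is a minimal sketch of the same definition, not part of the original answer. The helper name g_index and the data frame name df are assumptions made for illustration, and the result is again capped at the number of publications.

```r
# Hypothetical helper (name assumed): g-index of one researcher's citation counts.
g_index <- function(citations) {
  sorted <- sort(citations, decreasing = TRUE)  # most-cited papers first
  ranks <- seq_along(sorted)                    # candidate values of g: 1, 2, ..., N
  ok <- cumsum(sorted) >= ranks^2               # do the top g papers total at least g^2 citations?
  if (any(ok)) max(ranks[ok]) else 0L           # largest g that qualifies, 0 if none does
}

# The data from the question, rebuilt as a plain (ungrouped) data frame.
df <- data.frame(
  researcher = rep(c("Berger", "Meyer"), each = 5),
  citations  = c(8, 11, 26, 25, 10, 45, 12, 12, 8, 21)
)

# Apply the helper to each researcher's citations.
g_by_researcher <- tapply(df$citations, df$researcher, g_index)
g_by_researcher
```

    Both researchers come out at 5 here: Berger's cumulative citations (26, 51, 62, 72, 80) and Meyer's (45, 66, 78, 90, 98) stay at or above 1, 4, 9, 16, 25 at every rank, so the g-index reaches the number of publications.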