r, matrix, social-networking, bibliography

Bibliometrix package: how to link co-citation reference clusters back to original citing documents?


I have completed a co-citation analysis using the R package bibliometrix, as in this example:

library(bibliometrix)
data(scientometrics, package = "bibliometrixData")
M <- scientometrics
NetMatrix <- biblioNetwork(M, analysis = "co-citation", network = "references", sep = "; ")

net <- networkPlot(NetMatrix, n = 30, Title = "Co-Citation Network", type = "fruchterman", size = TRUE,
                   remove.multiple = FALSE, labelsize = 0.7, edgesize = 5)

There are three communities/clusters:

plot(net$graph)

The co-cited references (i.e., the nodes) in each cluster can be accessed in several ways, e.g.:

net$cluster_obj[1] #cluster 1
net$cluster_obj[2] #cluster 2
net$cluster_obj[3] #cluster 3

Taken together, the reference names listed for the clusters are the colnames and rownames of NetMatrix, and each cell of this matrix gives the number of original documents that co-cite the corresponding pair of references. NetMatrix in this case is the product [A'] x [A], where the rows of A are the original citing documents (rownames(M)) and the columns of A are the references cited by those documents (M$CR or names(net$nodeDegree)).
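To make the [A'] x [A] relationship concrete, here is a toy sketch in base R. The matrix A, the document names D1 to D3 and the reference names R1 to R3 are made up for illustration and are not taken from the scientometrics data:

```r
# Toy document-by-reference matrix: A[i, j] = 1 if document i cites reference j.
# (Hypothetical data, only to illustrate the structure described above.)
A <- matrix(c(1, 1, 0,
              1, 1, 0,
              0, 1, 1),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("D1", "D2", "D3"), c("R1", "R2", "R3")))

# A' x A: each off-diagonal cell counts the documents citing both references
co <- crossprod(A)

# Documents that co-cite R1 and R2, read directly from A
citing <- rownames(A)[A[, "R1"] == 1 & A[, "R2"] == 1]
```

In this toy case co["R1", "R2"] is 2, and the two documents behind that count are D1 and D2. That mapping from a cell back to its rows of A is what I would like to recover for the real clusters.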

My question: is there a way to link/refer the co-cited references within these clusters back to the original documents citing them (in the M dataframe), so that I could retrieve a list of M$TI or rownames(M) for the original documents which co-cited the references in cluster 1, 2, and 3, respectively?


Solution

  • Yes, you can, but it needs a bit of 'acrobatics':

    First extract the cluster's references and clean them up:

    library(dplyr)
    library(stringr)
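    # Upper-case the cluster's node labels, then rebuild them as "AUTHOR, YEAR",
    # which is intended to match how references appear in M$CR
    cluster1 <- tibble(ref = net$cluster_obj[1] %>% unlist() %>% toupper(),
                       ID = seq_along(ref)) %>%
      mutate(part = str_extract(ref, "^\\D+") %>% str_trim(),  # text before the first digit
             part2 = str_extract(ref, "\\d+"),                 # first run of digits (the year)
             reference = paste0(part, ", ", part2)) %>%
      select(ID, reference, ref)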
    

    Now the references are ready to be searched for. Here is a sample search with the first reference:

    # fixed = TRUE matches the reference literally rather than as a regular expression
    sample <- M %>% filter(grepl(cluster1$reference[1], CR, fixed = TRUE))
    

    You can iterate over the references in the cluster1 table and collect the results in a single data frame.
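One way to do that iteration, as a minimal sketch in base R. The helper name find_citing_docs and the toy data frame are hypothetical and only serve to make the example runnable on its own. The helper assumes the input has CR and TI columns and document IDs as row names, as M does:

```r
# Hypothetical helper: for every cleaned reference string, collect the
# documents whose CR field contains it, and stack the hits in one data frame.
find_citing_docs <- function(docs, references) {
  hits <- lapply(references, function(r) {
    # Literal (non-regex) match of the reference inside each CR string
    matched <- docs[grepl(r, docs$CR, fixed = TRUE), , drop = FALSE]
    if (nrow(matched) == 0) return(NULL)
    data.frame(reference = r,
               doc_id = rownames(matched),
               TI = matched$TI,
               stringsAsFactors = FALSE)
  })
  do.call(rbind, hits)  # NULL if no reference matched any document
}

# Toy stand-in for M (made-up titles and reference strings)
toy <- data.frame(
  TI = c("PAPER A", "PAPER B", "PAPER C"),
  CR = c("SMALL H, 1973, J AM SOC INFORM SCI; GARFIELD E, 1972, SCIENCE",
         "GARFIELD E, 1972, SCIENCE",
         "WHITE HD, 1981, J AM SOC INFORM SCI"),
  row.names = c("DOC1", "DOC2", "DOC3"),
  stringsAsFactors = FALSE
)

res <- find_citing_docs(toy, c("SMALL H, 1973", "GARFIELD E, 1972"))
```

With the real data the call would be find_citing_docs(M, cluster1$reference). Each row of the result pairs one cluster reference with one citing document, so unique(res$TI) gives the titles for the cluster. A document cited under a differently formatted reference string will be missed, so it is worth spot-checking a few matches by hand.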