I have two bibliographic datasets A & B (.bib files, WoS export, full record & cited references). Both of them contain relevant and irrelevant results. The first dataset A has been cleaned so that I have the relevant results A(r) and irrelevant results A(i) as two different datasets (.bib files). The second dataset B encompasses my first dataset A completely.
[Figure: visualisation of my two datasets]
Goal: I am looking for a way to remove the irrelevant results A(i), which I have already identified in my first dataset, from my second dataset B.
Approach: If I were to merge the datasets B & A(i) I could trace the irrelevant results A(i) in B by using a remove duplicate function since A(i) would occur twice in B. However, this would only remove the duplicates of A(i) and not all instances of A(i).
Functions to remove duplicates:
package revtools
matches <- find_duplicates(data, match_variable = "title")
data_unique <- extract_unique_references(data, matches)
package bibliometrix
duplicatedMatching(M, Field = "TI", tol = 0.95)
• Q1: Is there a way to remove all instances of duplicates (the duplicates and the originals) identified through a find/remove duplicate function?
• Q2: Is there a better way to remove A(i) from B, i.e. to remove all instances of duplicates in a dataset?
• Q3: More generally: can I search my dataset for a larger set of specific bibliographic records (a list of papers) and remove them from that dataset?
Thank you so much for your help!
You can use match to find identical titles in the two data sets.
#remove Ai from B
B[-match(unique(Ai$title), B$title),]
# title misc
#1 a X
#2 b X
#5 e X
#7 g X
#remove Ai and Ar from B
B[-match(unique(c(Ai$title, Ar$title)), B$title),]
# title misc
#7 g X
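One caveat with match: it returns only the first position of each title in B, and NA for titles that do not occur in B at all. If a title can appear more than once in B, or if capitalisation and surrounding whitespace differ between exports, a logical filter with %in% may be more robust. The following is only a sketch on toy data; norm_title is a hypothetical helper introduced here for illustration, and the extra "d" row is added just to show a repeated title.

```r
# Toy data mirroring the example in this answer, plus a repeated title in B
Ai <- data.frame(title = c("d", "c", "f"), misc = "X", stringsAsFactors = FALSE)
B  <- data.frame(title = c(letters[1:7], " D "), misc = "X", stringsAsFactors = FALSE)

# Hypothetical helper: compare titles case-insensitively and ignore outer whitespace
norm_title <- function(x) tolower(trimws(x))

# Keep every row of B whose normalised title is NOT among the titles in Ai;
# this drops all occurrences, not just the first one
keep    <- !(norm_title(B$title) %in% norm_title(Ai$title))
B_clean <- B[keep, , drop = FALSE]

B_clean$title
```

Because the filter is a logical vector, it also behaves sensibly when Ai is empty or shares no titles with B: every row of B is kept.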
Data:
Ar <- data.frame(title=c("a", "b", "e"), misc="X", stringsAsFactors = FALSE)
Ai <- data.frame(title=c("d", "c", "f"), misc="X", stringsAsFactors = FALSE)
B <- data.frame(title=letters[1:7], misc="X", stringsAsFactors = FALSE)
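Regarding Q1, the merge-and-deduplicate idea from the question can be made to drop both the duplicate and the original by flagging duplicates from both directions with base R's duplicated(). This is a minimal sketch that assumes exact title matches and that B itself contains no internal duplicates you want to keep (they would be removed as well).

```r
# Same toy data as above
Ai <- data.frame(title = c("d", "c", "f"), misc = "X", stringsAsFactors = FALSE)
B  <- data.frame(title = letters[1:7], misc = "X", stringsAsFactors = FALSE)

# Stack B and Ai so every irrelevant title now occurs at least twice
combined <- rbind(B, Ai)

# duplicated() flags the second and later occurrences; with fromLast = TRUE it
# flags the first and earlier ones. Combining both marks every copy.
is_dup <- duplicated(combined$title) | duplicated(combined$title, fromLast = TRUE)

# Drop every row whose title occurs more than once
B_clean <- combined[!is_dup, , drop = FALSE]

B_clean$title
```

Note that a title present in Ai but missing from B would occur only once in the combined data and would therefore survive this filter; since B fully contains A in the question, that case should not arise here.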