
Using grepl in R to match family and given names from list of co-authors


I am trying to match unique authors from a BibTeX file in R using grepl(), but I'm having trouble getting it to match both the 'given' and 'family' names (rather than just one or the other). Family name alone would be fine, but my bibliography has multiple authors with the same family name.

My input file (e.g.) is dat.bib:

@article{ test1,
Author = {Williams, Kate and Williams, Jeff},
Title = {{Test1}},
Journal = {{Testy}},
Year = {{2010}},
}

@article{ test2,
Author = {Williams, Leroy and Williams, Rory},
Title = {{Test2}},
Journal = {{Testy}},
Year = {{2010}},
}

Here is what I've tried in R:

test <- read.bib("C/....dat.bib")
authors <- lapply(test, function(x) x$author)

gives:

$test1
[1] "Kate Williams" "Jeff Williams"

$test2
[1] "Leroy Williams" "Rory Williams" 

I can't use the 'authors' results alone, because I'm attempting a co-author analysis and the same author would appear as a separate result for every paper they have co-authored.

I've tried matching the unique authors:

unique.authors <- unique((unlist(authors))[grepl('family', names(unlist(authors)),ignore.case=TRUE)])

Which returns:

[1] "Williams"

and

 unique.authors <- unique((unlist(authors))[grepl('given', names(unlist(authors)),ignore.case=TRUE)])

returns:

[1] "Kate" "Jeff" "Leroy" "Rory"

But what I want is for unique authors to return

"Kate Williams" "Jeff Williams" "Leroy Williams" "Rory Williams"

I've tried binding the 'family' and 'given' arguments together:

x <- c("family", "given")
unique.authors <- unique((unlist(authors))[grepl(x, names(unlist(authors)))])

Which gives a warning message:

In grepl(x, names(unlist(authors))) :
argument 'pattern' has length > 1 and only the first element will be used.

Is there a way to bind parameter arguments together, or to bind 'family' and 'given' in the bibtex file?
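
On the warning itself: grepl() accepts a single pattern, so several alternatives are usually joined into one regular expression with "|". The sketch below assumes a small hand-built vector (`flat`, a hypothetical stand-in for unlist(authors)) rather than the real bibliography. It also shows why this alone does not give full names: the given and family parts remain separate elements.

```r
# Hypothetical stand-in for unlist(authors): name parts as separate elements.
flat <- c(test1.given = "Kate", test1.family = "Williams",
          test1.given = "Jeff", test1.family = "Williams")

# grepl() takes one pattern; "|" lets it match either alternative.
pattern <- paste(c("family", "given"), collapse = "|")
keep <- grepl(pattern, names(flat), ignore.case = TRUE)

# Both kinds of name part are kept, but still as separate strings,
# so unique() works on parts ("Williams"), not on full names.
parts <- unique(unname(flat[keep]))
print(parts)
```

The result mixes given and family names in one vector, so the pairing between them is lost.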

I'm still a newbie, so any help is greatly appreciated!


Solution

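  • If you want to treat each author's full name as a single unit, convert the authors to character strings first. The author field of each entry returned by read.bib is an object of class person, which is why unlist() splits it into separate given and family parts. For example: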

    authors <- lapply(test, function(x) as.character(x$author))
    unique(unlist(authors))
    

    returns

    [1] "Kate Williams"  "Jeff Williams"  "Leroy Williams" "Rory Williams"
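
The same idea can be tried without a .bib file. This is a minimal sketch that assumes the authors are ordinary person objects, built here by hand with base R's person() as a stand-in for the read.bib output. The repeated "Kate Williams" in test2 is a hypothetical addition to show the de-duplication. It uses format() with an explicit include argument instead of as.character(), so that any e-mail or role fields would not end up in the name.

```r
# Stand-in for lapply(test, function(x) x$author): a named list of
# 'person' objects created with base R (utils::person).
authors <- list(
  test1 = c(person("Kate", "Williams"), person("Jeff", "Williams")),
  test2 = c(person("Leroy", "Williams"), person("Rory", "Williams"),
            person("Kate", "Williams"))  # hypothetical repeated co-author
)

# Format each person as "Given Family", one string per author.
full_names <- lapply(authors, function(p) {
  format(p, include = c("given", "family"))
})

# Flatten across papers and drop repeated authors.
unique_authors <- unique(unlist(full_names, use.names = FALSE))
print(unique_authors)
```

Keeping `full_names` as a per-paper list may also be useful for the co-author analysis, since it preserves which authors appeared together.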