Tags: r, dplyr, text-mining, pubmed

How to add list items as a new data frame column inside another list in R?


I'm trying to extract co-authors' names and affiliations for all publications on PubMed. I was able to get the list of authors' names into a data frame, but I now need to add each author's affiliation next to the name. I've been trying to do this, but I'm not sure how.

I need to combine two lists, the authors and the affiliations of each author, into one.

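    library(RISmed)

    query <- "an author's name"

    res <- EUtilsSummary(query, db = "pubmed", mindate = 2015, maxdate = 2019)
    QueryCount(res)

    # Download the records once and reuse them for both accessors
    records <- EUtilsGet(res)
    auths <- Author(records)       # one data frame of authors per article
    affs <- Affiliation(records)   # affiliations per article

    # "LastName, ForeName" for every author, counted across articles
    Last <- sapply(auths, function(x) paste(x$LastName, x$ForeName, sep = ", "))
    auths2 <- as.data.frame(sort(table(unlist(Last)), decreasing = TRUE))
    names(auths2) <- c("name", "count")
    auths2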

I'm using RISmed to extract the data. I want the data in the following format:

Lastname, Firstname Affiliation

I don't care about the count.

I suppose another way to look at this is the following:

Combine two lists together.

List A is a list of data frames. It has multiple items, and each item looks like this:

LastName   ForeName   Initials
A          B          AB
C          D          CD

List B is a list of lists:

Affiliations:
"X University"
"Y University"

What I want to do is combine these two lists so that each author's affiliation appears as a new column in that author's data frame. The final result would be the following:

LastName   ForeName   Initials   Affiliations
A          B          AB         "X University"
C          D          CD         "Y University"
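A minimal reproducible version of the two lists helps pin down the target. The sketch below uses made-up values (list_a, list_b and combined are hypothetical names, not RISmed output) and assumes that each publication's affiliation vector has exactly one entry per author. Under that assumption, the affiliations can be added as a column to each data frame by walking both lists in parallel with Map():

```r
# Hypothetical toy data mirroring the structure described above:
# list A holds one data frame of authors per publication,
# list B holds the matching affiliations per publication.
list_a <- list(
  data.frame(LastName = c("A", "C"),
             ForeName = c("B", "D"),
             Initials = c("AB", "CD"),
             stringsAsFactors = FALSE)
)
list_b <- list(c("X University", "Y University"))

# Pair the i-th data frame with the i-th affiliation vector and add the
# affiliations as a new column (assumes equal lengths within each pair).
combined <- Map(function(authors, affiliations) {
  authors$Affiliations <- affiliations
  authors
}, list_a, list_b)

combined[[1]]
```

Real query results will not always satisfy the equal-length assumption, which is what the edge-case handling in the solution addresses.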

Solution

  • Since some queries can return NA values for authors and zero-length vectors for affiliations, I made a small function that only calls cbind() when both list entries contain usable data:

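    special_cbind <- function(authors, affiliations) {
      # Nothing to attach: return the author entry unchanged
      if (length(affiliations) == 0 || all(is.na(authors))) {
        return(authors)
      }
      # One affiliation per author: add them as a new column
      if (nrow(authors) == length(affiliations)) {
        return(cbind(authors, affiliations))
      }
      # Lengths differ (e.g. a single affiliation for the whole article):
      # recycle to exactly one value per author row; with several
      # affiliations this pairing is only approximate
      affiliations <- rep(affiliations, length.out = nrow(authors))
      cbind(authors, affiliations)
    }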
    

    Then apply it to every list entry with Map.

    Map(special_cbind, auths, affs)
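Map() returns a list with one data frame per publication. To reach the flat "Lastname, Firstname / Affiliation" table described in the question, one option is to give every entry the same columns and stack them with do.call(rbind, ...). This is a sketch under stated assumptions: the combined list below is made-up stand-in data for the result of Map(special_cbind, auths, affs), with one entry that has an affiliations column and one that was returned unchanged because no affiliation was available. The deduplication with unique() is a choice made here since the count is not needed; selecting columns by name also ignores any extra columns in the real data frames.

```r
# Hypothetical stand-in for the list returned by Map(special_cbind, auths, affs)
combined <- list(
  data.frame(LastName = c("A", "C"), ForeName = c("B", "D"),
             Initials = c("AB", "CD"),
             affiliations = c("X University", "Y University"),
             stringsAsFactors = FALSE),
  data.frame(LastName = "E", ForeName = "F", Initials = "EF",
             stringsAsFactors = FALSE)  # entry without an affiliation
)

# Give every entry the same columns so the data frames can be stacked
flat <- do.call(rbind, lapply(combined, function(df) {
  if (!("affiliations" %in% names(df))) df$affiliations <- NA_character_
  df[, c("LastName", "ForeName", "affiliations")]
}))

# Build the "Lastname, Firstname" + affiliation layout, dropping duplicates
result <- unique(data.frame(
  name = paste(flat$LastName, flat$ForeName, sep = ", "),
  affiliation = flat$affiliations,
  stringsAsFactors = FALSE
))
result
```

Authors without an affiliation are kept with NA rather than dropped, so they can still be spotted and filled in later.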
    

    Does this work for your data?