python scopus pybliometrics

Get affiliation information from multiple authors in a loop


I am working with pybliometrics (Scopus) and want to write a loop that retrieves affiliation information for multiple authors.

This is the code I have so far, which handles a single author. How do I do the same for many authors?

from pybliometrics.scopus import AuthorRetrieval
import pandas as pd

authorid = 7004212771  # a single Scopus author ID

au = AuthorRetrieval(authorid)
x = au.identifier

refs2 = au.affiliation_history
df = pd.DataFrame(refs2)
df['authorid'] = x

# move authorid to position 0
cols = list(df)
cols.insert(0, cols.pop(cols.index('authorid')))
df = df.loc[:, cols]

df.to_excel("af_historyfinal.xlsx")

Solution

  • Turning your code into a loop over multiple author IDs? Nothing easier than that. Let's say AUTHOR_IDS contains the IDs 7004212771 and 57209617104:

    import pandas as pd
    from pybliometrics.scopus import AuthorRetrieval

    def retrieve_affiliations(auth_id):
        """Author's affiliation history from Scopus as DataFrame."""
        au = AuthorRetrieval(auth_id)
        df = pd.DataFrame(au.affiliation_history)
        df["authorid"] = au.identifier
        return df

    AUTHOR_IDS = [7004212771, 57209617104]

    # Option 1, for few IDs
    df = pd.concat([retrieve_affiliations(a) for a in AUTHOR_IDS])

    # Option 2, for many IDs: collect the frames in a loop, then concatenate once
    # (DataFrame.append was removed in pandas 2.0)
    frames = []
    for a in AUTHOR_IDS:
        frames.append(retrieve_affiliations(a))
    df = pd.concat(frames)

    # Have author ID as first column
    df = df.set_index("authorid").reset_index()
    df.to_excel("af_historyfinal.xlsx", index=False)
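
With many IDs, a single failed lookup would abort the whole loop. The following is a minimal sketch of one way to guard against that, not part of the original answer. `collect_affiliations` and `fake_fetch` are hypothetical names, and `fake_fetch` returns made-up data so the example runs without Scopus access. In real use you would pass `retrieve_affiliations` as the `fetch` argument.

```python
import pandas as pd


def collect_affiliations(author_ids, fetch):
    """Concatenate per-author frames, skipping IDs whose lookup fails.

    `fetch` is any callable that maps an author ID to a DataFrame
    (hypothetical wiring; e.g. `retrieve_affiliations`).
    """
    frames, failed = [], []
    for auth_id in author_ids:
        try:
            frames.append(fetch(auth_id))
        except Exception as exc:  # deliberately broad: this is only a sketch
            failed.append((auth_id, str(exc)))
    # Concatenate once at the end; fall back to an empty frame if all failed
    combined = pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
    return combined, failed


def fake_fetch(auth_id):
    """Stand-in for a Scopus lookup (made-up data, assumption for the demo)."""
    if auth_id == 0:
        raise ValueError("unknown author")
    return pd.DataFrame({"authorid": [auth_id, auth_id],
                         "affiliation": ["Univ A", "Univ B"]})


df, failed = collect_affiliations([7004212771, 0, 57209617104], fake_fetch)
print(df)
print(failed)
```

The list of failed IDs can be written out separately and retried later, so one bad ID does not cost you the results already retrieved.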
    

    If, say, your IDs are in a comma-separated file called "input.csv", with one column called "authors", then you start with:

    AUTHOR_IDS = pd.read_csv("input.csv")["authors"].unique()
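
If the file might contain empty cells, the IDs can come back as floats with missing values mixed in. The sketch below shows one possible cleanup, under the assumption that the file has a second, made-up column called "name"; an in-memory string stands in for "input.csv" so the example runs as written.

```python
import io

import pandas as pd

# In-memory stand-in for "input.csv" (made-up contents, including one
# row with a missing author ID and one duplicate ID).
csv_file = io.StringIO(
    "authors,name\n"
    "7004212771,A\n"
    ",B\n"
    "57209617104,C\n"
    "7004212771,A\n"
)

ids = (
    pd.read_csv(csv_file)["authors"]
    .dropna()          # drop rows without an author ID
    .astype("int64")   # a column with missing cells is parsed as float
    .unique()          # keep each ID once, in order of first appearance
    .tolist()          # plain Python ints instead of NumPy scalars
)
print(ids)
```

The resulting list can be used as AUTHOR_IDS in the loop above.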