I am trying to retrieve abstracts via Scopus Abstract Retrieval. I have a file with 3590 EIDs.
import pandas as pd
import numpy as np
file = pd.read_excel(r'C:\Users\Amanda\Desktop\Superset.xlsx', sheet_name='Sheet1')
from pybliometrics.scopus import AbstractRetrieval
for i, row in file.iterrows():
q = row['EID']
ab = AbstractRetrieval(q,view='META_ABS')
file.at[i,"Abstract"] = ab.description
print(str(i) + ' ' + ab.description)
print(str(''))
In response to the value error, I altered the code.
from pybliometrics.scopus import AbstractRetrieval
error_index_valueerror = {}
for i, row in file.iterrows():
q = row['EID']
try:
ab = AbstractRetrieval(q,view='META_ABS')
file.at[i,"Abstract"] = ab.description
print(str(i) + ' ' + ab.description)
print(str(''))
except ValueError:
print(f"{i} Value Error")
error_index_valueerror[i] = row['Title']
continue
When I trialed this code with 10-15 entries, it worked well and I retrieved all the abstracts. However, when I ran the actual file with 3590 EIDs, the output would be a series of 10-12 value errors before a type error ('can only concatenate str (not "NoneType") to str surfaces.
I am not sure how to tackle this problem moving forward. Any advice on this matter would be greatly appreciated!
(Side note: When I change view='FULL' (as recommended by the documentation), I still get the same outcome.)
Without EIDs to check, it is tough to point to the precise cause. However, I'm 99% certain that your problem are missing abstracts in the .description
property. It's sufficient when the first call is empty, because it will turn the column type into float
, to which you wish to append a string. That's what the error says.
Thus your problem has nothing to do with pybliometrics or Scopus, but with the way you bild the code.
Try this instead:
import pandas as pd
import numpy as np
from pybliometrics.scopus import AbstractRetrieval
def parse_abstract(eid):
"""Retrieve Abstract of a document."""
ab = AbstractRetrieval(q, view='META_ABS')
return ab.description or ab.abstract
FNAME = r'C:\Users\Amanda\Desktop\Superset.xlsx'
df = pd.read_excel(FNAME, sheet_name='Sheet1')
df["abstract"] = df["EID"].apply(parse_abstract)
Instead of appending values one-by-one in a loop, which is slow and error-prone, I use pandas' .apply()
methods.
Also note how I write ab.description or ab.abstract
. https://pybliometrics.readthedocs.io/en/stable/classes/AbstractRetrieval.html states that both should yield the same but can be empty. With this statement, if ab.description
is empty (i.e., falsy), it will use ab.abstract
instead.