Search code examples
pythonscopuspybliometrics

Scopus Abstract Retrieval - Value and Type Error only when too many entries are parsed


I am trying to retrieve abstracts via Scopus Abstract Retrieval. I have a file with 3590 EIDs.

import pandas as pd
import numpy as np

file = pd.read_excel(r'C:\Users\Amanda\Desktop\Superset.xlsx', sheet_name='Sheet1')

from pybliometrics.scopus import AbstractRetrieval
for i, row in file.iterrows():
  q = row['EID']
  ab = AbstractRetrieval(q,view='META_ABS')
  file.at[i,"Abstract"] = ab.description
  print(str(i) + ' ' + ab.description)
  print(str(''))

I get a value error - Value Error

In response to the value error, I altered the code.

from pybliometrics.scopus import AbstractRetrieval
error_index_valueerror = {}

    for i, row in file.iterrows():
      q = row['EID']
      try:
        ab = AbstractRetrieval(q,view='META_ABS')
        file.at[i,"Abstract"] = ab.description
        print(str(i) + ' ' + ab.description)
        print(str(''))
      except ValueError:
        print(f"{i} Value Error")
        error_index_valueerror[i] = row['Title']
        continue

When I trialed this code with 10-15 entries, it worked well and I retrieved all the abstracts. However, when I ran the actual file with 3590 EIDs, the output would be a series of 10-12 value errors before a type error ('can only concatenate str (not "NoneType") to str surfaces.

Series of value errors and a type error

I am not sure how to tackle this problem moving forward. Any advice on this matter would be greatly appreciated!

(Side note: When I change view='FULL' (as recommended by the documentation), I still get the same outcome.)


Solution

  • Without EIDs to check, it is tough to point to the precise cause. However, I'm 99% certain that your problem are missing abstracts in the .description property. It's sufficient when the first call is empty, because it will turn the column type into float, to which you wish to append a string. That's what the error says.

    Thus your problem has nothing to do with pybliometrics or Scopus, but with the way you bild the code.

    Try this instead:

    import pandas as pd
    import numpy as np
    from pybliometrics.scopus import AbstractRetrieval
    
    def parse_abstract(eid):
        """Retrieve Abstract of a document."""
        ab = AbstractRetrieval(q, view='META_ABS')
        return ab.description or ab.abstract
    
    
    FNAME = r'C:\Users\Amanda\Desktop\Superset.xlsx'
    df = pd.read_excel(FNAME, sheet_name='Sheet1')
    df["abstract"] = df["EID"].apply(parse_abstract)
    

    Instead of appending values one-by-one in a loop, which is slow and error-prone, I use pandas' .apply() methods.

    Also note how I write ab.description or ab.abstract. https://pybliometrics.readthedocs.io/en/stable/classes/AbstractRetrieval.html states that both should yield the same but can be empty. With this statement, if ab.description is empty (i.e., falsy), it will use ab.abstract instead.