Retrieving the paper Subject Area with pybliometrics.ScopusSearch()


I am currently utilizing pybliometrics to download papers related to a specific query. Typically, when navigating the Scopus website, I apply filters based on the Subject Area of interest, such as Engineering.

After reviewing the documentation and running the code, I noticed that the Subject Area is not among the fields returned by ScopusSearch. The same is true of a manual export from the Scopus website: the downloaded csv does not contain a "Subject Area" column either.

I would like to know whether a clause such as ( LIMIT-TO ( SUBJAREA , "ENGI" ) ) has any effect when the query is passed to pybliometrics and, if not, whether the Subject Area can be retrieved some other way.

EDIT:

An example:

  • if I run this query: "TITLE-ABS-KEY ( maintenance ) AND PUBYEAR > 1994 AND PUBYEAR < 2000 AND ( LIMIT-TO ( SRCTYPE , "j" ) ) AND ( LIMIT-TO ( SUBJAREA , "ENGI" ) ) AND ( LIMIT-TO ( DOCTYPE , "ar" ) ) AND ( LIMIT-TO ( LANGUAGE , "English" ) )" in Scopus it returns 4970 results
  • if I run the same query with pybliometrics it returns 56821 results. Here is the code I use to run the search through pybliometrics:
import pandas as pd  # needed for the DataFrame used below
from pybliometrics.scopus import ScopusSearch

min_year = 1995
max_year = 1999

scopus_query = f'TITLE-ABS-KEY ( maintenance ) AND PUBYEAR > {min_year-1} AND PUBYEAR < {max_year+1} AND ( LIMIT-TO ( SUBJAREA , "ENGI" ) ) AND ( LIMIT-TO ( DOCTYPE , "ar" ) ) AND ( LIMIT-TO ( SRCTYPE , "j" ) ) AND ( LIMIT-TO ( LANGUAGE , "English" ) )' 

s = ScopusSearch(scopus_query, verbose=True, download=True)

df = pd.DataFrame(s.results)

to_drop = ['eid', 'doi', 'pii', 'pubmed_id', 'subtype', 'creator', 'afid', 
           'affiliation_city', 'author_count', 'author_ids', 'author_afids',
           'coverDisplayDate', 'issn', 'source_id', 'eIssn','volume', 
           'issueIdentifier', 'article_number', 'pageRange', 'freetoread', 
           'freetoreadLabel', 'fund_acr', 'fund_no', 'fund_sponsor']

df = df.drop(columns = to_drop)
df.to_excel(f"Maintenance_{min_year}_{max_year}.xlsx", index=False)

Solution

  • To your last question, whether LIMIT-TO() works in the API: No, it doesn't. I don't know why, but I documented this in https://pybliometrics.readthedocs.io/en/stable/classes/ScopusSearch.html#documentation:

    All fields except “INDEXTERMS()” and “LIMIT-TO()” work.
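Since LIMIT-TO() is the part that does not work, one possible workaround is to rewrite each LIMIT-TO clause as the corresponding plain field code, for example SUBJAREA(ENGI) instead of LIMIT-TO ( SUBJAREA , "ENGI" ). The sketch below does this with a regular expression. The helper name limit_to_field_codes is made up for illustration, and it is an assumption (not something stated in this answer) that the rewritten query returns the same documents as the website filters, so compare the result counts before relying on it.

```python
import re

# Matches clauses such as: LIMIT-TO ( SUBJAREA , "ENGI" )
LIMIT_TO = re.compile(r'LIMIT-TO\s*\(\s*([A-Z-]+)\s*,\s*"([^"]+)"\s*\)')


def limit_to_field_codes(query: str) -> str:
    """Rewrite every LIMIT-TO(FIELD, "value") clause as FIELD(value).

    Hypothetical helper: surrounding parentheses and AND/OR operators
    are left untouched, only the LIMIT-TO clause itself is replaced.
    """
    return LIMIT_TO.sub(lambda m: f"{m.group(1)}({m.group(2)})", query)


web_query = (
    'TITLE-ABS-KEY ( maintenance ) AND PUBYEAR > 1994 AND PUBYEAR < 2000 '
    'AND ( LIMIT-TO ( SUBJAREA , "ENGI" ) ) AND ( LIMIT-TO ( DOCTYPE , "ar" ) )'
)
api_query = limit_to_field_codes(web_query)
print(api_query)
```

The rewritten string can then be passed to ScopusSearch in place of the original query.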

    To your first question, how to get the subject area of documents: it is not part of the Scopus Search API. You can go via either the Abstract Retrieval API (for individual documents) or the Serial Title API (for the source the document appeared in).

    1. Get the EID, then use AbstractRetrieval(<eid>, view="FULL").subject_areas
    2. Get the ISSN, then use SerialTitle(<issn>).subject_area
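The two routes above could be sketched roughly as follows. The function names are hypothetical, and the sketch assumes that each subject-area entry exposes an abbreviation attribute (such as "ENGI"). The pybliometrics imports sit inside the functions so that the filtering helper can be tried without API access; depending on the installed pybliometrics version, an explicit initialisation call may also be needed before the first request, so check the documentation for your version.

```python
from collections import namedtuple


def in_subject_area(areas, abbreviation):
    """Return True if any subject-area entry carries the given abbreviation."""
    # The attribute can be empty for some records, hence the fallback.
    return any(a.abbreviation == abbreviation for a in (areas or []))


def areas_for_document(eid):
    """Route 1: subject areas of a single document (one request per EID)."""
    from pybliometrics.scopus import AbstractRetrieval
    return AbstractRetrieval(eid, view="FULL").subject_areas


def areas_for_source(issn):
    """Route 2: subject areas of the journal (one request per ISSN)."""
    from pybliometrics.scopus import SerialTitle
    return SerialTitle(issn).subject_area


# Offline illustration with a stand-in record (not real API output):
Area = namedtuple("Area", "area abbreviation code")
example = [Area("Engineering (all)", "ENGI", 2200)]
print(in_subject_area(example, "ENGI"))
```

With s being the ScopusSearch object from the question, one could then keep only the rows where in_subject_area(areas_for_document(row.eid), "ENGI") holds. Going through the ISSN instead usually means far fewer requests, because many documents share a journal, but it classifies the journal rather than the individual paper.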