
Query PubMed with Python: how to get all article details from a query into a Pandas DataFrame and export them to CSV


How can I get all article details from a PubMed query into a Pandas DataFrame and export them to CSV?

I need the following article details:

pubmed_id, title, keywords, journal, abstract, conclusions, methods, results, copyrights, doi, publication_date, authors


Solution

  • Here is how I did it. The code is fully functional; all you need to do is install pymed with pip install pymed. Here is the code:

    import pandas as pd
    from pymed import PubMed
    pubmed = PubMed(tool="PubMedSearcher", email="myemail@ccc.com")
    
    ## PUT YOUR SEARCH TERM HERE ##
    search_term = "Your search term"
    results = pubmed.query(search_term, max_results=500)
    articleList = []
    articleInfo = []
    
    for article in results:
        # Each result is either a PubMedArticle or a PubMedBookArticle.
        # Convert it to a dictionary with the object's toDict() method.
        articleDict = article.toDict()
        articleList.append(articleDict)
    
    # Build a list of dict records holding all article details that can be fetched from the PubMed API
    for article in articleList:
        # Sometimes article['pubmed_id'] contains several newline-separated IDs; the first one is the article's own PubMed ID
        pubmedId = article['pubmed_id'].partition('\n')[0]
        # Append the article info as a dictionary
        articleInfo.append({u'pubmed_id':pubmedId,
                           u'title':article['title'],
                           u'keywords':article['keywords'],
                           u'journal':article['journal'],
                           u'abstract':article['abstract'],
                           u'conclusions':article['conclusions'],
                           u'methods':article['methods'],
                           u'results': article['results'],
                           u'copyrights':article['copyrights'],
                           u'doi':article['doi'],
                           u'publication_date':article['publication_date'], 
                           u'authors':article['authors']})
    
    # Generate a Pandas DataFrame from the list of dictionaries
    articlesPD = pd.DataFrame.from_dict(articleInfo)
    # Export the DataFrame to CSV without the index column
    articlesPD.to_csv(r'C:\Users\YourUsername\Desktop\export_dataframe.csv', index=False, header=True)
    
    # Print the first 10 rows of the DataFrame
    print(articlesPD.head(10))
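
The code above indexes every field directly, so a record that lacks one of the keys (a book entry may not have a journal, for example) would raise a KeyError, and list-valued fields end up in the CSV as Python reprs. Below is a minimal sketch of one way to make the records more robust before building the DataFrame. The helper names (FIELDS, format_authors, normalize_article, articles_to_dataframe) are hypothetical, and the sketch assumes that keywords is a list of strings and that authors is a list of dicts with 'firstname' and 'lastname' keys; check the output of toDict() in your pymed version before relying on that.

```python
# Column order requested in the question.
FIELDS = ["pubmed_id", "title", "keywords", "journal", "abstract", "conclusions",
          "methods", "results", "copyrights", "doi", "publication_date", "authors"]


def format_authors(authors):
    """Join author dicts into one 'Firstname Lastname; ...' string."""
    names = []
    for author in authors or []:
        # Assumed keys: 'firstname' and 'lastname'; either may be missing or None.
        parts = [author.get("firstname"), author.get("lastname")]
        name = " ".join(p for p in parts if p)
        if name:
            names.append(name)
    return "; ".join(names)


def normalize_article(article):
    """Return a flat dict with every field in FIELDS; missing fields become None."""
    row = {field: article.get(field) for field in FIELDS}
    # Keep only the first line of the ID field, as the code above does.
    row["pubmed_id"] = (row["pubmed_id"] or "").partition("\n")[0] or None
    # Flatten list-valued fields so each CSV cell holds a plain string.
    row["keywords"] = "; ".join(k for k in (row["keywords"] or []) if k)
    row["authors"] = format_authors(row["authors"])
    return row


def articles_to_dataframe(articles):
    """Build a DataFrame from an iterable of article dicts."""
    import pandas as pd  # imported here so the helpers above run without pandas
    return pd.DataFrame([normalize_article(a) for a in articles], columns=FIELDS)
```

With the results from the query above, this could be used as articles_to_dataframe(a.toDict() for a in results).to_csv("export_dataframe.csv", index=False).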