Biopython pubmed lookup - "No connection could be made because the target machine actively refused it" error 10061


I'm trying to retrieve the PubMed IDs for a specific set of keywords using the following code:

import os
from Bio import Entrez
from Bio import Medline

#Defining keyword file
keywords_file = "D:\keywords.txt"

# Splitting keyword file into a list
keyword_list = []
keyword_list = open(keywords_file).read().split(',')
#print keyword_list

# Creating folders by keywords and creating a text file of the same keyword in each   folder
for item in keyword_list:
    create_file = item +'.txt.'
    path = r"D:\Thesis"+'\\'+item
    #print path
    if not os.path.exists(path): 
        os.makedirs(path)
    #print os.getcwd()
    os.chdir(path)
    f = open(item+'.txt','a')
    f.close()

# Using biopython to fetch ids of the keyword searches
limit = 10

def fetch_ids(keyword,limit):
    for item in keyword:
        print item
        print "Fetching search for "+item+"\n"
        #os.environ['http_proxy'] = '127.0.0.1:13828'
        Entrez.email = 'A.N.Other@example.com'
        search = Entrez.esearch(db='pubmed',retmax=limit,term = '"'+item+'"')
        print term
        result = Entrez.read(search)
        ids = result['IdList']
        #print len(ids)
        return ids

print fetch_ids(keyword_list,limit)

id_res = fetch_ids(keyword_list,limit)
print id_res

def write_ids_in_file(id_res):
    with open(item+'.txt','w') as temp_file:
        temp_file.write('\n'.join(ids))
        temp_file.close()
write_ids_in_file(id_res)

In a nutshell, I'm trying to create a folder named after each keyword, create a text file inside that folder, fetch the IDs from PubMed through the code, and save the IDs in the text file. My program worked fine when I first tested it; however, after a couple of runs it started throwing the "target machine actively refused it" connection error. Some more details that could be useful:

header information

  • Host = 'eutils.ncbi.nlm.nih.gov'
  • Connection = 'close'
  • User-Agent = 'Python-urllib/2.7'

URL

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?term=%22inflammasome%22&retmax=10&db=pubmed&tool=biopython&email=A.N.Other%40example.com

host = '127.0.0.1:13828'

I know that this question has been asked many times, usually with the answer that nothing is listening on the port. What I want to know is: if that is my issue as well, how do I get the application to work on this specific port? I've already gone to my firewall settings and opened port 13828, but I'm not sure what to do beyond this. If that is not the case, what could be a workaround?

Thanks!


Solution

  • You need to call search.close() after result = Entrez.read(search). See the official documentation: http://biopython.org/DIST/docs/api/Bio.Entrez-module.html


    It is normal for a public website to shut down a port or refuse TCP connections when a client holds too many connections open.
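
    As a rough sketch of the suggestion above (not a tested fix for the exact setup in the question), the search loop could close every handle as soon as it has been parsed. The search and parse functions are passed in as parameters, which is my own choice so the sketch runs without network access; with Biopython they would be Entrez.esearch and Entrez.read. The pause value is also an assumption, added only to space out requests.

```python
import time
from contextlib import closing


def fetch_ids(keywords, limit, esearch, read, pause=0.4):
    """Return {keyword: [ids]} and close every search handle.

    esearch and read are injected (an assumption of this sketch);
    with Biopython they would be Entrez.esearch and Entrez.read.
    """
    results = {}
    for keyword in keywords:
        keyword = keyword.strip()
        if not keyword:
            continue  # skip empty entries from a trailing comma
        # closing() calls handle.close() even if parsing raises,
        # so no connection is left open between iterations.
        with closing(esearch(db="pubmed", retmax=limit,
                             term='"%s"' % keyword)) as handle:
            record = read(handle)
        results[keyword] = list(record["IdList"])
        time.sleep(pause)  # assumed courtesy delay between requests
    return results


# With Biopython installed, the wiring would look like this (not run here):
#     from Bio import Entrez
#     Entrez.email = "A.N.Other@example.com"
#     ids_by_keyword = fetch_ids(keyword_list, 10, Entrez.esearch, Entrez.read)
```

    Unlike the loop in the question, this collects the IDs for every keyword instead of returning after the first one, so each keyword's list can then be written to its own file.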