
python multithreading / multiprocessing code


In the code below, I am considering using multithreading or multiprocessing for fetching from the URL. I think a pool would be ideal. Can anyone suggest a solution?

Idea: use a pool of threads/processes and collect the data. My preference is processes over threads, but I am not sure.

import urllib

URL = "http://download.finance.yahoo.com/d/quotes.csv?s=%s&f=sl1t1v&e=.csv"
symbols = ('GGP', 'JPM', 'AIG', 'AMZN','GGP', 'JPM', 'AIG', 'AMZN')
#symbols = ('GGP')

def fetch_quote(symbols):
    url = URL % '+'.join(symbols)
    fp = urllib.urlopen(url)
    try:
        data = fp.read()
    finally:
        fp.close()
    return data

def main():
    data_fp = fetch_quote(symbols)
#    print data_fp
if __name__ =='__main__':
    main()

Solution

  • You have one process requesting several pieces of information at once. Let's first fetch them one by one. Your code becomes:

    def fetch_quote(symbols):
        url = URL % '+'.join(symbols)
        fp = urllib.urlopen(url)
        try:
            data = fp.read()
        finally:
            fp.close()
        return data
    
    def main():
        for symbol in symbols:
            data_fp = fetch_quote((symbol,))
            print data_fp
    
    if __name__ == "__main__":
        main()
    

    So main() calls each URL one by one to get the data. Let's parallelize it with a multiprocessing pool:

    import urllib
    from multiprocessing import Pool
    
    URL = "http://download.finance.yahoo.com/d/quotes.csv?s=%s&f=sl1t1v&e=.csv"
    symbols = ('GGP', 'JPM', 'AIG', 'AMZN','GGP', 'JPM', 'AIG', 'AMZN')
    
    def fetch_quote(symbols):
        url = URL % '+'.join(symbols)
        fp = urllib.urlopen(url)
        try:
            data = fp.read()
        finally:
            fp.close()
        return data
    
    def main():
        pool = Pool(processes=5)
        # Submit all the requests first, then collect the results, so the
        # fetches run in parallel. (Calling get() right after each
        # apply_async would block and serialize the downloads.)
        results = [pool.apply_async(fetch_quote, [(symbol,)]) for symbol in symbols]
        for result in results:
            print result.get(timeout=10)
    
    if __name__ == '__main__':
        main()
    

    In this version, the pool's worker processes fetch each symbol's URL, so the requests run concurrently.

    Note: in CPython the GIL prevents threads from executing Python bytecode in parallel, so multithreading does not speed up CPU-bound work. For I/O-bound fetches like these the GIL is released while waiting on the network, so threads can still help, but multiprocessing sidesteps the question entirely.
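To illustrate that last point (this comparison is not part of the original answer): with a thread pool, four simulated 0.1-second "downloads" overlap instead of taking 0.4 seconds in sequence. The network call is stubbed with time.sleep so the example is self-contained.

```python
import time
from concurrent.futures import ThreadPoolExecutor

SYMBOLS = ('GGP', 'JPM', 'AIG', 'AMZN')

def fetch_quote(symbol):
    time.sleep(0.1)              # stands in for the blocking HTTP request
    return '%s,0.0' % symbol

start = time.time()
with ThreadPoolExecutor(max_workers=4) as executor:
    # Threads block in sleep/I-O concurrently; the GIL is released there.
    quotes = list(executor.map(fetch_quote, SYMBOLS))
elapsed = time.time() - start
print(quotes)
```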

    For documentation see: Multiprocessing in python
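One final note for readers on Python 3: the code above is Python 2, and `urllib.urlopen` moved to `urllib.request.urlopen`. A sketch of the translated fetch follows; the Yahoo CSV endpoint has since been retired, so the function is shown but not called here.

```python
from urllib.request import urlopen

URL = "http://download.finance.yahoo.com/d/quotes.csv?s=%s&f=sl1t1v&e=.csv"
symbols = ('GGP', 'JPM', 'AIG', 'AMZN')

def fetch_quote(symbols):
    url = URL % '+'.join(symbols)
    with urlopen(url) as fp:     # context manager replaces the try/finally
        return fp.read()
```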