python web-scraping python-multiprocessing python-multithreading gil

Will the GIL interfere if I run my Python code with multiprocessing on different virtual machines on the same PC?


I currently have a Python script that scrapes data from a single URL.

To speed up the process, I'm using Pool from the multiprocessing module in the script. For the sake of explanation, the script is called "script_one.py".

The script exclusively makes GET requests to collect the JSON/HTML results from the target URL, constantly switches proxy addresses, and saves the results to a text file.

My question is: if I run the same code (script_one.py) on multiple virtual machines, will I further speed up the process without running into any issues with the GIL?

Here is my code:

import requests,time,random
from multiprocessing import Pool


def script_one(file_name,from_letter,to_letter):
    print('Here it does the get request and collects data')
    print('Here it saves on file')



if __name__ == '__main__':
    with Pool(5) as p:
        print(p.starmap(script_one,[('r_ba', 'r', 'rba'),('rbrca', 'rb', 'rca'),('rcrda', 'rc', 'rda'),
                                 ('rdrea', 'rd', 'rea'),('rerfa', 're', 'rfa'),('rfrga', 'rf', 'rga'),
                                 ('rgrha', 'rg', 'rha'),('rhria', 'rh', 'ria'),('rirja', 'ri', 'rja'),
                                 ('rjrka', 'rj', 'rka'),('rkrla', 'rk', 'rla'),('rlrma', 'rl', 'rma'),
                                 ('rmrna', 'rm', 'rna'),('rnroa', 'rn', 'roa'),('rorpa', 'ro', 'rpa'),
                                 ('rprqa', 'rp', 'rqa'),('rqrra', 'rq', 'rra'),('rrrsa', 'rr', 'rsa'),
                                 ('rsrta', 'rs', 'rta'),('rtrua', 'rt', 'rua'),('rurva', 'ru', 'rva'),
                                 ('rvrwa', 'rv', 'rwa'),('rwrxa', 'rw', 'rxa'),('rxrya', 'rx', 'rya'),
                                 ('ryrza', 'ry', 'rza'),('rzr0a', 'rz', 'r0a')]))
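        # starmap() blocks until every task has finished, and the with-block
        # cleans up the pool on exit, so no explicit close()/join() is needed.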

Solution

  • Several options are currently available:
      - Multiprocessing
      - Multithreading
      - Running multiple virtual machines in parallel
      - For Windows users, I also found that using multiple desktops works well (I'm guessing it should work the same for Linux users)
      - Manually running several terminal windows at the same time

    The GIL is local to each Python interpreter process, so separate processes or virtual machines never contend for the same lock. In addition, during a request the GIL is released while the thread waits on the socket, so even threads inside one process can overlap their network waits.

    Credit: @MatteoItalia
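
    To illustrate the point about the GIL being released while waiting, here is a minimal sketch, not the asker's actual scraper. It assumes that time.sleep is an acceptable stand-in for a blocking socket wait (both release the GIL); fake_request and run_threaded are hypothetical names made up for this example.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_request(task_id, delay=0.2):
    """Hypothetical stand-in for a GET request: blocks the way a socket wait would."""
    time.sleep(delay)  # time.sleep releases the GIL while it blocks
    return task_id


def run_threaded(n_tasks=5, delay=0.2):
    """Run n_tasks simulated requests in one process, one thread per task."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_tasks) as pool:
        # pool.map returns results in the same order as the inputs
        results = list(pool.map(lambda i: fake_request(i, delay), range(n_tasks)))
    elapsed = time.perf_counter() - start
    return results, elapsed


if __name__ == "__main__":
    results, elapsed = run_threaded()
    print(results, f"{elapsed:.2f}s")
```

    If the GIL were held during the wait, five 0.2 s tasks would take about 1 s in total. Because the waits overlap, the measured time should be close to a single delay. With real requests the speed-up would also depend on the proxies and the target server, so treat this only as a demonstration of the mechanism.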