Tags: python, memory, multiprocessing, shapefile, psutil

PBS vmem exceeded limit: how can I tell where the memory is being exceeded?


I have a shapefile with 1,500,000 polygons, and I need to intersect each polygon with different grids.

I wrote a simple program that goes from polygon to polygon to do the intersection (with multiprocessing):

import multiprocessing as mp

pool = mp.Pool()
args = []
for index, pol in shapefile.iterrows():
    # Limits of each polygon in the shapefile
    ylat = lat_gridlimits
    xlon = lon_gridlimits
    args.append((dgrid, ylat, xlon, pol, index))
pool.starmap(calculate, args)
pool.close()
pool.join()

but memory fills up very quickly and I get an error:

PBS: job killed: vmem exceeded limit

How can I find out where or when the memory limit is exceeded? Or is there a way to limit the memory used by each function?

I tried this (inside calculate):

import os
import psutil

process = psutil.Process(os.getpid())
mem = process.memory_info().rss / (1024.0 ** 3)        # resident set size of this process, in GiB
vmem = psutil.virtual_memory().total / (1024.0 ** 3)   # total system memory in GiB (not this process's vmem)
print("{}  {}\n".format(mem, vmem))

but it doesn't help me locate where the memory grows.


Solution

  • One reason you are running out of memory might be that, even though shapefile.iterrows() is itself an iterator, the loop stores every argument tuple in the args list, so all 1,500,000 tuples end up in memory at once. One way to save memory is to rewrite the loop as a function that returns a generator: a generator yields one item on demand rather than storing them all (see the sketch after the link below).

    To read more about generators, see the following link:

    https://pythongeeks.org/python-generators-with-examples/
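
    As a minimal sketch of that idea, reusing the names from the question (shapefile, dgrid, lat_gridlimits, lon_gridlimits, calculate), you could feed the pool from a generator and swap starmap() for imap_unordered(), which consumes its input lazily instead of materializing the whole argument list:

    import multiprocessing as mp

    def iter_args():
        # Yields one argument tuple at a time instead of storing all
        # 1,500,000 tuples in an `args` list up front.
        for index, pol in shapefile.iterrows():
            yield (dgrid, lat_gridlimits, lon_gridlimits, pol, index)

    def calculate_star(arg_tuple):
        # imap_unordered() passes a single argument, so unpack it here
        # before calling calculate() as defined in the question.
        return calculate(*arg_tuple)

    pool = mp.Pool()
    # Unlike starmap(), imap_unordered() pulls items from the generator
    # lazily, keeping only a few chunks in flight at any time.
    for _ in pool.imap_unordered(calculate_star, iter_args(), chunksize=100):
        pass  # consume results as they arrive instead of collecting them
    pool.close()
    pool.join()

    The chunksize value is arbitrary here; larger chunks reduce inter-process overhead at the cost of holding more tuples in memory at once.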