I have a shapefile with 1,500,000 polygons, and I need to intersect each polygon with different grids.
I wrote a simple program that loops over the polygons and does the intersection with multiprocessing:
pool = mp.Pool()
args = []
for index, pol in shapefile.iterrows():
    # Limits of each polygon in the shapefile
    ylat = lat_gridlimits
    xlon = lon_gridlimits
    args.append((dgrid, ylat, xlon, pol, index))
pool.starmap(calculate, args)
pool.close()
pool.join()
but memory fills up very quickly and the job is killed with this error:
PBS: job killed: vmem exceeded limit
How can I find out where or when the memory exceeds the limit? Or is there a way to control the memory used by each function?
I tried this (inside calculate):
process = psutil.Process(os.getpid())
mem = process.memory_info().rss / (1024.0 ** 3)        # resident set size in GiB
vmem = psutil.virtual_memory().total / (1024.0 ** 3)   # total system memory in GiB
print("{} {}\n".format(mem, vmem))
but it doesn't help me locate where the memory is being consumed.
One reason you are running out of memory is that, although shapefile.iterrows() is itself an iterator and reads rows lazily, your loop collects every argument tuple into the args list, so all 1.5 million entries end up in memory at once; pool.starmap then also materializes its whole input before dispatching work. One way to save memory is to feed the pool from a generator instead: wrap the argument construction in a generator function and pass it to a lazy method such as pool.imap or pool.imap_unordered, which pull arguments one chunk at a time rather than storing them all up front.
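A minimal sketch of that idea, with a trivial stand-in for your calculate and placeholder values for the grid variables (the real function and data come from your own code):

```python
import multiprocessing as mp

def calculate(args):
    # stand-in for the real intersection work; just unpacks and
    # returns the index so the sketch is runnable
    dgrid, ylat, xlon, pol, index = args
    return index

def gen_args(rows, dgrid, ylat, xlon):
    # generator: yields one argument tuple at a time instead of
    # appending all of them to a list first
    for index, pol in rows:
        yield (dgrid, ylat, xlon, pol, index)

if __name__ == "__main__":
    rows = enumerate(range(10))  # stand-in for shapefile.iterrows()
    with mp.Pool(2) as pool:
        # imap consumes the generator lazily; chunksize controls how
        # many tuples are pulled and shipped to a worker at a time
        for result in pool.imap(calculate, gen_args(rows, None, None, None),
                                chunksize=2):
            pass  # handle each result here, e.g. write it to disk
```

Note that imap takes a single-argument function, so the tuple is unpacked inside calculate rather than via starmap; a larger chunksize reduces inter-process overhead at the cost of holding more tuples in flight.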
To read more about generators, see the following link: