I want to monitor the memory usage of a Scrapy spider more closely as it runs (context: I'm running it on Heroku and looking at how I can restructure the scrape to stay within the bounds of the dyno I'm using).
I've turned on the MEMUSAGE_* settings (roughly the configuration sketched at the end of this question). But for closer monitoring, I'm wondering...
Is there any way to have the spider output its current memory usage as it's running? I'm thinking of something I could use in a print statement in the middle of the spider's code.
As the spider is running, how can I see how much memory it's using? I think I can see it in the output of top -o cpu (it shows up as python 3.12), but I'm not certain that's the spider, since its memory usage doesn't seem to grow as quickly as the same spider's does on Heroku.
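For reference, here is a sketch of the kind of MEMUSAGE_* configuration I mean in settings.py (the specific values are just illustrative, sized for a small dyno):

# settings.py
MEMUSAGE_ENABLED = True
MEMUSAGE_LIMIT_MB = 450               # shut the spider down above this
MEMUSAGE_WARNING_MB = 400             # log a warning above this
MEMUSAGE_CHECK_INTERVAL_SECONDS = 60  # how often the extension checks memory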
Just copy the code that gets the memory usage from the MemoryUsage extension and call it in your code:
import resource
import sys


def get_virtual_size() -> int:
    # peak resident set size (RSS) of the current process, in bytes
    size: int = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if sys.platform != "darwin":
        # on macOS ru_maxrss is in bytes, on Linux it is in KB
        size *= 1024
    return size
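For example, here is a minimal sketch of calling it from a spider callback (the spider name, URL, and log format are placeholders, and it assumes get_virtual_size is defined in the same module as the spider):

import scrapy


class ExampleSpider(scrapy.Spider):
    name = "example"
    start_urls = ["https://example.com"]

    def parse(self, response):
        # log the peak RSS in MB each time a response is processed
        self.logger.info("peak memory: %d MB", get_virtual_size() // (1024 * 1024))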