scrapy

Seeing / outputting memory usage of scrapy spider as it's running


I want to monitor the memory usage of a Scrapy spider more closely as it's running (context: I'm running it on Heroku and working out how to restructure the scrape so it stays within the limits of the dyno I'm using).

I've turned on the MEMUSAGE_* settings (roughly what I have is sketched below the two questions). But for closer monitoring, I'm wondering:

  1. Is there any way to have the spider output its current memory usage as it's running? I'm thinking of something like a print statement I could drop into the middle of the spider's code.

  2. As the spider is running, how can I see how much memory it's using? I think I can see it in the output of top -o cpu as a python 3.12 process, but I'm not certain it's the spider, since its memory usage doesn't seem to accumulate as quickly as the same spider's does on Heroku.
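
For context, here is roughly what I have in settings.py (the numbers are just values I'm experimenting with, not recommendations):

    # MemoryUsage extension settings (the extension needs the resource
    # module, so it only works on Unix-like platforms)
    MEMUSAGE_ENABLED = True
    MEMUSAGE_WARNING_MB = 400                # log a warning past this point
    MEMUSAGE_LIMIT_MB = 512                  # close the spider past this point
    MEMUSAGE_CHECK_INTERVAL_SECONDS = 60.0   # how often memory is sampled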


Solution

  • Just copy the code that gets the memory usage from the MemoryUsage extension and call it in your code:

    import resource
    import sys
    
    def get_virtual_size() -> int:
        # ru_maxrss is the process's peak resident set size
        size: int = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        if sys.platform != "darwin":
            # on macOS ru_maxrss is in bytes, on Linux it is in KB
            size *= 1024
        return size
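
A sketch of how you might call it from inside a spider (assuming get_virtual_size is defined in the same module; the spider name, URL, and selector are placeholders). Note that ru_maxrss is the peak resident set size, so the number only ever grows: it tells you how high memory has gotten, not how much is in use right now.

    import scrapy
    
    class MemoryAwareSpider(scrapy.Spider):
        name = "memory_aware"                    # placeholder name
        start_urls = ["https://example.com/"]    # placeholder URL
    
        def parse(self, response):
            # log the peak memory usage (in MB) every time a page is parsed
            self.logger.info("peak memory: %.1f MB", get_virtual_size() / (1024 * 1024))
            for href in response.css("a::attr(href)").getall():
                yield response.follow(href, callback=self.parse)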