How to make requests in MicroPython more memory-efficient?


I'm currently programming an ESP32 with MicroPython. It runs a larger project for university, and I have a problem with the following code.

import urequests as request

def readWebsiteData():
        response = request.get(url='https://www.heizpellets24.de')
        response_text = response.text

        index1 = response_text.find("fr chartPrice")
        index2 = response_text.find("fr small")

        offset1 = 15
        offset2 = 10
        len1 = 6
        len2 = 5
        
        price = response_text[(index1+offset1):(index1+offset1+len1)]
        trend = response_text[(index2+offset2):(index2+offset2+len2)]
        response.close()

        return [price, trend]

I'm trying to read two numbers from a website to include them in my program. The code runs fine on its own, but when I include it in the whole program I get a memory error in line 5 (memory allocation failed, allocating 20000 bytes).

Is there a way to avoid reading the whole HTML response (over 1000 lines)?

Btw I know the code with the offset for reading the numbers isn't very elegant and it doesn't work if the website makes even small changes.

I already tried running gc.collect() before calling the function.

response_text = response.text[543:579] also didn't help (I'm guessing it just saves the whole HTML response anyway?)

I displayed the memory before calling the function:

  • memory free: 89952
  • memory allocated: 21216

Solution

  • You're trying to load the content of a large web page into the memory of a resource-limited embedded device.

    Your best option is probably to avoid loading the content into memory all at once, which is what happens when you run:

    response_text = response.text
    

    If you look at the implementation of the .text attribute, it loads all the response content into memory:

    @property
    def content(self):
        if self._cached is None:
            try:
                self._cached = self.raw.read()
            finally:
                self.raw.close()
                self.raw = None
        return self._cached
    
    @property
    def text(self):
        return str(self.content, self.encoding)
    

    You could instead iterate over the response content in smaller blocks; something like:

    import urequests as request
    
    def readWebsiteData():
            response = request.get(url='https://www.heizpellets24.de')
    
            # Here I'm using a 2K buffer but you could make this substantially
            # larger if you want (16k/32k/etc)
            buf = bytearray(2048)
    
            price = ''
            price_offset = 15
            price_len = 6
    
            trend = ''
            trend_offset = 10
            trend_len = 5
    
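            while True:
                numbytes = response.raw.readinto(buf)

                # Only the first numbytes bytes of buf come from this read; the
                # rest is left over from the previous block, so reject any match
                # whose value would extend past the freshly read data.
                index = buf.find(b'fr chartPrice')
                if index != -1 and index + price_offset + price_len <= numbytes:
                    price = buf[index + price_offset:index + price_offset + price_len].decode()
    
                index = buf.find(b'fr small')
                if index != -1 and index + trend_offset + trend_len <= numbytes:
                    trend = buf[index + trend_offset:index + trend_offset + trend_len].decode()
    
                # A short read means we reached the end of the response.
                if numbytes < len(buf):
                    break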
    
            response.raw.close()
            return price, trend
    
    price, trend = readWebsiteData()
    print('price:', price, 'trend:', trend)
    

    Running the above code on my esp32 results in:

    price: 390,20 trend: +0,05
    

    In this code we're optimistically assuming that the content we want isn't split across blocks. We're fortunate it works without additional logic in this specific case.
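
    If you'd rather not rely on that luck, one option is to carry the last few bytes of each block over into the next search, so a marker or value that straddles a block boundary is still seen in one piece. The following is only a sketch under some assumptions: find_fields and its markers argument are names I made up for illustration, the HTML in the demo is an invented stand-in for the real page, and I've only exercised the logic against io.BytesIO rather than on a device. It costs a little more memory per iteration than the version above, because it builds a window of roughly block_size plus the carried-over bytes.

```python
import io


def find_fields(stream, markers, block_size=2048):
    """Scan a byte stream block by block and pull a short field after each marker.

    markers maps a name to (marker_bytes, offset, length); the field is the
    `length` bytes starting `offset` bytes after the start of the marker.
    """
    # Longest stretch (marker start to field end) that must fit in one window.
    span = max(max(len(m), off + n) for m, off, n in markers.values())
    keep = span - 1  # bytes carried over so a split field is whole next time

    buf = bytearray(block_size)
    view = memoryview(buf)
    tail = b''
    found = {}

    while len(found) < len(markers):
        numbytes = stream.readinto(buf)
        if not numbytes:
            break  # end of stream

        # Search the carried-over tail plus only the freshly read bytes.
        window = tail + bytes(view[:numbytes])
        for name, (marker, off, n) in markers.items():
            if name in found:
                continue
            index = window.find(marker)
            # Accept a match only if the whole field is already in the window;
            # otherwise it sits in the tail and is retried on the next pass.
            if index != -1 and index + off + n <= len(window):
                found[name] = window[index + off:index + off + n].decode()

        tail = window[max(0, len(window) - keep):]

    return found


# Demo with a stand-in for response.raw: any object with readinto() works.
# The markup below is invented; only the markers and offsets match the code above.
page = (b'x' * 10 + b'<span class="fr chartPrice">390,20</span>'
        + b'y' * 30 + b'<span class="fr small">+0,05</span>')
markers = {
    'price': (b'fr chartPrice', 15, 6),
    'trend': (b'fr small', 10, 5),
}
# A deliberately tiny block size forces both fields to straddle block boundaries.
result = find_fields(io.BytesIO(page), markers, block_size=16)
print('price:', result.get('price'), 'trend:', result.get('trend'))
```

    On the device you would pass response.raw as the stream and close the response afterwards. The function stops reading as soon as every marker has been found, so it usually doesn't need to pull the whole page.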