python json scrapy web-crawler

Decoding JSON with Python


(Scrapy) I need help with the following code:

import json

from scrapy.exceptions import CloseSpider

# Method of the spider class; MercadoItem is defined elsewhere in the project.
def parse_item(self, response):
    ml_item = MercadoItem()
    # product info
    ml_item['nombre'] = response.xpath('//h1[@class="title"]/text()').extract()
    ml_item['web'] = response.xpath('/html/body/div[1]/div/div/div[1]/main/div/div[1]/div[2]/div[1]/div/div[4]/a/@href').extract()
    # extract_first() returns a single string; extract() returns a list,
    # which json.loads() cannot parse. The XPath also needs its closing parenthesis.
    script_data = response.xpath('string(/html/head/script[3]/text())').extract_first()
    decoded_data = json.loads(script_data)
    ml_item['datos'] = decoded_data["telephone"]
    ml_item['direccion'] = response.xpath('/html/body/div[1]/div/div/div[1]/main/div/div[1]/div[2]/div[1]/div/span[2]/text()').extract()
    # stop the crawl after 5 items
    self.item_count += 1
    if self.item_count > 5:
        raise CloseSpider('item_exceeded')
    yield ml_item

I decode the JSON only to get the phone number, but the console returns an error. script_data contains the script.

File "/mercadolibre-scrapy-master/mercado/spiders/spiderq.py", line 88
    ml_item['direccion'] = response.xpath('/html/body/div[1]/div/div/div[1]/main/div/div[1]/div[2]/div[1]/div/span[2]/text()').extract()
    ^
IndentationError: unexpected indent

The script is:

{"@context":"http://schema.org","@type":"LocalBusiness","name":"Clínica Dental Castellana 23","description":".TU CLÍNICA DENTAL DE REFERENCIA EN MADRID","telephone":"+34912298837","address":{"@type":"PostalAddress","streetAddress":"Castellana 23","addressLocality":"MADRID","addressRegion":"Madrid","postalCode":"28003"}}

Solution

  • Check the line reported in the error: its indentation differs from that of the previous line. For example, the previous line may be indented with 4 spaces and this one with a tab. They may look the same in an editor, but the Python interpreter treats them as different.
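
One way to find such a line is to scan the file for tabs in the leading whitespace. The following is a minimal sketch, assuming the file is meant to be indented with spaces only. The function name `lines_with_tab_indent` and the `sample` text are hypothetical and stand in for the real spider file.

```python
def lines_with_tab_indent(source):
    """Return (line_number, repr_of_indent) for lines whose leading
    whitespace contains at least one tab character."""
    flagged = []
    for number, line in enumerate(source.splitlines(), start=1):
        stripped = line.lstrip(" \t")
        indent = line[: len(line) - len(stripped)]
        if "\t" in indent:
            # repr() makes the invisible tab visible as \t
            flagged.append((number, repr(indent)))
    return flagged


# Hypothetical sample: lines 2 and 4 use 8 spaces, line 3 uses 2 tabs.
sample = (
    "def parse_item(self, response):\n"
    "        ml_item = {}\n"
    "\t\tml_item['direccion'] = 1\n"
    "        yield ml_item\n"
)

for number, indent in lines_with_tab_indent(sample):
    print(f"line {number}: indentation is {indent}")
# prints: line 3: indentation is '\t\t'
```

To check a real file, read its contents and pass them to the function in place of `sample`. The standard library's `tabnanny` module (`python -m tabnanny spiderq.py`) performs a similar check for ambiguous indentation.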