Tags: python, string, selenium, web-scraping

TypeError: list indices must be integers or slices, not Tag


I get an error when I try to concatenate the base link with the next link I need to navigate to. Here is the error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-172-c75cfd599dcf> in <module>
     21         l.append(j['href'])
     22 
---> 23         url2 = 'https://krisha.kz/prodazha/kvartiry/petropavlovsk/' + ''.join(l[j])
     24         driver.get(url2)
     25 

TypeError: list indices must be integers or slices, not Tag

The problem occurs in the following code:

    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'html.parser')

    links = soup.find_all('a', {'class': 'a-card__title'})
    soup2 = BeautifulSoup(str(links), 'html.parser')
    href = soup2.find_all('a', href=True)
    l = []
    for j in href:
        l.append(j['href'])
        
        url2 = 'https://krisha.kz/prodazha/kvartiry/petropavlovsk/' + ''.join(l[j])
        driver.get(url2)
        

My "l" is a list of hrefs and it looks like that picture below: enter image description here

Because of that I can't move to the next page to scrape it. What integer or slice do I need here?


Solution

  • Instead of indexing the list with your loop variable, use `j` directly — `j` is a BeautifulSoup `Tag`, not an integer, which is why `l[j]` raises the `TypeError`:

    url2 = 'https://krisha.kz/prodazha/kvartiry/petropavlovsk/' + j['href'][1:]


    The `[1:]` slice drops the href's leading `/` to avoid a double `//` in the URL.
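    As an alternative to manual concatenation and slicing, `urllib.parse.urljoin` from the standard library resolves a relative href against a base URL and handles the slashes for you. A minimal sketch, using a hypothetical href value — note that `urljoin` resolves a root-relative href (one starting with `/`) against the domain, not against the listing path:

```python
from urllib.parse import urljoin

base = 'https://krisha.kz/prodazha/kvartiry/petropavlovsk/'
href = '/a/show/12345'  # hypothetical href value, as scraped from a card link

# urljoin resolves the root-relative href against the domain,
# so no manual slicing of leading slashes is needed
url2 = urljoin(base, href)
print(url2)  # https://krisha.kz/a/show/12345
```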

    You can also use the list, but then you have to enumerate in your loop:

    for i, j in enumerate(href):
        l.append(j['href'])
        url2 = 'https://krisha.kz/prodazha/kvartiry/petropavlovsk/' + l[i][1:]
    

    Example

    from bs4 import BeautifulSoup
    import requests

    url = "https://krisha.kz/prodazha/kvartiry/petropavlovsk/"

    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'html.parser')

    # collect the listing card title links
    links = soup.find_all('a', {'class': 'a-card__title'})
    soup2 = BeautifulSoup(str(links), 'html.parser')
    href = soup2.find_all('a', href=True)
    l = []
    for j in href:
        l.append(j['href'])

        # use the Tag j directly; slice off the leading '/' to avoid '//'
        url2 = 'https://krisha.kz/prodazha/kvartiry/petropavlovsk/' + j['href'][1:]
        print(url2)