
Scraping Craigslist with BeautifulSoup and getting the first image in each posting


I am currently trying to scrape aviation data from Craigslist. I have no problem getting all the info I want, except for the first image of each post. Here is my link:

https://spokane.craigslist.org/search/avo?hasPic=1

I have been able to get all the images thanks to a different post on this site, but I am having trouble figuring out how to get just the first image.

I am using bs4 and requests for this script. Here is what I have so far which gets every image:

from bs4 import BeautifulSoup as bs
import requests

image_url = 'https://images.craigslist.org/{}_300x300.jpg'
r = requests.get('https://spokane.craigslist.org/search/avo?hasPic=1')
soup = bs(r.content, 'lxml')
ids = [item['data-ids'].replace('1:','') for item in soup.select('.result-image[data-ids]', limit = 10)] 
images = [image_url.format(j) for i in ids for j in i.split(',')]
print(images)

Any help is greatly appreciated.

Thanks in advance,

inzel


Solution

  • Find every link with the gallery class (result-image gallery) and read its data-ids attribute. Then split that value on commas into a list and take the first element, [0].

    from bs4 import BeautifulSoup as bs
    import requests

    image_url = 'https://images.craigslist.org/{}_300x300.jpg'
    r = requests.get('https://spokane.craigslist.org/search/avo?hasPic=1')
    soup = bs(r.content, 'lxml')
    # Read data-ids from each gallery link and drop the '1:' prefix on every ID
    ids = [item.get('data-ids').replace('1:', '') for item in soup.find_all("a", {"class": "result-image gallery"}, limit=10)]
    # data-ids is comma-separated; keep only the first ID of each posting
    images = [image_url.format(i.split(',')[0]) for i in ids]
    print(images)
    

    Result:

    ['https://images.craigslist.org/00N0N_ci3cbcv5T58_300x300.jpg', 'https://images.craigslist.org/00101_5dLpBXXdDWJ_300x300.jpg', 'https://images.craigslist.org/00n0n_8zVXHONPkTH_300x300.jpg', 'https://images.craigslist.org/00l0l_jiNMe38avtl_300x300.jpg', 'https://images.craigslist.org/01212_fULyvfO9Rqz_300x300.jpg', 'https://images.craigslist.org/00D0D_ibbWWn7uFCu_300x300.jpg', 'https://images.craigslist.org/00z0z_2ylVbmdVnPr_300x300.jpg', 'https://images.craigslist.org/00Q0Q_ha0o2IJwj4Q_300x300.jpg', 'https://images.craigslist.org/01212_5LoZU43xA7r_300x300.jpg', 'https://images.craigslist.org/00U0U_7CMAu8vAhDi_300x300.jpg']
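
The split-and-take-first step can also be tried offline, without hitting the site. The following is only a minimal sketch under stated assumptions: SAMPLE_HTML is made-up markup that imitates the data-ids format used above (the second ID in it is invented), and first_image_urls is a hypothetical helper name, not part of bs4 or of the original answer. It also skips links that have no data-ids attribute, which the list comprehension above would not tolerate.

```python
from bs4 import BeautifulSoup

IMAGE_URL = 'https://images.craigslist.org/{}_300x300.jpg'

# Hypothetical sample markup imitating the search page (an assumption,
# not copied from Craigslist). The third link has no data-ids at all.
SAMPLE_HTML = '''
<a class="result-image gallery" data-ids="1:00N0N_ci3cbcv5T58,1:00a0a_madeUpSecond"></a>
<a class="result-image gallery" data-ids="1:00101_5dLpBXXdDWJ"></a>
<a class="result-image gallery"></a>
'''


def first_image_urls(html, limit=None):
    """Return the first image URL of each gallery link found in html."""
    soup = BeautifulSoup(html, 'html.parser')
    urls = []
    # The CSS selector only matches links that actually carry data-ids.
    for link in soup.select('a.result-image.gallery[data-ids]')[:limit]:
        # data-ids is comma-separated; the first entry is the first image.
        first_id = link['data-ids'].split(',')[0]
        # Drop the leading "1:" style prefix, if there is one.
        first_id = first_id.split(':', 1)[-1]
        if first_id:  # ignore empty attributes
            urls.append(IMAGE_URL.format(first_id))
    return urls


print(first_image_urls(SAMPLE_HTML))
```

To use it against the live page, pass r.content from the requests call above instead of SAMPLE_HTML, for example first_image_urls(r.content, limit=10). Whether the live markup still uses these class names and the data-ids attribute is an assumption carried over from the answer.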