python beautifulsoup scraper

I adapted a scraper script for a medical site, but it doesn't create the CSV file I'm expecting


I found this scraper script and adapted it to gather the names of all the dentists listed on a web page. When I run it, no CSV file is created with the data I'm looking for. Here's the script:

from bs4 import BeautifulSoup as bs
import requests as rq
import csv

url = "https://www.healthgrades.com/usearch?what=Dentistry&where=Canal%20Street%2C%20NY%2010013&city=Canal%20Street&state=NY&pt=40.720901%2C%20-74.008904&zip=10013&neCorner=40.739420717131885%2C-73.98771539161403&swCorner=40.70233998462754%2C-74.03007248950355&mapCenter=40.720901%2C-74.008904&zoomLevel=14.6&mapChanged=false&pageNum=2"
GeT = rq.get(url)
soup = bs(GeT.content, "html.parser")

data_1 = soup.find_all ('div',{'class':'card-content__details'})

doctors_list = []

for item in data_1:
    try:
        first = item.contents[2].find_all('div',{'class':'details'})[1].text
    except:
        first = ''


    doctors_list.append(first)  

    with open('newfile.csv','w') as file:
        writer = csv.writer(file)
        for row in doctors_list:
            writer.writerow(row)

Solution

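The results on that page appear to be loaded by JavaScript from a JSON API, so the HTML that requests receives likely contains no card-content__details elements. If data_1 is empty, the loop body never runs, and since the file is opened inside the loop, newfile.csv is never created. Querying the API directly avoids the problem:

  • Example JSON response: https://bpaste.net/show/fcb53d9bc16f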

    >>> import requests
    ... 
    ... BASE_URL = 'https://www.healthgrades.com/api3/usearch'
    ... 
    ... params = {
    ...     'userLocalTime':'22:37',
    ...     'what':'Dentistry',
    ...     'where':'Canal Street, NY 10013',
    ...     'pt':'40.720901, -74.008904',
    ...     'sort.provider':'bestmatch',
    ...     'category':'provider',
    ...     'sessionId':'Sb93293f932c6bc56',
    ...     'requestId':'Rac7ffe6e6256eba3',
    ...     'pageSize.provider':'20',
    ...     'pageNum':'2',
    ...     'isFirstRequest':'true',
    ...     'debug':'false',
    ...     'isAtlas':'true',
    ...     'action':'refresh',
    ...     'neCorner':'40.744526282819244,-73.99060337556104',
    ...     'swCorner':'40.69723118200452,-74.02724113269275'
    ... }
    ... r = requests.get(BASE_URL, params=params)
    ... r.raise_for_status()
    >>> dentists = r.json()['search']['searchResults']['provider']['results']
    >>> for dentist in dentists:
    ...     print(dentist['displayName'])
    ... 
    Dr. Raphael Santore, DDS
    Dr. Gain Lu, DDS
    Dr. Molly Lim, DDS
    Dr. Anne Yu, DDS
    Dr. Charmaine Ip, DMD
    Dr. Devi Konar, DDS
    Dr. Christopher Perez, DMD
    Dr. Lee Gold, DDS
    Dr. Elaine Wong, DDS
    Dr. Fan Mou, DDS
    Dr. Henry Wong, DDS
    Dr. Shauna Fung, DDS
    Dr. Emilie Fong, DDS
    Dr. Nancy Ma, DDS
    Dr. Charles Tiu, DDS
    Dr. Glenn Chiarello, DDS
    Dr. John Nosti, DMD
    Dr. Loi Chan, DDS
    Dr. Charles Hashim, DDS
    Dr. David Azar, DDS
    Dr. Jenny Zhu, DDS
    Dr. Stanton Young, DMD
    Dr. Pankaj Singh, DDS
    Dr. Lawrence Tam, DDS
    Dr. Alina Lukashevsky, DDS
    Dr. Maureen Khoo, DDS
    Dr. Mailin Lai, DDS
    Dr. Stewart Neidle, DDS
    Danielle Danzi, DDM
    Dr. Justin Cohen, DMD
    Dr. Weihsin Men, DMD
    Dr. Anthony Kail, DDS
    Sima Epstein
    Dr. Christian Bilius, DDS
    Dr. Jeffrey Shapiro, DDS
    Dr. Donald Ingerman, DDS
    
    >>> featured_dentists = r.json()['featuredProviders']
    >>> for dentist in featured_dentists:
    ...     print(dentist['displayName'])
    ... 
    Dr. Ora Canter, DDS
    Dr. Alfred Shirzadnia, DDS
    Dr. Henry Nogid, DDS
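
To get from these names to the CSV file the question asks for, a minimal sketch might look like the following. It assumes the names have already been collected into a list, as in the session above. The helper name `write_names_csv` and the `name` header row are hypothetical choices for illustration; the file name `newfile.csv` comes from the question. Each name is wrapped in a list because `csv.writer.writerow` treats a bare string as a sequence of characters and would write one character per column.

```python
import csv


def write_names_csv(names, path):
    """Write one name per row to a CSV file at `path` (hypothetical helper)."""
    # Open the file once, outside any loop; newline="" is what the csv
    # module documentation recommends for files passed to csv.writer.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["name"])  # header row: an assumption, not in the original
        for name in names:
            # Wrap the string in a list so the whole name lands in one column.
            writer.writerow([name])
```

With the session above, `write_names_csv([d['displayName'] for d in dentists], 'newfile.csv')` should produce one row per dentist. Names that contain a comma, such as `Dr. Gain Lu, DDS`, are quoted automatically by the `csv` module.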