Tags: python, beautifulsoup, python-requests, python-requests-html, http-status-code-406

How to access a website that returns a 406 Not Acceptable error?


After searching through hundreds of answers, I'm here again, asking a new question that might help someone in the future.

I'm scraping this website: https://inview.doe.in.gov/state/1088000000/school-list.
The school list is in a flex box, and I believe I could fetch the data with Selenium. But I want to get this job done using only BeautifulSoup.

By inspecting the network connections, I found two API calls, and I'm not sure which one returns the school list. I have their IPv4 addresses as well.

api = 'https://inview.doe.in.gov/api/entities?lang=en&merges=[{"route": "entities", "name": "district", "local_field": "district_id", "foreign_field": "id", "fields": "id,name"}]&filter=state_id==1088000000'
ipv4 = '104.18.21.238:443'
api2 = 'https://inview.doe.in.gov/api/entities?filter=type==district,type==network,type==school,type==state&fields=name,type,id,district_id'
ipv4 = '104.18.21.238:443'
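
The first URL is easier to read when its query string is built from structured values instead of typed by hand. Below is a minimal sketch using only the standard library. The helper name `build_entities_url` is hypothetical, and it assumes the server decodes standard percent-encoding (spaces as `+`, `=` as `%3D`), which has not been verified against this API.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit

BASE = "https://inview.doe.in.gov/api/entities"

# The "merges" parameter is a JSON list, mirroring the first URL above.
MERGES = [{
    "route": "entities",
    "name": "district",
    "local_field": "district_id",
    "foreign_field": "id",
    "fields": "id,name",
}]


def build_entities_url(state_id):
    """Return the entities URL filtered to one state (hypothetical helper)."""
    params = {
        "lang": "en",
        "merges": json.dumps(MERGES),
        "filter": f"state_id=={state_id}",
    }
    # urlencode percent-encodes the brackets, quotes and spaces
    return f"{BASE}?{urlencode(params)}"


url = build_entities_url(1088000000)
print(url)

# Decoding the query string again recovers the original values.
decoded = parse_qs(urlsplit(url).query)
print(decoded["filter"][0])
```

Building the URL this way only changes how the request is written; it does not by itself affect the 406 response.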

Trying to access the content directly gives None, because it is loaded dynamically (at least that's what I believe).

import json
import requests
from bs4 import BeautifulSoup


def url_parser(url):
  html_doc = requests.get(url, headers={"Accept":"*/*"}).text
  soup = BeautifulSoup(html_doc,'html.parser')
  return html_doc, soup


def data_fetch(url):
  html_doc, soup = url_parser(url)
  api_link = 'https://inview.doe.in.gov/api/entities?lang=en&merges=[{"route": "entities", "name": "district", "local_field": "district_id", "foreign_field": "id", "fields": "id,name"}]&filter=state_id==1088000000'
  html_doc2, soup2 = url_parser(api_link)
  #school_id = soup2.find_all('div', {'class':'result-table table--results mt-3'})
  print(soup2)


def main():
  url = "https://inview.doe.in.gov/state/1088000000/school-list"
  data_fetch(url)

main()

Opening the API link directly in the browser gives the same error message that I get from the code:

{"message":"The resource identified by the request is only capable of generating response entities which have content characteristics not acceptable according to the accept headers sent in the request. Supported entities are: application/json, application/vnd.tembo.api+json, application/vnd.tembo.api+json;version=1","status":406}

Is there any way I can fix that?


Solution

  • The error message lists the media types the API supports. Send one of them in the Accept header, for example:

    import requests
    import pandas as pd
    
    url = "https://inview.doe.in.gov/api/entities?lang=en&merges=[{%22route%22:%20%22entities%22,%20%22name%22:%20%22district%22,%20%22local_field%22:%20%22district_id%22,%20%22foreign_field%22:%20%22id%22,%20%22fields%22:%20%22id,name%22}]&filter=state_id==1088000000"
    headers = {
      'accept': 'application/vnd.tembo.api+json',
    }
    schools = []
    response = requests.get(url, headers=headers)
    response.raise_for_status()  # stop early on a 406 or any other HTTP error
    for school in response.json()['entities']:
        grades = school.get('grades')
        schools.append({
            'ID': school['id'],
            'Name': school['name'],
            'Type': school['type'],
            # first and last grade offered, or 'NA' when no grades are listed
            'Grades': ' - '.join([grades[0]['name'], grades[-1]['name']]) if grades else 'NA',
            'Phone': school.get('phone_number', 'NA'),
        })
    df = pd.DataFrame(schools)
    print(df.to_string(index=False))
    

    OUTPUT:

            ID                                                                Name     Type                      Grades          Phone
    1053105210                                 Edgewood Intermediate School (5210)   school           Grade 4 - Grade 6 (317) 803-5024
    1053105317                              Wanamaker Early Learning Center (5317)   school               Pre-K - Pre-K (317) 860-4500
    1045353742                              Wolcott Mills Elementary School (3742)   school               Pre-K - Pre-K (260) 499-2450
    1045353746                                     Lima-Brighton Elementary (3746)   school               Pre-K - Pre-K (260) 499-2440
    1033352672                                      Little Cadets Preschool (2672)   school               Pre-K - Pre-K (000) 000-0000
    1014051133                                           Washington Primary (1133)   school             Pre-K - Grade 1 (812) 254-8360
    1018751365                                   Royerton Elementary School (1365)   school      Kindergarten - Grade 5 (765) 282-2044
    1018751367                                          Delta Middle School (1367)   school           Grade 6 - Grade 8 (765) 747-0869
    1018751369                                            Delta High School (1369)   school          Grade 9 - Grade 12 (765) 288-5597
    1018751409                                      Eaton Elementary School (1409)   school      Kindergarten - Grade 5 (765) 396-3301
    1018751520                                     Albany Elementary School (1520)   school      Kindergarten - Grade 5 (765) 789-6102
    1019101387                                       Yorktown Middle School (1387)   school           Grade 6 - Grade 8 (765) 759-2660
    1019101389                                         Yorktown High School (1389)   school          Grade 9 - Grade 12 (765) 759-2550
    1019101393                                   Yorktown Elementary School (1393)   school           Grade 3 - Grade 5 (765) 759-2770
    1019101395                              Pleasant View Elementary School (1395)   school      Kindergarten - Grade 2 (765) 759-2800
    1018951375                                         Wapahani High School (1375)   school          Grade 9 - Grade 12 (765) 289-7323
    1018951377                                          Selma Middle School (1377)   school           Grade 6 - Grade 8 (765) 288-7242
    1018951381                                      Selma Elementary School (1381)   school      Kindergarten - Grade 5 (765) 282-2455
    1019701500                                       Muncie Virtual Academy (1500)   school     Kindergarten - Grade 12             NA
    1019701513                                      East Washington Academy (1513)   school             Pre-K - Grade 5 (765) 747-5434
    ...
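
The 406 body itself names the acceptable media types, so the Accept value does not have to be guessed. Below is a minimal sketch that extracts them from the error text quoted in the question and picks one. The helper names `supported_media_types` and `pick_accept` are hypothetical, and the parsing assumes the message keeps the "Supported entities are:" wording shown above.

```python
import json
import re

# Body of the 406 response, copied from the question.
ERROR_BODY = (
    '{"message":"The resource identified by the request is only capable of '
    'generating response entities which have content characteristics not '
    'acceptable according to the accept headers sent in the request. '
    'Supported entities are: application/json, application/vnd.tembo.api+json, '
    'application/vnd.tembo.api+json;version=1","status":406}'
)


def supported_media_types(error_body):
    """Return the media types listed after 'Supported entities are:'."""
    message = json.loads(error_body).get("message", "")
    match = re.search(r"Supported entities are:\s*(.+)$", message)
    if not match:
        return []
    return [part.strip() for part in match.group(1).split(",") if part.strip()]


def pick_accept(types, preferred):
    """Use the preferred type if the server lists it, else the first listed type."""
    if preferred in types:
        return preferred
    return types[0] if types else preferred


types = supported_media_types(ERROR_BODY)
# 'application/vnd.tembo.api+json' is the type used in the solution above.
headers = {"Accept": pick_accept(types, "application/vnd.tembo.api+json")}
print(types)
print(headers)
```

The resulting `headers` dict can be passed to `requests.get(url, headers=headers)` in the same way as in the solution.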