json web-scraping beautifulsoup

Unable to scrape the domain name from this website: Postman returns JSON, but requests throws an exception when I call response.json()


I want to scrape the domain name, social links (LinkedIn, Twitter), and emails from the following website: https://cloud28plus.com/en/partner/resecurity--inc- I first tried to fetch the data from the network requests, but that did not work. Then I tried the requests module, which throws an exception when I try this:

response = requests.get(url)
data = response.json() # not working.

Then I tried BeautifulSoup. When I print soup.body, it returns data, but the data is not structured, so soup.find_all('a') returns an empty list []. My code is:

import requests
from bs4 import BeautifulSoup
url = 'https://cloud28plus.com/en/partner/resecurity--inc-'
response = requests.get(url)
# data = response.json() # not working
page = response.text
soup = BeautifulSoup(page, 'html.parser')
# Returns Empty list
soup.find_all('a')

soup.find('a', class_ = 'followUs__IconTwitter-sc-1gwf1fm-2 edzSJr fa fa-twitter-square')  # returns nothing
soup.find_all('div', class_ = 'col')  # empty list

Can anybody tell me what I am doing wrong?


Solution

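  • The data you see on the page is stored as JSON embedded in the HTML, inside the element with id __NEXT_DATA__. The visible links are presumably built from it by JavaScript in the browser, which is why find_all('a') finds nothing in the raw response. To parse the embedded JSON, you can use the following example: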

    import json
    import requests
    from bs4 import BeautifulSoup
    
    url = "https://cloud28plus.com/en/partner/resecurity--inc-"
    
    soup = BeautifulSoup(requests.get(url).content, "html.parser")
    data = json.loads(soup.select_one("#__NEXT_DATA__").contents[0])
    
    # uncomment this to see all data:
    # print(json.dumps(data, indent=4))
    
    print(data["props"]["initialProps"]["pageProps"]["element"]["twitter"])
    

    Prints:

    https://twitter.com/RESecurity
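
To pull several fields at once without a KeyError when one is absent, the lookup can be wrapped in a small helper. This is only a sketch: `dig` and `extract_links` are hypothetical helper names, `sample` is a made-up stand-in for the parsed `data` above, and the only key confirmed by the output above is `twitter`. The `linkedin` and `website` keys are assumptions, so check the real names with the `json.dumps` line from the example.

```python
from typing import Any


def dig(obj: Any, *keys: str, default: Any = None) -> Any:
    """Follow keys through nested dicts; return default if any step is missing."""
    for key in keys:
        if not isinstance(obj, dict) or key not in obj:
            return default
        obj = obj[key]
    return obj


def extract_links(data: Any, fields=("twitter", "linkedin", "website")) -> dict:
    """Collect the requested fields from the partner element, using None for gaps."""
    element = dig(data, "props", "initialProps", "pageProps", "element", default={})
    if not isinstance(element, dict):
        element = {}
    return {field: element.get(field) for field in fields}


# Made-up sample mirroring the path used above; only "twitter" is a confirmed key.
sample = {
    "props": {
        "initialProps": {
            "pageProps": {
                "element": {"twitter": "https://twitter.com/RESecurity"}
            }
        }
    }
}

links = extract_links(sample)
print(links)
```

With the real page, pass the `data` object from the example above instead of `sample`. Fields that are missing come back as `None` rather than raising an exception.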