I want to scrape the domain name, social links (LinkedIn, Twitter), and emails from the following website: https://cloud28plus.com/en/partner/resecurity--inc- I first tried to fetch the data from the network requests, but that did not work. Then I tried the requests module, which throws an exception when I try this:
response = requests.get(url)
data = response.json() # not working.
Then I tried BeautifulSoup. When I print soup.body, it returns data, but the data is not structured, so soup.find_all('a') returns an empty list []. My code is:
import requests
from bs4 import BeautifulSoup
url = 'https://cloud28plus.com/en/partner/resecurity--inc-'
response = requests.get(url)
# data = response.json() # not working
page = response.text
soup = BeautifulSoup(page, 'html.parser')
# Returns Empty list
soup.find_all('a')
soup.find('a', class_='followUs__IconTwitter-sc-1gwf1fm-2 edzSJr fa fa-twitter-square')  # returns nothing
soup.find_all('div', class_='col')  # empty list
Can anybody tell me what I am doing wrong?
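The data you see on the page is stored as embedded JSON inside the script element with the id __NEXT_DATA__. The visible links are presumably built from that JSON by JavaScript in the browser, which would explain why soup.find_all('a') finds nothing in the raw HTML and why response.json() fails (the response is HTML, not JSON). To parse it, you can use the following example: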
import json
import requests
from bs4 import BeautifulSoup
url = "https://cloud28plus.com/en/partner/resecurity--inc-"
soup = BeautifulSoup(requests.get(url).content, "html.parser")
data = json.loads(soup.select_one("#__NEXT_DATA__").contents[0])
# uncomment this to see all data:
# print(json.dumps(data, indent=4))
print(data["props"]["initialProps"]["pageProps"]["element"]["twitter"])
Prints:
https://twitter.com/RESecurity
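If you also want the LinkedIn link, the domain, or an email, one option is to walk the parsed JSON and collect every wanted key instead of hard-coding each path. The following is only a minimal sketch: find_values is a hypothetical helper, and of the key names used here only "twitter" is confirmed by the output above. "linkedin", "website", and "email" are guesses, so print the full JSON first to check the real names. The sample dict stands in for the data variable from the code above so the snippet runs without a network call.

```python
def find_values(node, wanted_keys):
    """Recursively collect the first non-empty string found for each wanted key."""
    found = {}
    if isinstance(node, dict):
        for key, value in node.items():
            if key in wanted_keys and isinstance(value, str) and value:
                found.setdefault(key, value)
            elif isinstance(value, (dict, list)):
                # Descend into nested containers; keep values already found.
                for k, v in find_values(value, wanted_keys).items():
                    found.setdefault(k, v)
    elif isinstance(node, list):
        for item in node:
            for k, v in find_values(item, wanted_keys).items():
                found.setdefault(k, v)
    return found


# Stand-in for the parsed __NEXT_DATA__ dict. Only "twitter" is known to exist;
# the "linkedin" entry and its URL are made up for illustration.
sample = {"props": {"initialProps": {"pageProps": {"element": {
    "twitter": "https://twitter.com/RESecurity",
    "linkedin": "https://www.linkedin.com/company/example",  # hypothetical
    "description": "",
}}}}}

# Key names other than "twitter" are assumptions, not confirmed fields.
links = find_values(sample, {"twitter", "linkedin", "website", "email"})
print(links)
```

With the real page you would pass data instead of sample. Keys that are missing or empty are simply left out of the result, so the lookup does not raise KeyError if the site changes its structure.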