I am using Python with requests or aiohttp to fetch a page, and BeautifulSoup4 to parse it. The server's HTML page uses a Jinja template, so when I fetch the page with requests or aiohttp, I get something like this:
<a href="/{{username}}" class='pr'>
But if I open the same page in a browser, the markup looks like this:
<a href="/gavrilka" class='pr'>
requests code:
import requests
url = 'MY URL'
headers = {}  # placeholder for MY HEADERS (a dict of header names to values)
# A plain GET needs no request body, so no payload is passed.
response = requests.get(url, headers=headers)
print(response.text)
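aiohttp code:
import asyncio
import aiohttp
url = 'MY URL'
headers = {}  # placeholder for MY HEADERS (a dict of header names to values)
async def main():
    # The context managers close the response and the session automatically.
    async with aiohttp.ClientSession() as session:
        async with session.get(url, headers=headers) as resp:
            data = await resp.text()
            print(data)
asyncio.run(main())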
What should I do to get the page text as the browser shows it?
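I used Selenium with PhantomJS, and now it works. The driver loads the page the way a browser does, so page_source contains the rendered markup instead of the raw template.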
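from selenium import webdriver
from bs4 import BeautifulSoup
url = "https://yourlink"
# Note: webdriver.PhantomJS was removed in Selenium 4, so this needs an older Selenium release.
driver = webdriver.PhantomJS()
driver.set_window_size(1024, 768) # optional
driver.get(url)
page_source = driver.page_source  # HTML after the browser engine has rendered the page
driver.quit()  # shut down the PhantomJS process
soup = BeautifulSoup(page_source, 'lxml')  # the 'lxml' parser requires the lxml package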
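To check whether a fetched page still contains unrendered template placeholders, something like the following could be run on both the raw response text and page_source. This is only a minimal sketch: the helper name find_placeholders is hypothetical, and it assumes the placeholders use the {{name}} syntax shown in the question.

```python
import re

# Matches {{ name }} style placeholders and captures the name inside.
# Assumption: placeholders look like the {{username}} example in the question.
PLACEHOLDER_RE = re.compile(r"\{\{\s*([^{}]+?)\s*\}\}")


def find_placeholders(html):
    """Return the names of all {{...}} placeholders left in the HTML text."""
    return PLACEHOLDER_RE.findall(html)


raw_html = '<a href="/{{username}}" class="pr">'   # what requests/aiohttp returned
rendered_html = '<a href="/gavrilka" class="pr">'  # what the browser showed

print(find_placeholders(raw_html))       # non-empty: the page was not rendered
print(find_placeholders(rendered_html))  # empty: the placeholders were filled in
```

An empty list for page_source suggests the browser engine filled in the template, so the soup can be parsed as usual.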