At my school we have interactive whiteboards, and we can export their contents to a website through a provided link. The only problem is that the links expire (which is stupid), so I want to write a simple Python script that fetches the images and downloads them.
Here is the link to the website: https://air.ifpshare.com/documentPreview.html?s_id=8ec97e16-51c4-4a77-9f64-7d5dccd9bb41#/detail/561f0184-384c-4ca1-91a4-b2e687865408/record
When I open Chrome and inspect the website, I see that the images sit inside a main div that contains nested divs and img elements, and the img elements encode the images in base64. That should make them easy to decode in Python.
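To show what I mean by decoding, here is a minimal sketch of that step. It assumes each img src is a data URI of the form data:image/png;base64,...; the helper name decode_data_uri and the sample value are made up for illustration, not taken from the site.

```python
import base64

def decode_data_uri(src):
    """Split a base64 data URI into (mime_type, raw_bytes)."""
    # Everything before the first comma is the header, the rest is the payload.
    header, _, payload = src.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a base64 data URI")
    mime_type = header[len("data:"):-len(";base64")]
    return mime_type, base64.b64decode(payload)

# Hypothetical src value: only the 8-byte PNG signature, not a real image.
sample_src = "data:image/png;base64," + base64.b64encode(b"\x89PNG\r\n\x1a\n").decode("ascii")
mime_type, image_bytes = decode_data_uri(sample_src)
```

With a real src value, image_bytes could be written straight to a file opened in 'wb' mode.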
This is the simple script I wrote to get the HTML:
import requests
page = requests.get("https://air.ifpshare.com/documentPreview.html?s_id=8ec97e16-51c4-4a77-9f64-7d5dccd9bb41#/detail/561f0184-384c-4ca1-91a4-b2e687865408/record")
print(page.text)
The only problem is that when I request the HTML, I don't get any of the content. The content seems to be generated by the JavaScript on the page.
The same thing happens when I use Selenium.
Here is what I get:
<!DOCTYPE html><html><head><meta charset=utf-8><meta name=viewport content="width=device-width,initial-scale=1,maximum-scale=1,user-scalable=no"><link rel=stylesheet href=//at.alicdn.com/t/font_833191_27456hr9ow5.css><title id=PageTitle></title><style>html,
body {
max-width: 480px;
height: 100%;
margin: auto;
background-size: 100% 100%;
background: #F8F8F8;
}</style><link href=/static/css/documentPreview.01a0856b7f615fdfd7f4b853e047bcd0.css rel=stylesheet></head><body><div id=app></div><script type=text/javascript src=/static/js/manifest.a3f705024b2774dd271e.js></script><script type=text/javascript src=/static/js/vendor.03a8b2ef6819d9eaa4e7.js></script><script type=text/javascript src=/static/js/documentPreview.a9f6fe7b5c4d6f073050.js></script></body></html>
Does anyone know a workaround?
Note: this answer contains different methods to reach your goal.
I saw that your target web app fetches the image download URLs from an API endpoint, so it is easy to get those images using the requests library and a little bit of code (no need for bs4).
Here is the API endpoint: https://air.ifpshare.com/api/pub/files/UUID
In your link,
https://air.ifpshare.com/documentPreview.html?s_id=8ec97e16-51c4-4a77-9f64-7d5dccd9bb41#/detail/561f0184-384c-4ca1-91a4-b2e687865408/record
you will see a UUID value after the /detail/ path. This is your file UUID. Combine this file UUID with the API endpoint and you will get a downloadUrl value for each item in the JSON response. Prefixed with https:, this is your complete download URL. Here is the code:
import requests

def fetchResp(UUID):
    url = f"https://air.ifpshare.com/api/pub/files/{UUID}"
    response = requests.get(url)
    items = response.json()['items']
    for n, urls in enumerate(items):
        image = urls['downloadUrl']
        image_url = f"https:{image}"  # the response value has no scheme, so add it manually
        image_data = requests.get(image_url).content
        with open(f"image-{n}.png", 'wb') as image_d:
            image_d.write(image_data)

fetchResp('561f0184-384c-4ca1-91a4-b2e687865408')  # file UUID goes here
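If you don't want to copy the UUID by hand, it can be pulled out of the share link. This is only a sketch: file_uuid_from_link is a name I made up for illustration, and it assumes the link always has the #/detail/<uuid>/... shape of the one in your question.

```python
import re
from urllib.parse import urlsplit

# Standard 8-4-4-4-12 hexadecimal UUID layout.
UUID_RE = r"[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}"

def file_uuid_from_link(link):
    """Return the UUID that follows /detail/ in the link's fragment."""
    # The fragment is the part after '#', here "/detail/<uuid>/record".
    fragment = urlsplit(link).fragment
    match = re.search(rf"/detail/({UUID_RE})", fragment)
    if match is None:
        raise ValueError("no /detail/<uuid> segment found in link")
    return match.group(1)

link = (
    "https://air.ifpshare.com/documentPreview.html"
    "?s_id=8ec97e16-51c4-4a77-9f64-7d5dccd9bb41"
    "#/detail/561f0184-384c-4ca1-91a4-b2e687865408/record"
)
file_uuid = file_uuid_from_link(link)
api_url = f"https://air.ifpshare.com/api/pub/files/{file_uuid}"
```

The resulting file_uuid can then be passed to fetchResp instead of the hard-coded string.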