I'm using PHP cURL for the first time. My goal is to create a bot that retrieves and gathers data from several websites that collect data from machines (to be clear, I own the data; I only want to gather all of it in one place). I've managed to log in to those websites and get some of the data. I've also managed to get data inside an iframe, thanks to the file_get_contents function.
But if I try to get the HTML inside a plain frame (not an iframe), it doesn't work. I used the URL of the frame element below (yes, the full URL). I don't get any errors. I do get some HTML elements, but none of the HTML I'm looking for: I see the HTML body, but it is almost empty. I'm completely sure about the URL I'm giving to PHP cURL. What should I do to get the HTML inside the frame?
Here is the frame element visible on the page I'm trying to get the data from (this is not what I get from the PHP cURL response):
<frame name="WMain" src="/WSID0002340321/easy/GUI-1280">
The html i'm looking for is here
</frame>
So nothing fancy.
I've seen this post: How to use PHP CURL with frames? But the issue is not really the same, and the answer is about iframes and assumes the HTML elements are already present in the response.
Thank you for helping me.
I suspect that some of the HTML is either generated by JavaScript, in which case it isn't there when the page is first loaded and a simple request from a non-browser client like cURL will never see it, or downloaded via an extra AJAX request, in which case you might be able to retrieve it by making a request directly to the URL used by AJAX. It could even be a combination of the two. Inspecting the page more closely with your browser's Developer Tools might help you understand how the content is actually being created.
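If Developer Tools does show a separate request that returns the content, a rough sketch of repeating that request from PHP could look like the following. It assumes the login step stored its session cookies in a cookie file; `resolveUrl`, `fetchWithSession`, `cookies.txt` and the `example.com` host are made-up names for illustration, not anything taken from your site. `resolveUrl` is deliberately simplified and only handles absolute URLs, root-relative paths and plain relative paths.

```php
<?php
// Turn a src such as "/WSID0002340321/easy/GUI-1280" into a full URL,
// based on the URL of the page it was found on (simplified resolver).
function resolveUrl(string $pageUrl, string $src): string
{
    if (preg_match('#^https?://#i', $src)) {
        return $src; // already absolute
    }
    $parts  = parse_url($pageUrl);
    $origin = $parts['scheme'] . '://' . $parts['host']
            . (isset($parts['port']) ? ':' . $parts['port'] : '');
    if (strpos($src, '/') === 0) {
        return $origin . $src; // root-relative path
    }
    // Plain relative path: resolve against the directory of the page.
    $path = $parts['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);
    return $origin . $dir . $src;
}

// Fetch a URL while reusing the cookies saved during login.
// $cookieFile is assumed to be the same file the login request wrote to.
function fetchWithSession(string $url, string $referer, string $cookieFile): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,        // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,        // follow redirects
        CURLOPT_COOKIEFILE     => $cookieFile, // send cookies from the login session
        CURLOPT_COOKIEJAR      => $cookieFile, // keep any new cookies
        CURLOPT_REFERER        => $referer,    // some sites check where the request came from
    ]);
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('cURL request failed: ' . curl_error($ch));
    }
    return $body;
}

// Hypothetical usage (host and cookie file are placeholders):
// $page = 'https://example.com/index.html';
// $url  = resolveUrl($page, '/WSID0002340321/easy/GUI-1280');
// $html = fetchWithSession($url, $page, __DIR__ . '/cookies.txt');
```

If the body still comes back almost empty with the session cookies attached, that points towards the content being built by JavaScript rather than by a request you can repeat.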
It's very common these days for pages to contain content that is loaded later and/or generated by scripts. A basic download of the raw, original HTML cannot capture that extra content, because there is no JavaScript environment to run the code and download or create the extra HTML. You'd need a headless browser, or a web client with the sophistication of the Google crawler, to fully load such a page.
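As a rough illustration of the headless-browser route, here is a minimal sketch that shells out to Chrome or Chromium from PHP and reads back the DOM after scripts have run. It assumes such a browser is installed on the machine; the binary name and the function names are placeholders I made up. Note that a browser started this way does not share the cookies of your cURL login, so as written it only fits pages reachable without that session.

```php
<?php
// Build the shell command that asks headless Chrome/Chromium to print
// the DOM of a page after its scripts have executed.
// $browserBinary is an assumption, e.g. "chromium" or "google-chrome".
function buildDumpDomCommand(string $browserBinary, string $url): string
{
    return escapeshellarg($browserBinary)
        . ' --headless --disable-gpu --dump-dom '
        . escapeshellarg($url);
}

// Run the command and return the rendered HTML.
function renderWithHeadlessBrowser(string $browserBinary, string $url): string
{
    $output = shell_exec(buildDumpDomCommand($browserBinary, $url));
    if (!is_string($output) || $output === '') {
        throw new RuntimeException('Headless browser returned no output');
    }
    return $output;
}

// Hypothetical usage (binary name and host are placeholders):
// $html = renderWithHeadlessBrowser('chromium', 'https://example.com/WSID0002340321/easy/GUI-1280');
```

For a site behind a login, a browser-automation library that can perform the login inside the same browser session is probably a better fit than a one-off command like this.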