I have two pieces of code that are each supposed to export the HTML of a web page to a text file:
Sub Demo1()
    Dim http As New XMLHTTP60
    Dim html As New HTMLDocument

    With http
        .Open "GET", "https://www.google.com.eg/", False
        .send
        html.body.innerHTML = .responseText
        WriteTxtFile html.body.innerHTML
    End With
End Sub
Sub WriteTxtFile(ByVal aString As String, Optional ByVal filePath As String = "C:\Users\Future\Desktop\Output.txt")
    Dim fso As Object
    Dim fileout As Object

    Set fso = CreateObject("Scripting.FileSystemObject")
    ' Second argument True = overwrite; third argument True = write as Unicode
    Set fileout = fso.CreateTextFile(filePath, True, True)
    fileout.write aString
    fileout.Close
End Sub
Sub Demo2()
    Dim ie As Object
    Dim f As Integer

    Set ie = CreateObject("InternetExplorer.Application")
    With ie
        .Visible = True
        .navigate "https://www.google.com.eg/"
        Do: DoEvents: Loop Until .readyState = 4
        f = FreeFile()
        Open ThisWorkbook.Path & "\Sample.txt" For Output As #f
        Print #f, .document.body.innerHTML
        Close #f
        .Quit
    End With
End Sub
Demo1 and Demo2 produce "Output.txt" and "Sample.txt" respectively, but I found that the two HTML documents are different. Can you help me understand which one is right, and why they are different?
Thanks in advance for your help.
XMLHTTP does not provide all the rendered content of a webpage. It returns only what the server sends in the response; scripts are not executed, so anything rendered via JavaScript is missing.
Internet Explorer, on the other hand, will fully render the page, provided the browser version supports the JavaScript syntax the page uses. For example, you will run into problems with ES6 (ECMAScript 2015) and later syntax, as this is not supported on legacy browsers. I believe it is supported on Edge for Windows 10. You can check compatibility tables to see what is and isn't supported.
If you familiarize yourself with the dev tools for your browser, you can explore how different parts of a webpage are rendered. You can learn to debug scripts and see what changes are made to the DOM and page styling. Often, for example, a page will issue XHR requests to update its content after the initial load. If you want to have a play, look here.
So, on this basis, I suspect that the first HTML document has less content and a different overall DOM structure from the second.
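One rough way to check this is to fetch the page both ways and compare the sizes before anything is written to disk. The following is only a sketch: the sub name CompareRetrievalMethods is made up for illustration, it uses late binding so no references are needed, and it assumes Internet Explorer automation is still available on the machine.

```vba
' Hypothetical helper (not from the question): fetch the same URL with
' XMLHTTP and with Internet Explorer, then print the length of each result.
Sub CompareRetrievalMethods()
    Const PAGE_URL As String = "https://www.google.com.eg/"
    Dim http As Object
    Dim ie As Object
    Dim staticHtml As String
    Dim renderedHtml As String

    ' 1) Raw server response; no scripts are executed
    Set http = CreateObject("MSXML2.XMLHTTP.6.0")
    http.Open "GET", PAGE_URL, False
    http.send
    staticHtml = http.responseText

    ' 2) Page as rendered by the browser; scripts have run
    Set ie = CreateObject("InternetExplorer.Application")
    ie.Visible = False
    ie.navigate PAGE_URL
    Do While ie.Busy Or ie.readyState <> 4
        DoEvents
    Loop
    renderedHtml = ie.document.documentElement.outerHTML
    ie.Quit

    ' Results appear in the Immediate window (Ctrl+G)
    Debug.Print "XMLHTTP length: "; Len(staticHtml)
    Debug.Print "IE length:      "; Len(renderedHtml)
End Sub
```

The two lengths are unlikely to match exactly even for a static page, since the browser normalises the markup it parses, so treat a large gap as a hint rather than proof that scripts added content.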
To test for differences caused by the method of writing to the text file, you need to compare like with like, i.e. use the same access method and syntax to retrieve the page content, and then vary only the way it is written out.
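A minimal sketch of such a like-for-like test, assuming the WriteTxtFile routine from the question is in the same project. The sub name CompareWriteMethods and the two output file names are made up for illustration. The page is fetched once, and the same string is then written with both methods.

```vba
' Hypothetical test (not from the question): one retrieval, two write methods.
Sub CompareWriteMethods()
    Dim http As Object
    Dim content As String
    Dim f As Integer

    ' Retrieve the content once so both files start from the same string
    Set http = CreateObject("MSXML2.XMLHTTP.6.0")
    http.Open "GET", "https://www.google.com.eg/", False
    http.send
    content = http.responseText

    ' Method 1: FileSystemObject, writing Unicode (as in WriteTxtFile)
    WriteTxtFile content, ThisWorkbook.Path & "\Same_FSO.txt"

    ' Method 2: native VBA file I/O (as in Demo2)
    f = FreeFile()
    Open ThisWorkbook.Path & "\Same_Print.txt" For Output As #f
    Print #f, content
    Close #f
End Sub
```

Any difference between the two files can then only come from the writing step. Expect some: CreateTextFile with its third argument set to True writes Unicode (UTF-16), whereas Print # writes in the system ANSI code page and appends a line break, so the byte counts will differ and characters outside the code page may not survive.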
Please provide the differences if you want a deeper explanation.
Exploring page updating: