Extracting data from a website using VBA returns nothing


I use the code below to read and extract data from websites, but with one specific URL (http://www.iamf.ir) there is a problem:

    Dim HTML_Content As HTMLDocument
    Dim dados As Object

    ' Create the HTMLDocument object (early binding needs a reference to the Microsoft HTML Object Library)
    Set HTML_Content = New HTMLDocument

    ' Load the web page content into the HTMLDocument object
    With CreateObject("msxml2.xmlhttp")
        .Open "GET", "http://www.iamf.ir", False
        .send
        HTML_Content.body.innerHTML = .responseText
        Debug.Print .responseText                ' this prints the response as expected
        Debug.Print HTML_Content.body.innerHTML  ' this prints nothing (the problem is here)
    End With

Solution

  • This should be the answer to your question, though I don't think it really solves your problem.

    The XMLHTTP request you send to this website gets a response with an empty body, as you can see from the output of the line Debug.Print .responseText:

    <HTML>
        <HEAD>
            <TITLE>&#1575;&#1605;&#1740;&#1606; &#1570;&#1588;&#1606;&#1575; &#1575;&#1740;&#1585;&#1575;&#1606;&#1740;&#1575;&#1606;</TITLE>
            <META NAME="Keywords" CONTENT="">
            <META HTTP-EQUIV="Refresh" CONTENT="0;URL=http://www.iafi.ir">
            <META NAME="Description" CONTENT="">
        </HEAD> 
        <BODY> <-- body is empty
        </BODY>
    </HTML>
    

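    This is why, when you print the .body.innerHTML of your HTML_Content document, you get an empty string. The only instruction in the response is the META refresh pointing to http://www.iafi.ir, which a browser would follow but an XMLHTTP request does not.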
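
    Since the response shown above contains a META refresh tag, one thing that may be worth trying before a full browser is to read the refresh target and request it directly. The following is only a minimal sketch: ExtractMetaRefreshUrl and HttpGet are hypothetical helper names that are not part of the original code, the pattern assumes HTTP-EQUIV appears before CONTENT inside the tag, and it assumes the target page serves its content without needing JavaScript, which may not be the case here.

    ```vba
    Option Explicit

    ' Hypothetical helper: returns the URL from a
    ' <META HTTP-EQUIV="Refresh" CONTENT="n;URL=..."> tag, or "" if none is found.
    ' Assumes HTTP-EQUIV comes before CONTENT inside the tag.
    Function ExtractMetaRefreshUrl(ByVal html As String) As String
        Dim re As Object
        Dim matches As Object

        Set re = CreateObject("VBScript.RegExp")
        re.IgnoreCase = True
        re.Global = False
        re.Pattern = "<meta[^>]*http-equiv\s*=\s*[""']?refresh[""']?[^>]*" & _
                     "content\s*=\s*[""']\s*\d+\s*;\s*url\s*=\s*([^""'>\s]+)"

        Set matches = re.Execute(html)
        If matches.Count > 0 Then
            ExtractMetaRefreshUrl = matches(0).SubMatches(0)
        Else
            ExtractMetaRefreshUrl = ""
        End If
    End Function

    ' Hypothetical helper: synchronous GET that returns the response text.
    Function HttpGet(ByVal url As String) As String
        With CreateObject("MSXML2.XMLHTTP")
            .Open "GET", url, False
            .send
            HttpGet = .responseText
        End With
    End Function

    ' Fetches the page and, if it only redirects via META refresh, fetches the target instead.
    Sub FetchFollowingMetaRefresh()
        Dim body As String
        Dim target As String

        body = HttpGet("http://www.iamf.ir")
        target = ExtractMetaRefreshUrl(body)
        If Len(target) > 0 Then body = HttpGet(target)
        Debug.Print body
    End Sub
    ```

    If the second response is still empty or incomplete, the content is probably built by scripts and this shortcut will not help.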

    Some websites are built so that only full execution of the page (including JavaScript, which does not run when you perform an XMLHTTP request) renders what you see in your browser. In your specific case, you might need to get the information with a slower but more dependable scraping approach based on an invisible browser. You can check out this answer I wrote some time ago to get an idea.
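
    To give a rough idea of what scraping through an invisible browser can look like in VBA, here is a minimal sketch that drives a hidden Internet Explorer instance through late binding. It is not necessarily the approach of the answer mentioned above. GetRenderedHtml and its timeoutSeconds parameter are hypothetical names chosen for illustration, and the sketch assumes the InternetExplorer.Application COM object is still available on the machine, which may not hold on current Windows versions where Internet Explorer has been retired.

    ```vba
    Option Explicit

    ' Hypothetical helper: loads a page in a hidden Internet Explorer instance
    ' and returns the rendered body HTML, or "" if the page did not finish
    ' loading within timeoutSeconds.
    Function GetRenderedHtml(ByVal url As String, ByVal timeoutSeconds As Double) As String
        Const ReadyStateComplete As Long = 4   ' value of READYSTATE_COMPLETE
        Dim ie As Object
        Dim started As Double
        Dim timedOut As Boolean

        Set ie = CreateObject("InternetExplorer.Application")
        ie.Visible = False
        ie.navigate url

        ' Wait until the browser reports the document as loaded.
        ' Timer counts seconds since midnight, so this simple check
        ' is not reliable across midnight.
        started = Timer
        Do While ie.Busy Or ie.readyState <> ReadyStateComplete
            DoEvents
            If Timer - started > timeoutSeconds Then
                timedOut = True
                Exit Do
            End If
        Loop

        If Not timedOut Then GetRenderedHtml = ie.document.body.innerHTML
        ie.Quit
    End Function
    ```

    A call such as Debug.Print GetRenderedHtml("http://www.iafi.ir", 30) would then print whatever the browser rendered. The URL used here is the refresh target from the response above: when a page redirects through a META refresh, the wait loop can finish on the intermediate empty page, so navigating straight to the target is the safer choice. Content that scripts add after the load event may also need an extra wait.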