I am trying to scrape IPO date from crunchbase. Unfortunately I get Runtime Error 1004 “Application-defined or Object-defined error”. My goal is to save IPO date in A1 cell.
Sub GetIE()
Dim IE As Object
Dim URL As String
Dim myValue As IHTMLElement
URL = "https://www.crunchbase.com/organization/verastem"
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True
IE.Navigate URL
Do While IE.Busy Or IE.ReadyState <> 4
DoEvents
Loop
Set myValue = IE.Document.getElementsByClassName("post_glass post_micro_glass")(0)
Range("A1").Value = myValue
Set IE = Nothing
End Sub
I can't find that class name in the html for that url. You can use the css selector I show below which can be scraped by xmlhttp and thus avoiding opening a browser
Option Explicit
Public Sub GetDate()
Dim html As HTMLDocument
Set html = New HTMLDocument '< VBE > Tools > References > Microsoft Scripting Runtime
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://www.crunchbase.com/organization/verastem#section-overview", False
.send
html.body.innerHTML = .responseText
End With
ActiveSheet.Range("A1") = html.querySelectorAll(".field-type-date.ng-star-inserted").item(1).innerText
End Sub
If you don't want to use compound classes then you can also use
ActiveSheet.Range("A1") = html.querySelectorAll("#section-ipo-stock-price .field-type-date").item(1).innerText
You can see the relevant html here:
Note the element has multiple (compound) classes
<span class="component--field-formatter field-type-date ng-star-inserted" title="Jan 27, 2012">Jan 27, 2012</span>
There are 3 classes component--field-formatter
; field-type-date
and ng-star-inserted
. I use two of these in combination in the first solution I give. Multiple classes is popular now-a-days due to the versatility it gives in page styling e.g. it allows overriding styles easily. You can read about css specificity* to understand this better.
More classes may mean the code is a little less robust as the ordering of classes may be changed and a class, or more, may be removed. This was raised by @SIM in a comment on an answer to another web-scraping question. Thus, I offer one solution with two of the classes used, and another solution with only one of the classes used.
Whilst you do get the same date for this page with simply:
ActiveSheet.Range("A1") = html.querySelector("#section-ipo-stock-price .field-type-date").innerText
I wouldn't want to assume that would always hold true as it grabs the date from the line where it says "Their stock opened"
.
* https://developer.mozilla.org/en-US/docs/Web/CSS/Specificity
References: