Search code examples
excelweb-scrapingexport-to-excelvba

Excel Webscraping


I am having trouble web scraping with Excel. I want the data from this website: charitynavigator.org

I am doing a test run on this list of links: http://www.charitynavigator.org/index.cfm?bay=search.results&cgid=7&cuid=30

I recorded a macro of me going through the first link (The Aims Project). Then, I added a "loop" (?) that someone used in a YouTube video. The videomaker explained the logic of the code, so the loop seemed to make sense and to be applicable to my problem. However, after scraping the first page, the program just can't get to the next webpages.

Here is my code

Sub adds()

'For i = 1 to 5
 For x = 1 To 5
 Worksheets("IntlHum").Select
 Worksheets("IntlHum").Activate
 mystr = "URL;http://www.charitynavigator.org/index.cfm?bay=search.summary&orgid=3803"
 mystr = Cells(x, 1)
 Worksheets.Add(After:=Worksheets(Worksheets.Count)).Name = x

With ActiveSheet.QueryTables.Add(Connection:= _
    mystr, _
    Destination:=Range("$A$1"))
    '.CommandType = 0
    .Name = "index.cfm?bay=search.summary&orgid=3803_1"
    .FieldNames = True
    .RowNumbers = False
    .FillAdjacentFormulas = False
    .PreserveFormatting = True
    .RefreshOnFileOpen = False
    .BackgroundQuery = True
    .RefreshStyle = xlInsertDeleteCells
    .SavePassword = False
    .SaveData = True
    .AdjustColumnWidth = True
    .RefreshPeriod = 0
    .WebSelectionType = xlSpecifiedTables
    .WebFormatting = xlWebFormattingNone
    .WebTables = "2,3,4,5"
    .WebPreFormattedTextToColumns = True
    .WebConsecutiveDelimitersAsOne = True
    .WebSingleBlockTextImport = False
    .WebDisableDateRecognition = False
    .WebDisableRedirections = False
    .Refresh BackgroundQuery:=False
End With

Next x

End Sub

The YouTube video is https://www.youtube.com/watch?v=qbOdUaf4yfI I am a complete novice. I understand logic, have worked with STATA and LaTexx, but am no computer scientist. Please keep jargon minimal. THANKS!


Solution

  • In the second pass (x = 2) the notion of "ActiveSheet" may have changed so the line mystr = Cells(x, 1) will not return expected value as it will probably read from different data source then you intended. (See Office documentation of Application.Cells Property (Excel) for more details). The change of "ActiveSheet" is done as side-effect of calling the Office documentation of Worksheets.Add Method (Excel)

    Changing it to mystr = Worksheets("IntlHum").Cells(x, 1) may help.

    But my answer is learn how to use Excel VBA debugging, including "step into (F8)", "immediate window to print things".

    This is the must-learn tool for any serious Excel VBA and this is the tool to use for troubleshooting your code.

    Excel VBA (and Visual Basic in general) was not designed (or intended) for computer scientists but "for the masses" so you should find your way through quite easily (after learning some necessary "jargon").

    You should be able to find many tutorial resources that match your current skills using Google. As one such resource you may try http://www.excel-easy.com/vba/examples/debugging.html