Search code examples
excelvbaweb-scrapingpowerquerydata-retrieval

Searching a website and returning found results


With Excel Power query its possible to pull data from a website provided its in a database/table format.

Many online databased are so large however that they implement a search function instead of showing the entire database which is fine but causes a barrier when trying to efficiently locate information for many keywords.

The database I wish to search is:

https://apps.who.int/food-additives-contaminants-jecfa-database/search.aspx

Is it possible to create a list of keywords/CAS numbers and search the database for each of these sequentially and return data found? This is similar to web scraping but with the added step of actually searching for the data beforehand.


Solution

  • It's totally possible to acheive what you want. First you analyze the page, specifically the input box and the submit button and find what's identify them. I use Chrome Development Tools for this. Just open the desired page and press F12.

    In this case the input box is:

    <input name="ctl00$ContentPlaceHolder1$txtSearch" type="text" id="ContentPlaceHolder1_txtSearch">
    

    and the submit button is:

    <input type="submit" name="ctl00$ContentPlaceHolder1$btnSearch" value="Search" id="ContentPlaceHolder1_btnSearch">
    

    You can then use the ids to address the box with javascript:

    var inputBox = document.getElementById('ContentPlaceHolder1_txtSearch');
    inputBox.value = 'your search string';
    

    And the equivalent for the submit button:

    var searchButton = document.getElementById('ContentPlaceHolder1_btnSearch');
    searchButton.click(); // Start the search
    

    When the results are delivered you then need to analyze that page to figure out what javascript code is needed to extract the part of that page that you're interrested in. Or you can dump the complete page with:

    document.documentElement.outerHTML;
    

    Excel VBA example code for running javascript on a webpage here:

    https://github.com/peakpeak-github/libEdge

    Modify the code to suit your needs.