
Using jQuery to traverse the DOM for scraping data


I am working with node-phantom-simple (https://github.com/baudehlo/node-phantom-simple). It has made scraping the DOM very simple. I am able to use jQuery, and I am starting to work with the DataTables library.

Here is the code that I started with:

    var nameArray = [];

    $("tbody[role='alert'] tr").each(function(data){
        var json = {};
        json.name = $(this).children(':first-child').text();
        json.size = $(this).children(':nth-child(2)').text();
        json.caffeine = $(this).children(':nth-child(3)').text();
        json.mgFloz = $(this).children(':last-child').text();
        nameArray.push(json);
    });

    // return tableData;
    return nameArray;

This returns all of the data that I have scraped from the website. Each table row has this format:

<td><a href="">name of drink</a></td>
<td>info</td>
<td>info</td>
<td> info</td>

I want to access the drink's href, so I tried to target the HTML:

json.url=$(this).children(':first-child').html();

My response is:

{ url: '<a href="/caffeine-content/zombie-blood-energy-potion">Zombie Blood Energy Potion</a>' }

This is close. All that I want is the href and I will be done. I tried targeting with attr() but I kept getting null back.

Is there a step I am missing, or a workaround?


Solution

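  • You are close, but you need to traverse the DOM one more level down. The href attribute belongs to the <a> element inside the cell, not to the <td> itself, so use find() to reach it: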

    json.url = $(this).children(':first-child').find('a').attr('href');
    

    For the name property, you can use a similar approach:

    json.name = $(this).children(':first-child').find('a').text();
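
Putting the pieces together, the whole row loop might look like the sketch below. It is only an illustration under a few assumptions: the function name scrapeRows and the $cells/$link variables are hypothetical, jQuery is passed in as a parameter so the function does not depend on a global, and the trim() calls are an optional addition to strip stray whitespace from the cell text.

```javascript
// Sketch: build one record per table row, reading the link from the <a>
// inside the first cell. `scrapeRows` is a hypothetical name, and `$` is
// assumed to be the jQuery object already loaded in the page.
function scrapeRows($) {
    var rows = [];

    $("tbody[role='alert'] tr").each(function () {
        var $cells = $(this).children('td');  // the <td> cells of this row
        var $link = $cells.eq(0).find('a');   // the <a> inside the first cell

        rows.push({
            name: $link.text().trim(),
            url: $link.attr('href'),          // undefined if the cell has no link
            size: $cells.eq(1).text().trim(),
            caffeine: $cells.eq(2).text().trim(),
            mgFloz: $cells.eq(3).text().trim()
        });
    });

    return rows;
}
```

Wherever the original snippet ran, this could be called as scrapeRows($). Looking the link up once and reusing it for both name and url avoids repeating the same traversal twice per row.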