I am working with node-phantom-simple (https://github.com/baudehlo/node-phantom-simple). It has made scraping the DOM very simple. I am able to use jQuery, and I am getting into the DataTables library.
Here is the code that I started with:
var nameArray = [];
$("tbody[role='alert'] tr").each(function () {
    var json = {};
    json.name = $(this).children(':first-child').text();
    json.size = $(this).children(':nth-child(2)').text();
    json.caffeine = $(this).children(':nth-child(3)').text();
    json.mgFloz = $(this).children(':last-child').text();
    nameArray.push(json);
});
// return tableData;
return nameArray;
This returns all of the data from the website that I have scraped. Each table row has the following format:
<td><a href="">name of drink</a></td>
<td>info</td>
<td>info</td>
<td> info</td>
I want to get the drink's href, so I tried targeting the HTML:
json.url=$(this).children(':first-child').html();
My response is:
{ url: '<a href="/caffeine-content/zombie-blood-energy-potion">Zombie Blood Energy Potion</a>' }
This is close. All I want is the href, and then I will be done. I tried targeting it with attr(), but I kept getting null back.
Is there a step I am missing, or a workaround?
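You are close, but you need to traverse the DOM one level further down. The href attribute belongs to the <a> element inside the first <td>, not to the <td> itself, which is why calling attr() on the cell gives you nothing. Use find() to reach the anchor: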
json.url = $(this).children(':first-child').find('a').attr('href');
For the name property, you can use a similar approach:
json.name = $(this).children(':first-child').find('a').text();
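Putting the pieces together, the row loop might look like the sketch below. It is wrapped in a hypothetical function, scrapeRows, that takes jQuery as the argument $. That wrapper is only an assumption made so the snippet can be read and tested on its own; in your scraper the body would sit where your original loop does. The selector and property names are the ones from your question.

```javascript
// Sketch only: scrapeRows is a hypothetical wrapper, not part of
// node-phantom-simple or jQuery. `$` is expected to be jQuery.
function scrapeRows($) {
  var rows = [];
  $("tbody[role='alert'] tr").each(function () {
    var row = $(this);
    // Step down from the first <td> to the <a> it contains.
    var link = row.children(':first-child').find('a');
    rows.push({
      name: link.text(),       // text of the anchor
      url: link.attr('href'),  // href lives on the <a>, not on the <td>
      size: row.children(':nth-child(2)').text(),
      caffeine: row.children(':nth-child(3)').text(),
      mgFloz: row.children(':last-child').text()
    });
  });
  return rows;
}
```

If a row has no anchor in its first cell, link is an empty jQuery set, so url comes back as undefined and name as an empty string. You may want to skip or flag those rows.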