I am using nodejs and jsdom, attempting to retrieve images used within anchor tags.
I can enumerate the images using .querySelectorAll("img")
and the anchors using .querySelectorAll("a").
But I can't find the relationship between the two, which is the part I'm after: knowing that a given image, when clicked, navigates to x.
sample html
<a href="http://www.yahoo.com">
<img src="https://s.yimg.com/nq/nr/img/yahoo_mail_global_english_white_1x.png" alt="Yahoo Mail Image">
</a>
Node.js
var links = dom.window.document.querySelectorAll("a");
links.forEach(function(value){
  console.log('Host: ' + value.hostname);
  console.log('Href: ' + value.href);
  console.log('Text: ' + value.text);
  console.log('HTML: ');
  console.dir(value);
});
Expected result:
link to x is displayed with image.alt "yahoo mail image" and image.src "https://...."
Without seeing your full HTML, I'd suggest running the image queries within each link's subtree:
const {JSDOM} = require("jsdom"); // ^22.0.0
const html = `
<a href="http://www.yahoo.com">
<img src="https://s.yimg.com/nq/nr/img/yahoo_mail_global_english_white_1x.png" alt="Yahoo Mail Image">
</a>
<a href="http://www.google.com">
<img src="google.png" alt="Google Image">
</a>
<a href="http://www.example.com">
<img src="whatever.png" alt="Whatever Image">
</a>`;
const {window: {document}} = new JSDOM(html);
const data = [...document.querySelectorAll("a")].map(e => ({
  src: e.querySelector("img").src,
  alt: e.querySelector("img").getAttribute("alt"),
  href: e.href,
}));
console.log(data);
Output:
[
  {
    src: 'https://s.yimg.com/nq/nr/img/yahoo_mail_global_english_white_1x.png',
    alt: 'Yahoo Mail Image',
    href: 'http://www.yahoo.com/'
  },
  {
    src: 'google.png',
    alt: 'Google Image',
    href: 'http://www.google.com/'
  },
  {
    src: 'whatever.png',
    alt: 'Whatever Image',
    href: 'http://www.example.com/'
  }
]
However, it's likely that there are other links on the page you're working with, so I would add a parent container to refine your a selector, which is probably too broad and will attempt to grab links that don't have <img> tags as children. For those links, e.querySelector("img") returns null, so reading .src on it throws a TypeError.
Using the Sizzle pseudoselector a:has(img), XPath, or a filter (shown below) might also help:
const data = [...document.querySelectorAll("a")]
  .filter(e => e.querySelector(":scope > img"))
  .map(e => ({
    src: e.querySelector("img").src,
    alt: e.querySelector("img").getAttribute("alt"),
    href: e.href,
  }));
...but this is speculation.