I'm trying to scrape an image URL from an external website, to use as the src attribute in an image element on my own website.
The problem is that on the external website the image URL sits inside a source element within a picture element, where it is listed several times in the "data-srcset" attribute, once per size. Example:
<picture><source data-srcset="https://imageurl.com 640w, https://imageurl.com 720w, https://imageurl.com 860w"></picture>
I can target the specific element with PHP Simple HTML DOM's find() and store it in a variable called $imageselector. Since the attribute name contains a hyphen, I read it through a variable that holds the name:
$srcset = 'data-srcset';
My final output then looks like the following:
<?php echo $imageselector->$srcset; ?>
This, of course, prints the entire attribute value, which isn't useful to me.
Does anyone have an idea how to get only, say, the first URL in the attribute?
(Cutting the string at a fixed length won't help either, since the length of the URL can change at any time.)
You could take $imageselector->$srcset, split its contents into an array, and filter as needed.
$longString = $imageselector->$srcset;
$pics = explode(",", $longString); // one item per size, e.g. "https://imageurl.com 640w"
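Now you have an array containing items such as "https://imageurl.com 640w". Every item after the first also starts with the space that followed the comma, so trim it before splitting. Take [0] for the first URL, [1] for the second, and so on.
$toUse = explode(" ", trim($pics[0])); // trim() drops the leading space left over from the comma split
$toUse = $toUse[0]; // keep only the URL, not the "640w" width descriptor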
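The steps above can be wrapped in a small self-contained function. This is only a sketch: the name srcsetUrlAt() and the sample URLs are made up for illustration, and it assumes the URLs themselves contain no commas (common in practice, but not guaranteed by HTML).

```php
<?php
// Hypothetical helper: returns the URL of the candidate at $index in a
// srcset-style string such as "https://imageurl.com/a.jpg 640w, ...",
// or null if there is no such candidate.
// Assumes no commas appear inside the URLs themselves.
function srcsetUrlAt(string $srcset, int $index = 0): ?string
{
    // Split into candidates on commas; trim() removes the space after each comma.
    $candidates = array_map('trim', explode(',', $srcset));
    if (!isset($candidates[$index]) || $candidates[$index] === '') {
        return null;
    }
    // Each candidate is "URL descriptor"; the URL is everything before the first whitespace.
    $parts = preg_split('/\s+/', $candidates[$index]);
    return $parts[0];
}

// Placeholder value standing in for $imageselector->$srcset.
$srcset = 'https://imageurl.com/a.jpg 640w, https://imageurl.com/b.jpg 720w, https://imageurl.com/c.jpg 860w';
echo srcsetUrlAt($srcset), "\n";    // first URL
echo srcsetUrlAt($srcset, 1), "\n"; // second URL
```

In your own code you would call it as srcsetUrlAt($imageselector->$srcset).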
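Alternatively, you can pre-process the entire array. Use array_map() for this, not array_filter(): array_filter() only keeps or drops items depending on what the callback returns; it never replaces them with the callback's result.
function getLink($string) {
    $string = trim($string); // remove the space left after each comma
    $space = strpos($string, " ");
    return $space === false ? $string : substr($string, 0, $space);
}
//Once you already have the exploded string
$pics = array_map("getLink", $pics); // $pics[0] is now the first URL, $pics[1] the second, ...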
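If you would rather pick an image by size than by position, one option is to parse the width descriptors into a width => URL array. This is a sketch under stated assumptions: parseSrcsetWidths() is a hypothetical name, it only understands "w" descriptors (candidates using "x" descriptors or none are skipped), and like the code above it assumes no commas inside the URLs.

```php
<?php
// Hypothetical helper: maps each width (640 for "640w") to its URL,
// sorted from smallest to largest width.
// Candidates without a "<number>w" descriptor are skipped in this sketch.
function parseSrcsetWidths(string $srcset): array
{
    $byWidth = [];
    foreach (explode(',', $srcset) as $candidate) {
        $parts = preg_split('/\s+/', trim($candidate));
        if (count($parts) >= 2 && preg_match('/^(\d+)w$/', $parts[1], $m)) {
            $byWidth[(int) $m[1]] = $parts[0];
        }
    }
    ksort($byWidth); // smallest width first
    return $byWidth;
}

// Placeholder value standing in for $imageselector->$srcset.
$srcset = 'https://imageurl.com/860.jpg 860w, https://imageurl.com/640.jpg 640w, https://imageurl.com/720.jpg 720w';
$byWidth = parseSrcsetWidths($srcset);
$smallest = reset($byWidth); // URL with the lowest width
$largest = end($byWidth);    // URL with the highest width
echo $smallest, "\n", $largest, "\n";
```

Because the keys are sorted, reset() and end() return the smallest and largest image regardless of the order in which the site lists them.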