I am scraping a website using PHP Goutte, but some of the information I need appears only in a script tag, like this:
<script>
player.qualityselector({
sources: [
{ format: 'auto', src: "xxx.example.com", type: 'video/mp4'},
{ format: '1080p WEB-DL', src: "xxx.example.com", type: 'video/mp4'},
{ format: '720p WEB-DL', src: "xxx.example.com", type: 'video/mp4'},
{ format: '480p WEB-DL', src: "xxx.example.com4", type: 'video/mp4'},
{ format: '360p WEB-DL', src: "xxx.example.com", type: 'video/mp4'},
{ format: '240p WEB-DL', src: "xxx.example.com", type: 'video/mp4'},
],
});
</script>
I need the src of each one. Is that possible?
You can use regular expressions.
Example
$page_content = <<<'EOF'
<script>
player.qualityselector({
sources: [
{ format: 'auto', src: "xxx.example.com", type: 'video/mp4'},
{ format: '1080p WEB-DL', src: "xxx.example.com", type: 'video/mp4'},
{ format: '720p WEB-DL', src: "xxx.example.com", type: 'video/mp4'},
{ format: '480p WEB-DL', src: "xxx.example.com4", type: 'video/mp4'},
{ format: '360p WEB-DL', src: "xxx.example.com", type: 'video/mp4'},
{ format: '240p WEB-DL', src: "xxx.example.com", type: 'video/mp4'},
],
});
</script>
EOF;
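// [^"]* stops at the closing quote, so each match stays within one src value
// even if several entries end up on the same line (e.g. a minified script).
preg_match_all('/src:\s*"([^"]*)"/', $page_content, $match);
$result = $match[1]; // group 1 holds the captured src values
print_r($result);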
Output
Array
(
[0] => xxx.example.com
[1] => xxx.example.com
[2] => xxx.example.com
[3] => xxx.example.com4
[4] => xxx.example.com
[5] => xxx.example.com
)
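If you also need to know which quality each src belongs to, the same idea can be extended to capture the format label as well. The following is only a sketch: `extract_sources` is a hypothetical helper name, and it assumes every entry lists `format` directly before `src`, as in the snippet from the question. With Goutte, the input string would presumably be the page HTML (for example from `$crawler->html()`); here a shortened copy of the script is used instead.

```php
<?php
// Hypothetical helper (not part of Goutte): maps each format label to its src.
// Assumes every entry has the shape { format: '...', src: "...", ... }.
function extract_sources(string $html): array
{
    // Capture the single-quoted format and the double-quoted src.
    // [^']* and [^"]* stop at the closing quote, so this also works
    // when the script is minified onto one line.
    $pattern = '/format:\s*\'([^\']*)\'\s*,\s*src:\s*"([^"]*)"/';
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $sources = [];
    foreach ($matches as $m) {
        $sources[$m[1]] = $m[2]; // format label => src
    }
    return $sources;
}

// Shortened sample of the script from the question.
$page_content = <<<'EOF'
<script>
player.qualityselector({
sources: [
{ format: 'auto', src: "xxx.example.com", type: 'video/mp4'},
{ format: '720p WEB-DL', src: "xxx.example.com", type: 'video/mp4'},
],
});
</script>
EOF;

print_r(extract_sources($page_content));
```

The result is an associative array, so `$sources['720p WEB-DL']` gives the src for that quality. If the site changes the order of the keys inside each entry, the pattern will need adjusting.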