I've been trying to scrape an HLS file from Twitch using several PHP scripts. The first one runs a cURL request to get the HLS URL from a Python script that returns said URL, and converts the generated string to plain text. The second (which is the one that isn't working) is supposed to extract the M3U8 file and make it playable.
First script (extract.php)
<?php
header('Content-Type: text/plain; charset=utf-8');
$url = "https://pwn.sh/tools/streamapi.py?url=twitch.tv/cgtn_live_russian&quality=1080p60";
$curl = curl_init($url);
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
//for debug only!
curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, false);
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
$resp = curl_exec($curl);
curl_close($curl);
var_dump($resp);
$undesirable = array("}");
$cleanurl = str_replace($undesirable,"");
echo substr($cleanurl, 39, 898);
?>
This script (let's call it extract.php) works, and it returns (in plain text) the same information the Python script would return, which is this:
string(904) "{"success": true, "urls": {"1080p60": "https://video-weaver.fra05.hls.ttvnw.net/v1/playlist/[token].m3u8"}}"
Second script (play.php)
<?php
$opts = array(
'http'=>array(
'method'=>"GET",
'header'=>"Referer:https://myserver.com/" .
"User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:51.0) Gecko/20100101 Firefox/51.0"
));
$html = file_get_contents("extract.php");
preg_match_all(
'/(http.*?\.m3u8[^&">]+)/',
$html,
$posts, // will contain the article data
PREG_SET_ORDER // formats data into an array of posts
);
foreach ($posts as $post) {
$link = $post[0];
header("Location: $link");
}
?>
This second script (let's call it play.php) should theoretically return the M3U8 file (without string(904) "{"success": true, "urls": {"1080p60":) and make it able to be played in a media player, such as VLC, but it doesn't return anything.
Can someone tell me what's wrong? Did I make a syntax or regex error when making these PHP files or is the second file not working because of the other elements of the string?
Thanks in advance.
I think you can rely on the regex to get the URL out instead of trying to clean the string manually. The other way would be to use json_decode().
The underlying problem is that file_get_contents("extract.php") reads the file's PHP source from disk without executing it, so play.php never sees the API response. The idea is instead to define a variable in extract.php, in this case $resp. Using echo as you do now will not make the value available in the parent script.
You can then reference that variable in play.php once extract.php has been included.
<?php
//extract.php
$resp = '';
$url = "https://pwn.sh/tools/streamapi.py?url=twitch.tv/cgtn_live_russian&quality=1080p60";
$curl = curl_init($url);
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
//for debug only!
curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, false);
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
$resp = curl_exec($curl);
curl_close($curl);
//play.php
include('./extract.php');
//$resp is set in extract.php
$link = null;
preg_match_all(
'/(http.*?\.m3u8)/',
$resp,
$posts, // will contain every match found in $resp
PREG_SET_ORDER // groups the results by match
);
foreach ($posts as $post) {
$link = $post[0]; // keeps the last matching URL
}
// redirect only if a playlist URL was actually found
if ($link !== null) {
header("Location: $link");
die();
}
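The json_decode() route mentioned above could look roughly like the sketch below. It assumes the response keeps the shape quoted in the question (a "success" flag plus a "urls" object keyed by quality). The function name firstPlaylistUrl is made up for illustration, and the sample string only stands in for the real $resp.

```php
<?php
// Hypothetical helper, not part of the original scripts: returns the first
// URL listed under "urls" in the JSON body, or null if the body is unusable.
function firstPlaylistUrl(string $json): ?string
{
    $data = json_decode($json, true); // true => decode objects as associative arrays
    if (!is_array($data) || empty($data['success'])) {
        return null; // invalid JSON, or the API reported a failure
    }
    if (empty($data['urls']) || !is_array($data['urls'])) {
        return null; // no URLs in the response
    }
    $urls = array_values($data['urls']); // drop the quality keys, keep the order
    return is_string($urls[0]) ? $urls[0] : null;
}

// Sample body shaped like the output quoted in the question;
// "[token]" is the same placeholder used there, not a real token.
$sample = '{"success": true, "urls": {"1080p60": "https://video-weaver.fra05.hls.ttvnw.net/v1/playlist/[token].m3u8"}}';
$link = firstPlaylistUrl($sample);
```

In play.php you would call firstPlaylistUrl($resp) after the include and send the Location header only when the result is not null. Compared with the regex, this does not depend on what characters appear in the URL, but it does depend on the response staying valid JSON.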