
How to select specific text from a string generated by a PHP script?


I've been trying to scrape an HLS file from Twitch using two PHP scripts. The first one runs a cURL request against a Python script that returns the HLS URL, then outputs the returned string as plain text. The second (the one that isn't working) is supposed to extract the M3U8 URL and make it playable.

First script (extract.php)

<?php
header('Content-Type: text/plain; charset=utf-8');
$url = "https://pwn.sh/tools/streamapi.py?url=twitch.tv/cgtn_live_russian&quality=1080p60";

$curl = curl_init($url);
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);

//for debug only!
curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, false);
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);

$resp = curl_exec($curl);
curl_close($curl);
var_dump($resp);
$undesirable = array("}");
$cleanurl = str_replace($undesirable,"");
echo substr($cleanurl, 39, 898);

?>

This script (let's call it extract.php) works, and it returns (in plain text) the same information the Python script would return, which is this:

string(904) "{"success": true, "urls": {"1080p60": "https://video-weaver.fra05.hls.ttvnw.net/v1/playlist/[token].m3u8"}}"

Second script (play.php)

<?php
$opts = array(
'http'=>array(
'method'=>"GET",
'header'=>"Referer:https://myserver.com/" .
  "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:51.0) Gecko/20100101 Firefox/51.0"
));

$html = file_get_contents("extract.php");

preg_match_all(
    '/(http.*?\.m3u8[^&">]+)/',

    $html,
    $posts, // will contain the article data
    PREG_SET_ORDER // formats data into an array of posts
);

foreach ($posts as $post) {
    $link = $post[0];

header("Location: $link");
}
?>

This second script (let's call it play.php) should theoretically return the M3U8 file (without string(904) "{"success": true, "urls": {"1080p60":) and make it able to be played in a media player, such as VLC, but it doesn't return anything.

Can someone tell me what's wrong? Did I make a syntax or regex error when making these PHP files or is the second file not working because of the other elements of the string?

Thanks in advance.


Solution

  • I think you can rely on the regex to pull the URL out instead of cleaning the string manually. The other option is json_decode(), since the response is JSON.

    The idea is to define a variable in extract.php, in this case $resp. Output produced with echo or var_dump is not available as a value in the parent script, and file_get_contents("extract.php") with a local path reads the file's PHP source rather than running it.

    You can then reference that variable in play.php once extract.php has been included.

    <?php
    //extract.php
    $resp = '';
    $url = "https://pwn.sh/tools/streamapi.py?url=twitch.tv/cgtn_live_russian&quality=1080p60";
    
    $curl = curl_init($url);
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    
    //for debug only!
    curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, false);
    curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
    
    $resp = curl_exec($curl);
    curl_close($curl);
    
    
    //play.php (a separate file, starting with its own <?php tag)
    include('./extract.php');
    
    //$resp is set in extract.php
    preg_match_all(
        '/(http.*?\.m3u8)/',
        $resp,
        $posts, // will contain every match
        PREG_SET_ORDER // one array entry per match
    );
    
    $link = null;
    foreach ($posts as $post) {
        $link = $post[0]; // keep the last matched URL
    }
    
    // redirect only if a URL was found
    if ($link !== null) {
        header("Location: $link");
    }
    die();
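
For the json_decode() alternative mentioned above, here is a minimal sketch. It assumes the response always has the shape shown in the question (a "success" flag plus a "urls" object mapping a quality label to a playlist URL). The helper name firstStreamUrl is hypothetical and not part of the original scripts.

```php
<?php
// Hypothetical helper (not from the original answer): returns the first
// playlist URL found in the API's JSON body, or null if there is none.
function firstStreamUrl(string $json): ?string
{
    // Decode into an associative array; json_decode() returns null on malformed input.
    $data = json_decode($json, true);

    // Assumed shape: {"success": true, "urls": {"<quality>": "<url>"}}
    if (!is_array($data) || empty($data['success'])) {
        return null;
    }
    if (empty($data['urls']) || !is_array($data['urls'])) {
        return null;
    }

    // Take the first entry regardless of which quality label it is keyed by.
    $urls = $data['urls'];
    $first = reset($urls);

    return is_string($first) ? $first : null;
}

// Sample body shaped like the response shown in the question.
$sample = '{"success": true, "urls": {"1080p60": "https://video-weaver.fra05.hls.ttvnw.net/v1/playlist/[token].m3u8"}}';
$link = firstStreamUrl($sample);
```

In play.php this would replace the preg_match_all() call: pass $resp to firstStreamUrl() after the include, and only send the Location header when the result is not null. Note that $resp must be the raw cURL response, not the var_dump() output, which adds the string(904) "..." wrapper and is not valid JSON.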