Tags: python, selenium, beautifulsoup, blob, http-live-streaming

Is there a way to get HTTP Live Streaming (HLS) content with Firefox/Chrome?


I'm scraping video sources with Selenium and BeautifulSoup. Is there a way to extract the m3u8 file (HLS content), rather than a blob URL, with either Firefox or Chrome?

The following code scrapes the video source as a playlist string using the Selenium Safari WebDriver:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium import webdriver
from bs4 import BeautifulSoup
import urllib.request


def get_all_channels(base: str = "https://www.telewebion.com/channels"):
    """Return the relative hrefs of all channels listed on the channels page."""
    with urllib.request.urlopen(base) as channels_page:
        soup_channels = BeautifulSoup(channels_page, "lxml")

    # Collect the link of every channel in the ".no-featured" section
    # (use (a["href"], a.get_text(strip=True)) to keep the channel name too)
    return [a["href"] for a in soup_channels.select(".no-featured a")]


def get_video_src(url: str, base: str = "https://www.telewebion.com"):
    """Open a channel page in a real browser and return the <video> src."""
    channel_url = f"{base}{url}"

    wd = webdriver.Safari()  # Safari exposes the .m3u8 URL in <video src>
    # wd = webdriver.Chrome()   # Chrome/Firefox return a blob: URL instead
    # wd = webdriver.Firefox()  # Selenium 4: pass service=Service(...) for a custom geckodriver path

    try:
        wd.get(channel_url)
        # Wait up to 60 s (WebDriverWait's timeout is in seconds) for the player
        WebDriverWait(wd, 60).until(EC.visibility_of_element_located(
            (By.CLASS_NAME, "position-relative")))

        # Parse the JavaScript-rendered DOM
        soup = BeautifulSoup(wd.page_source, "lxml")
        video = soup.find("video", class_="rmp-object-fit-contain")
        return video["src"] if video else None
    finally:
        # Always close the browser, even if the wait times out
        wd.quit()


if __name__ == "__main__":
    for channel in get_all_channels():
        print(get_video_src(channel))

The results are the m3u8 playlist URLs (HLS content) I'm interested in, but this is not a scalable solution because it only works where Safari is installed. The Firefox and Chrome Selenium drivers return blob: URLs instead. My ultimate goal is to get the extended M3U (m3u8) playlist (or any other form of the video stream) rather than chunks of the stream, so that I can use it as the video source of a Kodi add-on.

P.S. The video sources are dynamic: JavaScript renders the player and loads its content, which is why I used Selenium to drive a real browser.
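A likely reason for the difference: Safari plays HLS natively, so the player can put the .m3u8 URL straight into `<video src>`, while in Chrome and Firefox a JavaScript player typically feeds the stream through Media Source Extensions and exposes only a `blob:` object URL. The playlist itself is still downloaded over the network, so one option in Chrome is to read the DevTools network events that chromedriver records when the `goog:loggingPrefs` capability enables the "performance" log. The sketch below assumes that Chrome-only capability (it does not apply to Firefox), and the function names and the fixed `wait_seconds` sleep are illustrative choices, not part of the original code.

```python
import json
import time


def extract_m3u8_urls(performance_log):
    """Return unique .m3u8 request URLs found in Chrome performance-log entries."""
    urls = []
    for entry in performance_log:
        # Each entry's "message" is a JSON string wrapping a DevTools event.
        message = json.loads(entry["message"])["message"]
        if message.get("method") != "Network.requestWillBeSent":
            continue
        url = message["params"]["request"]["url"]
        # Compare the path part only, so query strings (tokens) are allowed.
        if url.split("?", 1)[0].endswith(".m3u8") and url not in urls:
            urls.append(url)
    return urls


def sniff_m3u8(page_url, wait_seconds=15):
    """Open page_url in Chrome with performance logging; return .m3u8 URLs seen."""
    # Imported here so extract_m3u8_urls can be reused without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    # Ask chromedriver to record DevTools network events in the "performance" log.
    options.set_capability("goog:loggingPrefs", {"performance": "ALL"})
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(page_url)
        # Crude wait for the player to request its playlist; tune as needed.
        time.sleep(wait_seconds)
        return extract_m3u8_urls(driver.get_log("performance"))
    finally:
        driver.quit()
```

Usage would be something like `print(sniff_m3u8("https://www.telewebion.com" + channel))`. Note that `get_log("performance")` drains the log, so call it once per page load.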


Solution

  • I don't think you need Selenium to get the list of channels or the channel links.

    STEPS (you can use any programming language you prefer):

     1. Get all channels:
    Make a GET request to this URL to get all the channels:
    https://wa1.telewebion.com/v2/channels/getChannels?logo_version=4&thumb_size=240

    In the response, "data" is an array of channels; each channel has an attribute called "descriptor", whose value is the "channel_desc" for the next request.

     2. Get channel links:
    Make a GET request to the URL below to get all the links of a channel returned by the first request:
    https://wa1.telewebion.com/v2/channels/getChannelLinks?channel_desc=tv1&device=desktop&logo_version=4

    Here the channel_desc value "tv1" was taken from the first call.
    In the response, the links under "data" contain all the m3u8 URLs for the tv1 channel.

     3. Now you can use https://github.com/carlanton/m3u8-parser
     to parse the m3u8 file and get the playlist URLs or segment URLs from the master or media manifests.
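Steps 1 and 2 can be sketched in Python with only the standard library. This is a minimal sketch, assuming the endpoints still respond to plain GET requests without extra headers; the helper names are hypothetical. Because the exact shape of the "links" entries is not described above, `find_m3u8_urls` walks the whole JSON value and keeps any string containing ".m3u8" instead of assuming particular field names.

```python
import json
import urllib.parse
import urllib.request

API = "https://wa1.telewebion.com/v2/channels"


def get_json(endpoint, **params):
    """GET {API}/{endpoint}?params and decode the JSON body."""
    url = f"{API}/{endpoint}?{urllib.parse.urlencode(params)}"
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def channel_descriptors(channels_json):
    """Step 1: collect each channel's "descriptor" from the "data" array."""
    return [ch["descriptor"] for ch in channels_json["data"] if "descriptor" in ch]


def find_m3u8_urls(node):
    """Step 2: collect every string that looks like an .m3u8 URL in a JSON value."""
    found = []
    if isinstance(node, dict):
        for value in node.values():
            found.extend(find_m3u8_urls(value))
    elif isinstance(node, list):
        for value in node:
            found.extend(find_m3u8_urls(value))
    elif isinstance(node, str) and ".m3u8" in node:
        found.append(node)
    return found


def all_channel_streams():
    """Map each channel descriptor to its .m3u8 URLs (performs network requests)."""
    channels = get_json("getChannels", logo_version=4, thumb_size=240)
    streams = {}
    for desc in channel_descriptors(channels):
        links = get_json("getChannelLinks", channel_desc=desc,
                         device="desktop", logo_version=4)
        streams[desc] = find_m3u8_urls(links.get("data", links))
    return streams
```

Calling `all_channel_streams()` replaces both `get_all_channels()` and the per-channel browser session from the question, so no WebDriver is needed.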
    

    You can read the m3u8 (HLS) specification here: https://datatracker.ietf.org/doc/html/draft-pantos-http-live-streaming-08 (the protocol was later published as RFC 8216).
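If you would rather stay in Python than use the parser linked in step 3, the two cases mentioned there can be handled with a small standard-library sketch. In a master playlist each `#EXT-X-STREAM-INF` tag is followed by the URI of one variant; in a media playlist every non-blank line that does not start with `#` is a segment URI; relative URIs resolve against the playlist's own URL. This is deliberately minimal and hypothetical: it keeps the variant attributes as a raw string and ignores `#EXT-X-MEDIA` renditions, encryption keys, and byte ranges.

```python
from urllib.parse import urljoin


def parse_master_playlist(text, base_url):
    """Return (attributes, absolute_uri) pairs for each variant stream."""
    lines = [line.strip() for line in text.splitlines()]
    if not lines or lines[0] != "#EXTM3U":
        raise ValueError("not an extended M3U playlist")
    variants = []
    pending = None  # attributes of the last #EXT-X-STREAM-INF tag seen
    for line in lines[1:]:
        if line.startswith("#EXT-X-STREAM-INF:"):
            pending = line.split(":", 1)[1]
        elif line and not line.startswith("#") and pending is not None:
            # The first URI line after the tag belongs to that variant.
            variants.append((pending, urljoin(base_url, line)))
            pending = None
    return variants


def segment_uris(text, base_url):
    """Return absolute segment URIs from a media playlist."""
    uris = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, tags and comments; everything else is a URI.
        if line and not line.startswith("#"):
            uris.append(urljoin(base_url, line))
    return uris
```

For a Kodi add-on you would normally hand over the master playlist URL itself; the variant and segment lists are mainly useful for choosing a bitrate or checking that the stream is reachable.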