
youtube_dl video descriptions


I have a DataFrame (df) containing a set of YouTube video IDs:

import pandas as pd

data = {'Order':  ['1', '2', '3'],
        'VideoId': ['jxwHmAoKte4', 'LsXM502SpiU','1I3f27iQ4pM']
        }

df = pd.DataFrame (data, columns = ['Order','VideoId'])

print (df) 

I want to download the video descriptions and save them in an extra column of the same df.

I tried to use youtube_dl in Jupyter this way:

import youtube_dl

def all_descriptions(URL):
    videoID=df['VideoId']
    URL = 'https://www.youtube.com/watch?v=' + videoID
    ydl_opts = {
    'forcedescription':True,
    'skip_download': True,
    'youtube-skip-dash-manifest': True,
    'no_warnings': True,
    'ignoreerrors': True
    }
   
    try:
        youtube_dl.YoutubeDL(ydl_opts).download(URL)
        return webpage

    except:
        pass

df['descriptions']=all_descriptions(URL)

The descriptions are printed as text output, but the new column in df contains only None.

Evidently I am not passing the function's output to df correctly.

Can you suggest how to get it right?

Thank you in advance for help.
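
A likely reason for the None column, sketched here without youtube_dl so it runs anywhere: `webpage` is never defined inside the function, so `return webpage` raises a NameError, the bare `except:` silently swallows it, and the function falls off the end and returns None. The text seen in the output is printed by the `forcedescription` option and is not returned by `download()`. The function name `describe` is hypothetical and only mimics the control flow of the code above.

```python
def describe(video_id):
    try:
        # Stand-in for download() with 'forcedescription': the text is
        # printed to the output, not handed back to the caller.
        print('description of ' + video_id)
        return webpage  # NameError: 'webpage' was never defined
    except:  # bare except, as in the question: hides the NameError
        pass
    # No return statement is reached, so Python returns None implicitly.


result = describe('jxwHmAoKte4')
print(result)
```

Assigning that single None to a new column fills every row with None, which matches what the df shows.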


Solution

  • @perl I modified the df to include two video IDs that cause two different types of error:

    import pandas as pd
    
    data = {'Order':  ['1', '2', '3', '4', '5'],
            'VideoId': ['jxwHmAoKte4', 'LsXM502SpiU','1I3f27iQ4pM', 'MGQOX2rK5s', 'wNayw_E7lIA']
            }
    
    df = pd.DataFrame (data, columns = ['Order','VideoId'])
    
    print (df)
    

    Then I tested it the way you suggested, keeping my definition of ydl_opts:

    videoID=df['VideoId']
    URL = 'https://www.youtube.com/watch?v=' + videoID
    ydl_opts = {
        'forcedescription':True,
        'skip_download': True,
        'youtube-skip-dash-manifest': True,
        'no_warnings': True,
        'ignoreerrors': True
        }
        
    df['description'] = [
        youtube_dl.YoutubeDL(ydl_opts).extract_info(
            u, download=False)['description'] for u in URL]
    
    df
    

    On reaching the first failing video, I get this output:

    TypeError: 'NoneType' object is not subscriptable
    

    After that I replaced 'forcedescription' in my code with 'extract_info':

    def all_descriptions(URL):
        videoID=df['VideoId']
        URL = 'https://www.youtube.com/watch?v=' + videoID
        ydl_opts = {
        'forcedescription':True,
        'skip_download': True,
        'youtube-skip-dash-manifest': True,
        'no_warnings': True,
        'ignoreerrors': True
        }
       
        try:
            youtube_dl.YoutubeDL(ydl_opts).download(URL)
            return webpage
        
        except:
            pass
    

    It skips all errors, but as a result there is nothing in the 'description' column.

    Any suggestions?
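
One possible way around the TypeError, as a minimal sketch rather than a definitive fix: with `'ignoreerrors': True`, `extract_info` returns None for a video it cannot process, so the result has to be checked before `['description']` is looked up. The helper names `description_or_none`, `collect_descriptions` and `make_youtube_extractor` are hypothetical. youtube_dl is imported lazily and assumed to be installed, and the extractor is passed in as a parameter so the list-building logic can be tried without network access.

```python
def description_or_none(info):
    """Return the 'description' field, or None if extraction failed."""
    # extract_info() gives None for a failed video when 'ignoreerrors'
    # is set, so guard before looking anything up.
    if info is None:
        return None
    return info.get('description')


def collect_descriptions(video_ids, extract_info):
    """Return one description per video ID (None where extraction failed).

    extract_info is any callable that takes a URL and returns an info
    dict or None.
    """
    base = 'https://www.youtube.com/watch?v='
    return [description_or_none(extract_info(base + vid)) for vid in video_ids]


def make_youtube_extractor():
    """Build an extractor backed by youtube_dl (assumed installed)."""
    import youtube_dl  # imported lazily: only needed for real lookups

    ydl = youtube_dl.YoutubeDL({
        'skip_download': True,
        'no_warnings': True,
        'ignoreerrors': True,
    })
    return lambda url: ydl.extract_info(url, download=False)
```

With the df from above, the column could then be filled with `df['description'] = collect_descriptions(df['VideoId'], make_youtube_extractor())`; rows for the failing IDs would hold None instead of stopping the loop.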