Search code examples
google-sheetsweb-scrapingxpathyoutubegoogle-sheets-formula

How would I import YouTube Likes and Dislikes and a ratio from YouTube onto Google Sheets?


What would be the correct xpath to get YouTube likes and dislikes from a video?


Solution

  • TITLE:

    =IMPORTXML("https://www.youtube.com/watch?v=MkgR0SxmMKo","//*[@id='eow-title']")
    

    or:

    =REGEXEXTRACT(QUERY(ARRAY_CONSTRAIN(IMPORTDATA(A12), 500, 1),
     "where Col1 contains '/title'", 0), ">(.+)<")
    

    enter image description here

    VIEWS:

    =VALUE(REGEXREPLACE(TEXT(IMPORTXML("https://www.youtube.com/watch?v=MkgR0SxmMKo",
     "//*[contains(@class, 'watch-view-count')]"),0)," view(s)?",""))
    

    DURATION:

    =SUBSTITUTE(REGEXREPLACE(IMPORTXML("https://www.youtube.com/watch?v=MkgR0SxmMKo","//*[@itemprop='duration']/@content"),"PT|S",""),"M",":")
    

    LIKES:

    =IF(ISNA(IMPORTXML("https://www.youtube.com/watch?v=MkgR0SxmMKo","(//*[contains(@class,'like-button-renderer-like-button')])[1]"))=TRUE,0,
             IMPORTXML("https://www.youtube.com/watch?v=MkgR0SxmMKo","(//*[contains(@class,'like-button-renderer-like-button')])[1]"))
    

    DISLIKES:

    =IF(ISNA(IMPORTXML("https://www.youtube.com/watch?v=MkgR0SxmMKo","(//*[contains(@class,'like-button-renderer-dislike-button')])[1]"))=TRUE,0,
             IMPORTXML("https://www.youtube.com/watch?v=MkgR0SxmMKo","(//*[contains(@class,'like-button-renderer-dislike-button')])[1]"))
    

    UPLOADED:

    =REGEXREPLACE(IMPORTXML("https://www.youtube.com/watch?v=MkgR0SxmMKo",
     "//*[contains(@class, 'watch-time-text')]"),"((Uploaded)|(Published)|(Streamed live)) on ","")
    

    SUBSCRIPTIONS:

    =IFERROR(MID(QUERY(IMPORTXML("https://www.youtube.com/channel/"&A1,
     "//div[@class='primary-header-actions']"), "select Col1"), 31, 20), )
    

    6

    CHANNEL NAME:

    =INDEX(IMPORTHTML("https://www.youtube.com/channel/UC7_gcs09iThXybpVgjHZ_7g","list",1),1,1)
    

    CHANNEL ID:

    =ARRAYFORMULA(REGEXREPLACE(QUERY(SUBSTITUTE(ARRAY_CONSTRAIN(
     IMPORTDATA(https://www.youtube.com/watch?v=rckrnYw5sOA), 3000, 1), """", ""),
     "where Col1 contains '<meta itemprop=channelId content='"),
     "<meta itemprop=channelId content=|>", ""))
    



    UPDATE:

    channel name (07/07/2021):

    =REGEXEXTRACT(QUERY(FLATTEN(IMPORTDATA(A4)), 
     "where Col1 contains '\x22channelName\x22:\x22'", 0), ":\\x22(.+)\\x22$")
    

    video title (08/08/2021)

    =REGEXEXTRACT(QUERY(FLATTEN(IMPORTDATA(A1)), 
     "where Col1 starts with 'title:""'", 0), """(.*)""")
    

    enter image description here

    duration (21/04/2022)

    =TEXT(1*REGEXEXTRACT(QUERY(FLATTEN(IMPORTDATA(B1)), 
     "where Col1 contains 'approxDurationMs' limit 1", ), 
     "\d+")/3600000/24, "mm:ss")
    

    enter image description here

    channel views (07/06/2022)

    =REGEXEXTRACT(QUERY(FLATTEN(IMPORTXML(A1, "//*")), 
     "where Col1 contains '"&CHAR(10)&"Creators'", ), 
     ".x22text.x22:.x22(.+).x22,.x22bold.x22")
    

    enter image description here

    video views (01/09/2022)

    =REGEXEXTRACT(QUERY(FLATTEN(IMPORTDATA(A1)); 
     "where Col1 starts with 'viewCount'"; ); "\d+")*1
    

    enter image description here