I'm using Python's gdata module to pull the comments of a YouTube video, but I'm running into some issues. It works on relatively less popular videos (videos without many comments), but any video with significantly more comments returns a Bad Request error:
gdata.service.RequestError: {'status': 400, 'body': 'Invalid value for parameter: video-id', 'reason': 'Bad Request'}
This is my function:
def getComments(client, video_id):
    comment_feed = client.GetYouTubeVideoCommentFeed(video_id=video_id)
    while comment_feed is not None:
        for comment in comment_feed.entry:
            yield comment
        next_link = comment_feed.GetNextLink()
        if next_link is None:
            comment_feed = None
        else:
            comment_feed = client.GetYouTubeVideoCommentFeed(next_link.href)
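(For context, the generator would be consumed along these lines; this is just an illustrative sketch, with a placeholder video ID:)

    from gdata.youtube import service

    client = service.YouTubeService()
    # Lazily iterate over every comment the generator yields.
    for comment in getComments(client, 'VIDEO_ID'):
        print comment.content.text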
Does anybody know how to get around this?
EDIT:
So I decided to try another approach, and this is what I got:
from gdata.youtube import service

comment_feed_url = "http://gdata.youtube.com/feeds/api/videos/%s/comments?max-results=50"
USERNAME = ''
PASSWORD = ''

def WriteCommentFeed(video_id):
    '''Get the comment feed of a video given a video_id.'''
    client = service.YouTubeService()
    client.ClientLogin(USERNAME, PASSWORD)
    url = comment_feed_url % video_id
    comment_feed = client.GetYouTubeVideoCommentFeed(uri=url)
    allComments = []
    while comment_feed:
        for comment_entry in comment_feed.entry:
            allComments.append(comment_entry.content.text)
        print len(allComments)
        next_link = comment_feed.GetNextLink()
        if next_link is None:
            break
        print next_link.href
        comment_feed = client.Query(next_link.href)

if __name__ == "__main__":
    WriteCommentFeed("5DdzE4k31fM")
And it seems to break at 150 comments; at the query for the next page of 50 (comments 151-200), I get this error:
'reason': server_response.reason, 'body': result_body}
gdata.service.RequestError: {'status': 414, 'body': '<!DOCTYPE html>\n<html lang=en>\n <meta charset=utf-8>\n <meta name=viewport content="initial-scale=1, minimum-scale=1, width=device-width">\n <title>Error 414 (Request-URI Too Large)!!1</title>\n <style>...</style>\n <a href=//www.google.com/><span id=logo aria-label=Google></span></a>\n <p><b>414.</b> <ins>That\xe2\x80\x99s an error.</ins>\n <p>The requested URL <code>/feeds/api/videos/5DdzE4k31fM/comments</code>... is too large to process. <ins>That\xe2\x80\x99s all we know.</ins>\n', 'reason': 'Request-URI Too Large'}
Does anyone have any idea why?
When paginating through the comments of a video with a large number of comments, the API generates pagination links longer than 2048 characters, so you end up on an HTTP 414 error page.
To make it work:
Let's say you are making the following request:
http://gdata.youtube.com/feeds/api/videos/VIDEO_ID/comments
then one of your query string parameters should be:
orderby=published
For example:
"https://gdata.youtube.com/feeds/api/videos/" + _videoId + "/comments?orderby=published&max-results=50"
This way, the API retrieves the results based on time rather than relevance, which is presumably why the start-token it generates is much shorter.
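Applied to the pagination loop from the question, the fix might look something like this sketch (Python 2, same gdata client as above; get_all_comments is just an illustrative name):

    from gdata.youtube import service

    # orderby=published makes the API page through comments by time rather
    # than relevance, which keeps the generated next-page URLs short enough
    # to stay under the ~2048-character limit that triggers HTTP 414.
    COMMENT_FEED_URL = ("https://gdata.youtube.com/feeds/api/videos/"
                        "%s/comments?orderby=published&max-results=50")

    def get_all_comments(video_id):
        client = service.YouTubeService()
        comment_feed = client.GetYouTubeVideoCommentFeed(uri=COMMENT_FEED_URL % video_id)
        comments = []
        while comment_feed:
            for entry in comment_feed.entry:
                comments.append(entry.content.text)
            next_link = comment_feed.GetNextLink()
            if next_link is None:
                break
            comment_feed = client.Query(next_link.href)
        return comments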
Hope this helps.