Tags: java, youtube-api, google-api, closed-captions

Closed Captions in YouTube API v3


I need to read closed caption text from third-party, publicly available YouTube videos in my Java webapp, i.e. videos I have NOT uploaded myself.

Whilst v2 of the YouTube Data API restricted access to caption information to the person who uploaded the video, that always seemed a very odd restriction: access to everything except this one piece of data. I expected it to be removed in v3 of the API, but now the only reference to closed captions is a field that confirms whether CC is attached to the video. Even the owner can't seem to download captions now. (Are Google going to add this back, at least?)

// contentDetails.caption comes back as the string "true" or "false"
boolean hasCaptions = Boolean.parseBoolean(video.getContentDetails().getCaption());

Using YouTube Data API v3 (via the Google Java API client), I have been able to find, authenticate against and retrieve YouTube resources (videos, playlists, channels, etc.). I can do pretty much everything the API makes available; I just can't read the actual caption text.

I've also tried the unpublished timed-text link workaround, but it is inconsistent, doesn't work for newer content and has many encoding errors in the content it does cover.
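For anyone still trying the timed-text route, here is a minimal sketch of turning a response into plain caption lines. It assumes the response looks like `<transcript><text start="0.5" dur="2">...</text></transcript>` (an undocumented format that may change at any time) and that some text was escaped twice, so entities such as `&#39;` are still present after XML parsing. The class `TimedTextParser` and its methods are hypothetical names for illustration, and only the JDK's built-in XML and regex APIs are used.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper; the <transcript>/<text> layout is an assumption
final class TimedTextParser {

    // Numeric entities such as &#39; that are still present after XML parsing
    private static final Pattern NUMERIC_ENTITY = Pattern.compile("&#(\\d+);");

    /** Returns one string per <text> element of an (assumed) timed-text document. */
    static List<String> parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName("text");
            List<String> lines = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                String text = decodeLeftoverEntities(nodes.item(i).getTextContent());
                // Captions can wrap across lines; collapse each cue into one line
                lines.add(text.replace('\n', ' ').trim());
            }
            return lines;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Not a well-formed timed-text document", e);
        }
    }

    /** Decodes entities that were escaped twice, so one level survives XML parsing. */
    static String decodeLeftoverEntities(String s) {
        Matcher m = NUMERIC_ENTITY.matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int codePoint = Integer.parseInt(m.group(1));
            m.appendReplacement(out, Matcher.quoteReplacement(new String(Character.toChars(codePoint))));
        }
        m.appendTail(out);
        // Decode &amp; last so that "&amp;lt;" is not turned into "<"
        return out.toString()
                .replace("&quot;", "\"")
                .replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&amp;", "&");
    }
}
```

The XML parser already resolves one level of escaping, which is why the second pass only targets what is left over; running a full HTML unescape on text that was escaped once would wrongly decode legitimate `&lt;` sequences in captions.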

Does anyone know of a way to retrieve caption text from a YouTube video in Java (not a .js plugin)?

[ Worst case, does anyone know of a library that lets me interact with YouTube programmatically, like a browser, so I can click the transcript button on the page and pull the transcript from there? Prowser doesn't allow click interaction and JxBrowser costs $1,300+. ]

The code below works fine and gets me to all the video data, so it's only the last step I need help with. I've included it in case it's helpful to anyone who needs to get this far.

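import com.google.api.client.http.HttpRequestInitializer;
import com.google.api.client.http.javanet.NetHttpTransport;
import com.google.api.client.json.jackson2.JacksonFactory;
import com.google.api.services.youtube.YouTube;
import com.google.api.services.youtube.model.Video;
import com.google.api.services.youtube.model.VideoListResponse;
import java.util.List;

// Build a YouTube resource. HttpRequestInitializer is an interface, so it
// can't be instantiated directly; API-key requests need no per-request
// setup, so a no-op initializer is enough.
HttpRequestInitializer noOpInitializer = request -> { };
YouTube youtube = new YouTube.Builder(new NetHttpTransport(),
                            new JacksonFactory(),
                            noOpInitializer)
                    .setApplicationName("caption-retrieval")
                    .build();

// Create the video list request; it should only return one result
YouTube.Videos.List listVideosRequest = youtube.videos().list("id,snippet,contentDetails");
listVideosRequest.setKey(API_KEY);
listVideosRequest.setId(VIDEO_ID);

// Request is executed and video list response is returned
VideoListResponse listVideosResponse = listVideosRequest.execute();

List<Video> videos = listVideosResponse.getItems();

// Since a unique video id is given, at most one video is returned.
// An empty list means the video was removed or is not public.
if (videos.isEmpty()) {
    throw new IllegalStateException("Video not found: " + VIDEO_ID);
}
Video video = videos.get(0);

// Read the remaining meta information
String title = video.getSnippet().getTitle().trim();
String author = video.getSnippet().getChannelTitle();

// String captionText = ???   <- this is the step I need help with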

Any help is gratefully received.

Thanks,

Greg.


Solution

  • We are hoping to have Captions support on Data API v3 soon. You won't need to scrape the website.

    Update: This has been implemented now. The docs can be found here.
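For reference, Data API v3 now exposes a `captions` resource: `captions.list` returns the caption tracks for a video and `captions.download` fetches one track, optionally converted via the `tfmt` parameter (for example `srt`). As far as I know, download requires OAuth authorization and generally only succeeds for videos the authenticated account owns, so it may not cover the third-party case in the question. The sketch below only handles the final step, turning an SRT payload into the single `captionText` string the question asks for. `SrtToText` is a hypothetical name, and the code assumes well-formed SRT (cue number, timing line, text lines, blank line between cues).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: flattens an SRT document into plain caption text
final class SrtToText {

    /**
     * Joins the caption lines of an SRT document with spaces, dropping cue
     * numbers and "00:00:01,000 --> 00:00:02,500" timing lines.
     */
    static String toPlainText(String srt) {
        List<String> textLines = new ArrayList<>();
        // Normalize Windows line endings, then split into cues on blank lines
        for (String block : srt.replace("\r\n", "\n").trim().split("\n\\s*\n")) {
            boolean afterTiming = false;
            for (String line : block.split("\n")) {
                if (afterTiming) {
                    String text = line.trim();
                    if (!text.isEmpty()) {
                        textLines.add(text);
                    }
                } else if (line.contains("-->")) {
                    // Everything after the timing line belongs to the caption text
                    afterTiming = true;
                }
            }
        }
        return String.join(" ", textLines);
    }
}
```

Keying on the `-->` timing line rather than skipping numeric lines means a caption that happens to be just a number (say, "42") is kept instead of being mistaken for a cue index.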