youtube-api google-search-appliance

Index YouTube videos for Google Search Appliance


We're successfully using the YouTube API to create the metadata-and-url XML feed that the GSA requires, and we're pushing it to our Google Search Appliance according to the documentation.
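For context, a feed of this kind can be sketched roughly as below. This is a minimal illustration and not our production code: the `build_feed` function, the `videos` input structure, and the `youtube` datasource name are hypothetical, and the video metadata is assumed to have been fetched from the YouTube API beforehand.

```python
import xml.etree.ElementTree as ET

# DOCTYPE line used by GSA feed files.
DOCTYPE = '<!DOCTYPE gsafeed PUBLIC "-//Google//DTD GSA Feeds//EN" "">'


def build_feed(videos, datasource="youtube"):
    """Build a metadata-and-url feed as a string.

    `videos` is a hypothetical list of dicts such as
    {"id": "<video id>", "metadata": {"title": "..."}}.
    """
    root = ET.Element("gsafeed")

    header = ET.SubElement(root, "header")
    ET.SubElement(header, "datasource").text = datasource
    ET.SubElement(header, "feedtype").text = "metadata-and-url"

    group = ET.SubElement(root, "group")
    for video in videos:
        # One <record> per video; the URL is what the GSA will crawl.
        record = ET.SubElement(group, "record", {
            "url": "https://www.youtube.com/watch?v=" + video["id"],
            "mimetype": "text/html",
        })
        metadata = ET.SubElement(record, "metadata")
        for name, content in video.get("metadata", {}).items():
            ET.SubElement(metadata, "meta", {"name": name, "content": content})

    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + DOCTYPE + "\n" + body
```

Calling `build_feed([{"id": "abc123", "metadata": {"title": "Demo"}}])` returns the XML text with one record; `abc123` is a placeholder ID, not a real video.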

Our question: we know you need to put a start URL in the Content Sources > Web Crawl > Start and Block URLs page in the Admin Console. If we enter https://www.youtube.com as a start URL and a follow pattern of https://www.youtube.com/watch?v=* (which looks like the form all YouTube video URLs follow), will the GSA index only what comes from the feed, or will it go out to youtube.com and index a bunch of content that isn't part of our channel? I don't see anywhere to specify a channel for a video.

FYI, we are aware of the Fishbowl Solutions connector for YouTube, but we're trying to avoid spinning up another server with Tomcat just to index our YouTube videos.


Solution

  • You should not add the YouTube URL to your Start URLs, only to your Follow Patterns. That way, the crawler will not crawl YouTube from top to bottom, but the URLs you provide in the feed will still be crawled. However, if the GSA finds links on the crawled pages that match the Follow Patterns, it will crawl those as well. One option is to tighten the Follow Patterns. You can also develop a YouTube connector on Google's Adaptor Framework, which is not that hard for Java developers!
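One way to tighten the Follow Patterns, sketched below, is to generate a single `regexp:` pattern that lists only the video IDs present in your feed. The `follow_pattern` helper is hypothetical, and it rests on two assumptions: that your GSA version evaluates the expression the same way Python's `re` module does, and that the appliance accepts a pattern of this length. Verify both before relying on it with a large channel.

```python
import re

# Assumption: YouTube video IDs consist only of these characters, none of
# which is special in a regular expression outside a bracket expression,
# so the IDs need no escaping.
_VIDEO_ID = re.compile(r"^[A-Za-z0-9_-]+$")


def follow_pattern(video_ids):
    """Return one `regexp:` follow pattern matching only the given videos."""
    ids = sorted(set(video_ids))
    if not ids:
        raise ValueError("at least one video ID is required")
    for video_id in ids:
        if not _VIDEO_ID.match(video_id):
            raise ValueError("unexpected characters in video ID: %r" % video_id)
    # Anchor both ends so that longer URLs (extra query parameters,
    # other paths) do not match.
    return r"regexp:^https://www\.youtube\.com/watch\?v=(" + "|".join(ids) + ")$"
```

The resulting line would be pasted into the Follow Patterns box in place of the broad `https://www.youtube.com/watch?v=*` pattern, and regenerated whenever the feed changes.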