Tags: php, search, video, transcription, closed-captions

Storing Video and Indexing Content


Revisiting this in case someone has a suggestion:

I've been asked to either create or find a system that meets the following requirements:

1) Allow upload of video files - Current solution: PHP uploads each file to a directory above the web root, and I'll verify users before allowing them to stream.

2) Tag the files with meta info about the participant in the video (these are surveys) for later searching. - Current solution: a keyword text area with items separated by ";", which are then parsed into a "keywords" table in the DB for searching later.

3) Transcribe the audio for a full-text search later, so if the participant states, "I like to swim, bike, run", a later search for "run" would find this result ("triathlon" would probably have been entered in the meta fields). - Current solution: use a transcription service, then load the text into a full-text-indexed field.

The info will be made available to clients with subscriptions, so in the example above, companies that deal with swimming, biking, or running may be able to get this result, but ice cream vendors may not. - Current solution: assign categories to subscribers as well as videos during the sign-up and check-in phases, and make sure they match.
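The category check in that last point could be as simple as a set overlap. Here is a minimal sketch in Python, purely for illustration: the function names and sample data are made up, and in the real system the category lists would come from the subscriber and video tables rather than from in-memory values.

```python
def can_access(subscriber_categories, video_categories):
    """Return True if the subscriber shares at least one category with the video."""
    def normalize(names):
        # Compare case-insensitively and ignore stray whitespace or empty entries.
        return {name.strip().lower() for name in names if name.strip()}

    return bool(normalize(subscriber_categories) & normalize(video_categories))


def visible_videos(subscriber_categories, videos):
    """List the ids of videos whose categories overlap the subscriber's."""
    return [
        video_id
        for video_id, categories in videos.items()
        if can_access(subscriber_categories, categories)
    ]


# Hypothetical sample data: video id -> categories assigned at check-in.
videos = {
    "survey_001": ["swimming", "cycling", "running"],
    "survey_002": ["ice cream"],
}

print(visible_videos(["Running"], videos))  # ['survey_001']
```

A subscriber with no categories sees nothing, which is probably the safer default for paid content.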

Seems like there will be a lot of manual setup, so if anyone has better ideas for automating or controlling this, please let me know.

Thanks for suggestions.


Solution

  • Easy! Build a separate (simple) interface for tagging that presents the user with a video and an input field for updating its tags. Go through Amazon's Mechanical Turk to get people to tag the spoken words. Cheap, easy, quick. As far as I know, there's currently no server-side solution for doing what's essentially database-captured closed captioning. Even the television stations have people listening and typing away.

    By the way, your delimiter-separated keyword field might be better served by individual records in a table linked by IDs. Don't fear table joins; they can be faster and easier than searching delimited strings.
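To make the "individual records linked by IDs" idea concrete, here is a rough sketch using Python's built-in sqlite3 module. It is only an illustration under assumptions: the table names (videos, keywords, video_keywords), the helper functions, and the sample rows are invented here, and a PHP/MySQL setup would use the same shape of schema with its own syntax.

```python
import sqlite3


def split_keywords(raw):
    """Split a ';'-separated field into unique, lowercased keywords, keeping order."""
    seen = []
    for part in raw.split(";"):
        word = part.strip().lower()
        if word and word not in seen:
            seen.append(word)
    return seen


def tag_video(conn, video_id, raw_keywords):
    """Store each keyword once and link it to the video through the join table."""
    for word in split_keywords(raw_keywords):
        conn.execute("INSERT OR IGNORE INTO keywords (word) VALUES (?)", (word,))
        (keyword_id,) = conn.execute(
            "SELECT id FROM keywords WHERE word = ?", (word,)
        ).fetchone()
        conn.execute(
            "INSERT OR IGNORE INTO video_keywords (video_id, keyword_id) VALUES (?, ?)",
            (video_id, keyword_id),
        )


def find_videos(conn, word):
    """Return titles of videos tagged with the keyword, via two joins."""
    rows = conn.execute(
        """SELECT v.title FROM videos v
           JOIN video_keywords vk ON vk.video_id = v.id
           JOIN keywords k ON k.id = vk.keyword_id
           WHERE k.word = ?
           ORDER BY v.title""",
        (word.strip().lower(),),
    )
    return [title for (title,) in rows]


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE videos (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE keywords (id INTEGER PRIMARY KEY, word TEXT NOT NULL UNIQUE);
CREATE TABLE video_keywords (
    video_id INTEGER NOT NULL REFERENCES videos(id),
    keyword_id INTEGER NOT NULL REFERENCES keywords(id),
    PRIMARY KEY (video_id, keyword_id)
);
""")
conn.execute("INSERT INTO videos (id, title) VALUES (1, 'Survey A'), (2, 'Survey B')")
tag_video(conn, 1, "Triathlon; swimming ;Running")
tag_video(conn, 2, "ice cream; running")

print(find_videos(conn, "running"))  # ['Survey A', 'Survey B']
```

Because each keyword is stored once and matched exactly, the lookup can use an index on the keyword instead of scanning delimited strings.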