Tags: ftp, text-extraction, imdb

Sorting IMDb FTP data by title type


I was trying to build a graph that connects actors through the movies they have worked in, using the IMDb FTP data.

However, I only wanted to use movies (title type: Feature Film) as connections. I downloaded the data files from the IMDb FTP site, but I was not able to extract the title type from them. Has anybody tried to sort the IMDb FTP data by title type, and how did they tell the types apart?


Solution

  • The title itself tells you what kind of show you're dealing with.

    • If it ends with "(TV)", it's a TV movie (a single, standalone production made for TV).
    • If it ends with "(V)", it's a video movie (straight to video).
    • If it's surrounded by quotes and ends with "(mini)", it's a TV mini series. (NOTE: I think this category is no longer present in the plain text data files.)
    • If it's only surrounded by quotes, it's a TV series.
    • If it's surrounded by quotes and ends with another title enclosed in curly brackets, it's an episode of a TV series (mini or not). Inside the brackets is the episode title, if known, or else the #seasonNR.episodeNR or the air date.
    • Anything else is a movie.

    A special case is TV series episodes marked with {{SUSPENDED}}: the episode was never produced, but it was planned and may be made in the future.

    Note that these rules apply only to the plain text data files that you can download from the FTP servers. For some years now, the website has followed a different set of rules.

    As the main author of IMDbPY, I've done a lot of research on the subject (by the way, give it a look: it may be useful to you for importing this information into an SQL database).
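The classification rules above could be sketched in Python roughly as follows. This is a minimal illustration, not IMDbPY's code: the names `classify_title` and `is_feature_film` are hypothetical, and the sketch assumes you have already split the bare title string (for example `The Matrix (1999)`) off from any other fields on the line, such as character names or credit notes. The sample titles in the usage section are made-up strings written in the format described above.

```python
from typing import Tuple

SUSPENDED_MARK = "{{SUSPENDED}}"


def classify_title(title: str) -> Tuple[str, bool]:
    """Classify a title from the IMDb plain text data files.

    Returns (kind, suspended). Assumes `title` is the bare title only,
    without character names or other annotations from the same line.
    """
    title = title.strip()
    # Episodes that were planned but never produced carry {{SUSPENDED}};
    # remove the marker first so it doesn't look like an episode title.
    suspended = SUSPENDED_MARK in title
    if suspended:
        title = title.replace(SUSPENDED_MARK, "").strip()

    if title.startswith('"'):
        # Quoted titles are TV series, mini series, or their episodes.
        if title.endswith("}") and "{" in title:
            return "tv episode", suspended
        if title.endswith("(mini)"):
            return "tv mini series", suspended
        return "tv series", suspended
    if title.endswith("(TV)"):
        return "tv movie", suspended
    if title.endswith("(V)"):
        return "video movie", suspended
    # Catch-all bucket, as in the last rule of the list.
    return "movie", suspended


def is_feature_film(title: str) -> bool:
    """Hypothetical filter: keep only plain movies that were produced."""
    kind, suspended = classify_title(title)
    return kind == "movie" and not suspended


if __name__ == "__main__":
    samples = [
        "The Matrix (1999)",
        "Some Film (2004) (TV)",
        "Some Film (2004) (V)",
        '"Some Show" (2001) (mini)',
        '"Some Show" (1994)',
        '"Some Show" (1994) {Pilot (#1.1)}',
        '"Some Show" (2010) {Pilot (#1.1)} {{SUSPENDED}}',
    ]
    for t in samples:
        print(t, "->", classify_title(t))
```

To restrict the actor graph to movies, you would keep only the credits for which `is_feature_film` returns True. Keep in mind that "movie" is just the catch-all bucket of the last rule, so it is worth spot-checking what ends up in it.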