How do I speed up repeated calls to a Ruby program (GitHub's linguist) from Python?

I'm using GitHub's linguist to identify unknown source code files. Running it from the command line (after a gem install github-linguist) is very slow. I call it through Python's subprocess module on a stock Ubuntu 14 installation.

Running linguist __init__.py against an empty file takes about 2 seconds (other files give similar results). ~~I assume this is completely from the startup time of Ruby~~. As @MartinKonecny points out, the time seems to be spent in the linguist program itself.

Is there some way to speed this process up -- or a way to bundle the calls together?

Solution

One possibility is to adapt the linguist program (https://github.com/github/linguist/blob/master/bin/linguist) to accept multiple paths on the command line. It requires a bit of Ruby, but it lets you pass many files in one invocation instead of paying Linguist's startup overhead for each file.

A script this simple could suffice:

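require 'linguist/file_blob'

# Classify every path given on the command line in a single Ruby process,
# so Linguist's load time is paid only once.
ARGV.each do |path|
  blob = Linguist::FileBlob.new(path, Dir.pwd)
  # One tab-separated line per file: name, detected language, source lines of code.
  puts "#{blob.name}\t#{blob.language}\t#{blob.sloc}"
end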
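
On the Python side, the batched call might look roughly like the sketch below. It rests on a few assumptions that are not part of the original setup: the Ruby script is saved under the hypothetical name batch_linguist.rb, and it prints one tab-separated line per file (name, language, sloc). If your script prints something else, parse_output has to change to match.

```python
import subprocess

# Hypothetical file name for the Ruby script; adjust to wherever it is saved.
BATCH_SCRIPT = "batch_linguist.rb"


def build_command(paths, script=BATCH_SCRIPT):
    """Build one argv list covering every path, so Ruby starts only once."""
    return ["ruby", script] + list(paths)


def parse_output(text):
    """Parse lines of 'name<TAB>language<TAB>sloc' into a dict keyed by name.

    The output format is an assumption; it must match what the Ruby script prints.
    """
    results = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split from the right so a tab inside a file name does not break parsing.
        name, language, sloc = line.rsplit("\t", 2)
        # An empty language field means no language was detected.
        results[name] = {"language": language or None, "sloc": int(sloc)}
    return results


def identify(paths):
    """Run the batch script once for all paths and return the parsed results."""
    output = subprocess.check_output(build_command(paths), universal_newlines=True)
    return parse_output(output)
```

Calling identify(["__init__.py", "setup.py"]) then starts a single Ruby process for both files, so the roughly 2 second startup cost is paid once per batch rather than once per file.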