python, ruby, subprocess, github-linguist

How do I speed up repeated calls to a Ruby program (GitHub's linguist) from Python?


I'm using GitHub's linguist to identify unknown source code files. Running it from the command line after a gem install github-linguist is insanely slow. I'm using Python's subprocess module to make the command-line call on a stock Ubuntu 14 installation.

Running against an empty file: linguist __init__.py takes about 2 seconds (similar results for other files). I initially assumed this was entirely Ruby's startup time, but as @MartinKonecny points out, the overhead seems to come from the linguist program itself.
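To check where the two seconds go, one option is to time a bare Ruby start-up against a full linguist call. The following is a minimal sketch for Python 3; the helper name time_command is made up for illustration, and it assumes the commands you pass are on your PATH.

```python
import subprocess
import sys
import time


def time_command(cmd, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` runs of cmd.

    `cmd` is an argument list such as ["ruby", "-e", ""]. Output is discarded,
    and the exit status is ignored because only the elapsed time matters here.
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.call(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        best = min(best, time.perf_counter() - start)
    return best
```

Comparing time_command(["ruby", "-e", ""]) with time_command(["linguist", "__init__.py"]) should show whether the interpreter or linguist itself dominates.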

Is there some way to speed this process up -- or a way to bundle the calls together?


Solution

  • One possibility is to adapt the linguist program (https://github.com/github/linguist/blob/master/bin/linguist) to take multiple paths on the command line. It requires mucking with a bit of Ruby, sure, but it would let you pass multiple files without paying Linguist's startup overhead for each one.

    A script this simple could suffice:

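    require 'linguist/file_blob'

    # Classify every path given on the command line in a single Ruby process.
    ARGV.each do |path|
      blob = Linguist::FileBlob.new(path, Dir.pwd)
      # blob.language is nil when Linguist cannot identify the file.
      language = blob.language ? blob.language.name : 'unknown'
      puts "#{blob.name}\t#{language}\t#{blob.sloc}"
    end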
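On the Python side, the batching could look roughly like the sketch below. It assumes the Ruby script above is saved as batch_linguist.rb (a hypothetical filename) and that it prints one tab-separated line per file, with the path first and the language second; neither is part of linguist itself, so adjust the parsing to whatever your script actually prints.

```python
import subprocess


def parse_batch_output(text):
    """Parse lines of the assumed form 'path<TAB>language[<TAB>...]' into a dict.

    Files with an empty language field map to None.
    """
    results = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        path = fields[0]
        language = fields[1] if len(fields) > 1 and fields[1] else None
        results[path] = language
    return results


def classify_files(paths, script="batch_linguist.rb"):
    """Run the (hypothetical) batch script once for all paths.

    One Ruby process handles every file, so the start-up cost is paid once
    instead of once per file.
    """
    paths = list(paths)
    if not paths:
        return {}
    output = subprocess.check_output(["ruby", script] + paths)
    return parse_batch_output(output.decode("utf-8"))
```

Calling classify_files(["__init__.py", "setup.py"]) would then spawn Ruby a single time. For very long file lists you may need to split the paths into chunks to stay under the operating system's argument-length limit.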