
Auto-detect language of file


Is there a way to auto-detect the language a file is written in, or a way to say "this file is 20% C, 30% Python, 50% shell"? There must be some way, because GitHub's remote server seems to autodetect languages. Also, if a file is a hybrid of languages, what is the de facto way to set the file extension so that it represents the languages in the file? Maybe files all have to be homogeneous with regard to language; I am still learning. Additionally, is there a way to autodetect the number of bytes in a codebase on a remote site like GitHub? Basically, something like GitHub's language bar, except that the bar shows how many bytes the project is taking up.


Solution

  • The file command on Linux does a reasonable job of guessing the language of a file, but basically it just looks at the first bytes of the file (for example a "#!/bin/sh" line or a file format's magic number) and compares them to known patterns: "if the file starts with blah-blah-blah, it is probably thus-and-so".

    As for a file containing "20% C, 30% Python, etc.": what would you do with such a file if you had one? Neither the C compiler nor the Python interpreter would be happy with it.
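Here is a toy Python sketch of the same "look at the start of the file" idea, extended to the byte-count part of the question. GitHub's language bar comes from its Linguist library, which assigns one language to each whole file and then sums file sizes in bytes per language; the code below imitates only that general shape. The lookup tables and the function names guess_language, language_bytes and percentages are hypothetical, and the rules are far cruder than what file or Linguist actually do.

```python
import os
from collections import Counter

# Hypothetical, deliberately tiny lookup tables; real detectors use far larger ones.
SHEBANG_HINTS = {"python": "Python", "sh": "Shell", "bash": "Shell", "perl": "Perl"}
EXTENSION_HINTS = {".c": "C", ".h": "C", ".py": "Python", ".sh": "Shell"}


def guess_language(path):
    """Return one language name for the whole file, or None if unknown."""
    # Like `file`, look at the start of the file first.
    with open(path, "rb") as f:
        first_line = f.readline(200)
    if first_line.startswith(b"#!"):
        words = first_line[2:].decode("utf-8", "replace").split()
        if words:
            # "#!/usr/bin/env python3" names the interpreter in the second word.
            if os.path.basename(words[0]) == "env" and len(words) > 1:
                interpreter = words[1]
            else:
                interpreter = words[0]
            # "/usr/bin/python3.11" -> "python"
            name = os.path.basename(interpreter).rstrip("0123456789.")
            if name in SHEBANG_HINTS:
                return SHEBANG_HINTS[name]
    # Otherwise fall back to the file extension.
    extension = os.path.splitext(path)[1].lower()
    return EXTENSION_HINTS.get(extension)


def language_bytes(root):
    """Sum file sizes (in bytes) per guessed language under a local directory."""
    totals = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            language = guess_language(path)
            if language is not None:  # unknown files are simply skipped
                totals[language] += os.path.getsize(path)
    return totals


def percentages(totals):
    """Turn byte totals into percentages, the numbers a language bar would show."""
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {lang: 100.0 * n / grand_total for lang, n in totals.items()}
```

Calling percentages(language_bytes(path)) on a local clone gives numbers in the spirit of GitHub's bar; it does not query GitHub itself. Each file still counts toward exactly one language, which matches the point above: these tools classify whole files rather than splitting a single file into percentages.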