programming-languages visual-studio-code file-extension prismjs

Database of file extensions to file type/language mappings


VS Code uses a nice schema for mapping file extensions to languages:

https://code.visualstudio.com/docs/languages/identifiers

    "files.associations": {
        "*.myphp": "php"
    }

    "languages": [{
        "id": "java",
        "extensions": [ ".java", ".jav" ],
        "aliases": [ "Java", "java" ]
    }]
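
To make the two snippets concrete, here is a minimal sketch of how a document in this shape could be used to resolve a file name to a language id. The names `resolveLanguage` and `sampleConfig` are hypothetical, only the simple `*.ext` pattern form is handled, and giving `files.associations` precedence over `languages` is an assumption of the sketch rather than a statement about how VS Code itself resolves them.

```javascript
// Resolve a file name to a language id using a document shaped like the
// snippets above. `resolveLanguage` and `sampleConfig` are hypothetical
// names used only for this sketch.
const path = require('path');

function resolveLanguage(fileName, config) {
    const base = path.basename(fileName).toLowerCase();
    const ext = path.extname(base);

    // 1. Check explicit "files.associations" patterns first. Only the
    //    simple "*.ext" form is handled here, not full glob syntax.
    const assoc = config['files.associations'] || {};
    for (const [pattern, id] of Object.entries(assoc)) {
        if (pattern.startsWith('*.') && base.endsWith(pattern.slice(1).toLowerCase())) {
            return id;
        }
    }

    // 2. Otherwise fall back to the "extensions" list of each language.
    for (const lang of config.languages || []) {
        if ((lang.extensions || []).some(e => e.toLowerCase() === ext)) {
            return lang.id;
        }
    }
    return undefined; // no mapping known for this file
}

const sampleConfig = {
    'files.associations': { '*.myphp': 'php' },
    languages: [
        { id: 'java', extensions: ['.java', '.jav'], aliases: ['Java', 'java'] }
    ]
};

console.log(resolveLanguage('index.myphp', sampleConfig)); // php
console.log(resolveLanguage('Main.JAV', sampleConfig));    // java
console.log(resolveLanguage('notes.txt', sampleConfig));   // undefined
```

Matching is case-insensitive here purely as a convenience; drop the `toLowerCase()` calls if extensions should be matched exactly.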

Tools like prismjs have built-in support for hundreds of languages.

But I can't find a database of common extension-to-language mappings anywhere. Note that I don't care about MIME types. In my case I want to have a built-in set of mappings (actually, it'd ideally be driven off a web API).

I've searched the vscode source and found the code that handles the mappings (/src/vs/editor/common/services/languagesRegistry.ts), but it appears to load them from whichever extensions are installed.

In other words, I want to generate (or find!) a JSON document using the above schema with all 199 languages that prismjs supports.

Any suggestions?


Solution

  • lang-map, a nodejs module, makes it easy to go from file extension to language, or from language to supported extensions.

    https://github.com/jonschlinkert/lang-map

    I've successfully used this to generate a JSON doc using the vscode schema.

    It probably ain't pretty (I'm not a JavaScript pro), but it works:

    // Imports file-extension to language mappings from both
    // prismjs and lang-map and outputs a JSON document that
    // follows the vscode schema for extension mapping.
    // PrismJS language definitions take precedence in my solution.
    const fs = require('fs');
    var map = require('lang-map');
    var components = require('prismjs/components.js');
    
    // vscode files.associations is not an array. Use a dictionary instead.
    var assocDict = {};
    var languages = [];
    
    for (var key in components.languages) {
        if (components.languages.hasOwnProperty(key) && key != 'meta') {
            var language = components.languages[key];
            var langTemp = {
              id : key
            };
    
            // vscode doesn't support title, but I want to use it
            if (typeof language.title != 'undefined')
                langTemp.title = language.title;
    
            if (typeof language.alias != 'undefined'){
                if (Array.isArray(language.alias)){
                    langTemp.aliases = language.alias;
                }
                else{
                    // a single alias is a plain name, not an extension: no '.' prefix
                    langTemp.aliases = [language.alias];
                }
            }
            var extensions = [];
            map.extensions(key).forEach(ext =>{
                // Add it to the extensions for this language definition
                extensions.push('.' + ext);
    
                // also add it to the files.associations dictionary,
                // keyed by glob pattern (e.g. "*.js") with the language id as value
                assocDict['*.' + ext] = key;
            });
            langTemp.extensions = extensions;
            languages.push(langTemp);
        }
    }
    
    // Create a JSON doc conforming to the vscode spec: files.associations is
    // an object (dictionary), while languages is an array.
    var output = { 
        'files.associations' : assocDict,
        'languages' : languages
    };
    
    var file = "../winforms/WinPrint.Core/Properties/languages.json";
    fs.writeFile(file, JSON.stringify(output, null, '  '), function (err) {
        if (err) {
            return console.log(err);
        }
        console.log("Wrote " + Object.keys(assocDict).length + " file-type associations and " + languages.length + " language defs to " + file);
    });
    
    
    

    I also found this, which I may just use instead:

    https://github.com/blakeembrey/language-map
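
If that second package is used instead, the conversion to the vscode schema could look roughly like the sketch below. It assumes the map is an object keyed by language name whose entries carry an `extensions` array (with leading dots) and an optional `aliases` array; check that against the version you install. `toVscodeSchema` and `sampleMap` are hypothetical names, and the sample entries are hand-written for illustration rather than copied from the package.

```javascript
// Convert a language map (object keyed by language name) into a document
// following the vscode schema. `toVscodeSchema` and `sampleMap` are
// hypothetical; the input shape is an assumption stated in the text above.
function toVscodeSchema(languageMap) {
    const associations = {};
    const languages = [];

    for (const [name, info] of Object.entries(languageMap)) {
        const extensions = info.extensions || [];
        if (extensions.length === 0) continue; // nothing to map for this language

        // Assumption: use the lower-cased name as the id. These ids will
        // not necessarily match the ids prismjs uses.
        const id = name.toLowerCase();
        languages.push({
            id: id,
            title: name, // not part of the vscode schema, kept for display
            extensions: extensions,
            aliases: [name].concat(info.aliases || [])
        });

        for (const ext of extensions) {
            // The first language to claim an extension keeps it.
            const pattern = '*' + ext;
            if (!(pattern in associations)) {
                associations[pattern] = id;
            }
        }
    }
    return { 'files.associations': associations, languages: languages };
}

// Hand-written sample input in the assumed shape.
const sampleMap = {
    JavaScript: { type: 'programming', extensions: ['.js', '.mjs'], aliases: ['js', 'node'] },
    Markdown: { type: 'prose', extensions: ['.md'] }
};

console.log(JSON.stringify(toVscodeSchema(sampleMap), null, 2));
```

With the real package, the call would presumably be `toVscodeSchema(require('language-map'))`. Extensions shared by several languages are resolved by first-come order here, which is an arbitrary choice; a real tool would likely need an explicit priority list.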