
Get all the strings using a VS Code language server


Apropos of the question "How do I parse TS to symbols using a Language Server Protocol?": I have already done this, and I did indeed use the TS compiler, as suggested in one of the answers to that question. That, however, only works for TS and JS. There are many languages, and VS Code has language servers for most of them.

In my case I want the strings, all the strings and nothing but the strings, because I'm building a toolchain to support localisation. The problem is doing this in a language-agnostic way. It occurred to me that VS Code identifies strings for syntax colouring and validation, which is how I found the above-mentioned question. But that question approaches the problem ad hoc, asking "what is the thing at this position in the source code?", rather than processing the entire file and emitting a traversable AST, which is what the TS compiler does.

Does anyone have experience using a VS Code language server like this (to find all the strings, method names, or whatever)? If so, would you mind sharing a sketch of the best approach you found, and the key points to read up on regarding LSP and language servers in general?

I really can't see any other practical way to do this that avoids wrapping a compiler for every target language. (But if you can, I'm keen to read about it.)


Solution

  • Since version 3.16, the Language Server Protocol supports semantic tokens.

    That is, the language server tokenizes your file, and you (the client) can request all of its tokens, each tagged with a semantic token type. In your case, you keep the ones whose type is string.

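    The method the client should call on the server is

    textDocument/semanticTokens/full (or one of the other variants, textDocument/semanticTokens/full/delta or textDocument/semanticTokens/range)

    which returns the document's tokens as a flat, relative-encoded array of integers that you decode using the token legend the server advertised during initialization.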

    A full example can be found here
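A minimal sketch in Python of the two client-side steps described above, assuming you have already started the server, completed the initialize/initialized handshake and sent textDocument/didOpen for the file (none of which is shown). The helper names make_semantic_tokens_request, decode_semantic_tokens and extract_strings are hypothetical, and the legend and data in the usage section are made up for illustration. Per the spec, each token occupies five integers (deltaLine, deltaStartChar, length, tokenType, tokenModifiers), and tokenType is an index into the legend.tokenTypes list the server reports under semanticTokensProvider in its initialize result.

```python
import json
from typing import Dict, List


def make_semantic_tokens_request(request_id: int, uri: str) -> bytes:
    """Frame a textDocument/semanticTokens/full request for the LSP wire format."""
    body = json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "textDocument/semanticTokens/full",
        "params": {"textDocument": {"uri": uri}},
    }).encode("utf-8")
    # Each LSP message is a Content-Length header (in bytes), a blank line, then the JSON body.
    return b"Content-Length: " + str(len(body)).encode("ascii") + b"\r\n\r\n" + body


def decode_semantic_tokens(data: List[int], token_types: List[str]) -> List[Dict]:
    """Expand the flat, relative-encoded 'data' array into absolute token positions."""
    tokens = []
    line = 0
    start = 0
    for i in range(0, len(data), 5):
        delta_line, delta_start, length, type_index, modifiers = data[i:i + 5]
        if delta_line:
            line += delta_line
            start = delta_start  # new line: start is measured from column 0
        else:
            start += delta_start  # same line: start is relative to the previous token
        tokens.append({
            "line": line,
            "start": start,
            "length": length,
            "type": token_types[type_index],  # index into legend.tokenTypes
            "modifiers": modifiers,  # bit set over legend.tokenModifiers
        })
    return tokens


def extract_strings(source: str, data: List[int], token_types: List[str]) -> List[str]:
    """Return the source text of every token whose type is 'string'.

    Offsets are UTF-16 code units by default; slicing a Python str by them is
    only exact for lines without characters outside the Basic Multilingual Plane.
    """
    lines = source.split("\n")
    return [
        lines[t["line"]][t["start"]:t["start"] + t["length"]]
        for t in decode_semantic_tokens(data, token_types)
        if t["type"] == "string"
    ]


if __name__ == "__main__":
    # Hypothetical legend and response data, written by hand for illustration.
    legend = ["keyword", "variable", "string"]
    source = 'let a = "hello";\nlet b = "world";'
    data = [0, 0, 3, 0, 0,  0, 8, 7, 2, 0,  1, 0, 3, 0, 0,  0, 8, 7, 2, 0]
    print(make_semantic_tokens_request(1, "file:///tmp/example.ts"))
    print(extract_strings(source, data, legend))  # ['"hello"', '"world"']
```

One caveat worth checking for each target language: servers are not obliged to report every token, and some only emit semantic tokens for identifiers and leave strings to the editor's TextMate grammar. It is worth confirming that the servers you care about actually return string tokens before building on this.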