I'm working on a React app that includes writing code on the user side. So, syntax highlighting would be lovely. The problem is that the syntax in question is nonstandard - no package is going to have it built in.
I've been looking at this for a while, and I've had very little luck getting any standard package to let me import custom languages. So, if anyone can recommend a package that can do that easily I'd appreciate it. The other option is to build my own. This is reasonable since I only need a very small number of actual highlights/colors (and I'd have to write that part anyway).
In that case, I don't really understand how these different components work. I see that they tend to use pre tags containing span elements and the like, along with a textarea. Somehow the textarea is being used for the actual typing but is maybe being hidden? That's what it looks like, to create the illusion of formatted text in a text box. Maintaining a clean copy for interpreting, and a formatted copy for display.
But I have no idea what the actual structure that makes this work is. Why does the pre area hide the textbox? Is that even what's happening? Is it altering it somehow, or just matching it?
If anyone who understands these things can provide a high level concept of how a JS syntax highlighter works, and the HTML structure, that might be enough to tell me how to build one.
Thanks.
The high-level concept, briefly:
Most code editors don’t use a textarea to display the actual code, since it’s impossible to apply fine-grained syntax highlighting to the text inside a textarea.
The hidden textarea is "write-only": it serves only to capture input events. The display of the code text, aka the View, is reconstructed later.
The text string is stored behind the scenes in some internal DocumentModel, as they might call it. A lot of heavy work is done here, things like cursor tracking or word segmentation. But most importantly, tokenization.
So tokenization takes place. It’s the process of recognizing each element in your code and making sense of it. Think of lots of regex matching. After this process, the code text is broken up into fragments, each labeled with its token type, like variable, parameter, number, etc.
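To make the tokenization step concrete, here is a minimal sketch of a regex-based tokenizer. The token types, the keyword list, and the `#` comment syntax are all made up for illustration; a real custom language would substitute its own rules. It is deliberately simple rather than fast (it re-slices the string at every step).

```typescript
// Hypothetical token types for a made-up mini language; replace with your own.
type TokenType =
  | "whitespace" | "comment" | "string" | "number"
  | "keyword" | "identifier" | "other";

interface Token {
  type: TokenType;
  text: string;
}

// Rules are tried in order against the remaining input; the first match wins.
// Every pattern is anchored with ^ so it can only match at the current position.
const RULES: Array<[TokenType, RegExp]> = [
  ["whitespace", /^\s+/],
  ["comment", /^#[^\n]*/],                  // assumed: # starts a line comment
  ["string", /^"(?:[^"\\\n]|\\.)*"?/],      // closing quote optional while typing
  ["number", /^\d+(?:\.\d+)?/],
  ["keyword", /^(?:let|if|else|print)\b/],  // assumed keyword list
  ["identifier", /^[A-Za-z_]\w*/],
];

function tokenize(source: string): Token[] {
  const tokens: Token[] = [];
  let pos = 0;
  while (pos < source.length) {
    const rest = source.slice(pos);
    let matched = false;
    for (const [type, pattern] of RULES) {
      const m = pattern.exec(rest);
      if (m !== null && m[0].length > 0) {
        tokens.push({ type, text: m[0] });
        pos += m[0].length;
        matched = true;
        break;
      }
    }
    if (!matched) {
      // Unknown character: keep it as its own token so no input is dropped.
      tokens.push({ type: "other", text: source.charAt(pos) });
      pos += 1;
    }
  }
  return tokens;
}
```

One property worth keeping in any variant: joining the `text` of all tokens gives back the original string exactly, so the formatted view never drifts from what was typed.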
Then a View of the code is rendered based on the tokenized result. Each fragment is rendered as something like <span class="variable">code here</span>, and CSS colors it accordingly. Of course, the whole thing is most likely wrapped in a <pre> tag.
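The rendering step can be sketched like this: take the labeled fragments and turn them into the span-inside-pre markup described above. The `CodeFragment` shape, the `code-view` class name, and the function names are assumptions for illustration, not any particular library's API.

```typescript
// Hypothetical shape of one tokenized fragment: a type label plus its text.
interface CodeFragment {
  type: string;
  text: string;
}

// Escape characters that would otherwise be parsed as markup.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build the View: one <span> per fragment, with the token type as the CSS class.
// Whitespace is emitted bare; <pre> preserves it as typed.
function renderFragments(fragments: CodeFragment[]): string {
  const parts = fragments.map((f) =>
    f.type === "whitespace"
      ? escapeHtml(f.text)
      : `<span class="${escapeHtml(f.type)}">${escapeHtml(f.text)}</span>`
  );
  return `<pre class="code-view">${parts.join("")}</pre>`;
}
```

A stylesheet rule such as `.keyword { color: purple; }` would then do the coloring. In a React component, the same mapping could return `<span>` elements from JSX instead of building a string, which also makes the manual escaping unnecessary.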