
Why do many languages not support nested block comments?


In most of the languages I use, you simply cannot nest block comments, because the first occurrence of the "close" comment syntax closes the comment, even if it was meant to close only an "inner" comment.

For example, in HTML:

<!-- outer comment
<p>hello</p><!-- inner comment <p>world</p> -->
<p>this should BE commented</p>
-->

In this case, the outer comment ends at the first --> instead of the matching last one, so the last <p> is rendered when it shouldn't be.
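To make the behaviour concrete, here is a minimal Python sketch of "first close wins" scanning applied to the HTML above. The helper name strip_flat_comments is made up for illustration, and real HTML parsers have additional comment rules that this ignores; it only shows why the last <p> survives.

```python
def strip_flat_comments(text, open_tok="<!--", close_tok="-->"):
    """Remove comments the non-nesting way: each comment ends at the
    FIRST close token found after its opener (hypothetical helper)."""
    out = []
    pos = 0
    while True:
        start = text.find(open_tok, pos)
        if start == -1:
            out.append(text[pos:])  # no more comments: keep the rest
            break
        out.append(text[pos:start])  # keep text before the comment
        end = text.find(close_tok, start + len(open_tok))
        if end == -1:
            break  # unterminated comment swallows the rest
        pos = end + len(close_tok)
    return "".join(out)


html = (
    "<!-- outer comment\n"
    "<p>hello</p><!-- inner comment <p>world</p> -->\n"
    "<p>this should BE commented</p>\n"
    "-->\n"
)

remaining = strip_flat_comments(html)
print(remaining)
```

What is left over includes both the last <p> and the stray closing -->, because the scanner never noticed the inner opener.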

The same happens in languages that use /* */ for block comments, such as Java, PHP, CSS, JavaScript, etc.

But my question is: WHY is it that way? Why is it disallowed by design? I say "by design" because I really doubt it is due to parsing problems. I assume parsers are perfectly capable of keeping track of opening /*s and closing each comment at its corresponding */, but the language designers somehow decided it was not a good idea.

I already know that a workaround is to alter the inner closing delimiters so that they no longer close the comment, leaving only the last one intact, e.g. changing inner -->s and */s to - ->s and * /s. But that is obviously inconvenient, and hard to do when you only want to disable blocks of code for debugging purposes. (Another technique is to wrap everything in if(false){} blocks, but that is not the point here.)

So, what I'd like to know is WHY nested comments are generally not allowed in several modern languages. There must be a good reason other than "others don't do it, so we won't either", right?

And as a plus, are there any other (not so obscure) languages that DO allow nested block comments?


Solution

  • The reason is historical and has to do with the architecture of compilers.

    For the sake of efficiency, most compilers traditionally parse the source code in two stages: the lexical analysis and the actual parsing of a token stream (that was produced by said lexical analysis). The lexical analysis is the part that recognises individual tokens, such as keywords, strings, number literals – and comments.

    Again for reasons of efficiency, lexical analysis is traditionally implemented via a finite-state machine. These finite-state machines happen to recognise (= handle) regular languages, which fits perfectly for the above-mentioned tokens. However, such a machine is not able to recognise arbitrarily nested constructs: that would require a more powerful machine (one augmented by a stack, or at least a counter).

    Not allowing nested comments was thus simply a decision that traded off convenience for performance, and subsequent languages have by and large adopted the convention.

    And as a plus, are there any other (not so obscure) languages that DO allow nested block comments?

    There are some. The comments already mentioned Haskell and Pascal. Others include D, whose /+ +/ comments nest, and F#, whose (* *) comments nest.
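The finite-state versus stack distinction described in this answer can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function names are invented, string literals are ignored, and a real lexer would report unterminated comments. The non-nesting version is a plain regular expression (a regular language), while the nesting version needs a depth counter, which is exactly the extra state a pure finite-state machine lacks.

```python
import re

# Non-nested block comments form a regular language, so a regex suffices:
# "/*", then as little as possible, then the first "*/".
FLAT_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)


def strip_flat(src):
    """Strip /* */ comments the C/Java way (first close wins)."""
    return FLAT_COMMENT.sub("", src)


def strip_nested(src):
    """Strip /* */ comments while honouring nesting (hypothetical helper).

    The depth counter stands in for the stack: it can grow without
    bound, which is more than a finite-state machine can track.
    """
    out = []
    depth = 0
    i = 0
    while i < len(src):
        pair = src[i:i + 2]
        if pair == "/*":
            depth += 1  # entering one more comment level
            i += 2
        elif pair == "*/" and depth > 0:
            depth -= 1  # leaving one comment level
            i += 2
        else:
            if depth == 0:
                out.append(src[i])  # only keep text outside all comments
            i += 1
    return "".join(out)


src = "a /* outer /* inner */ still outer */ b"
flat_result = strip_flat(src)
nested_result = strip_nested(src)
print(repr(flat_result))
print(repr(nested_result))
```

With the flat scanner the text after the inner close leaks back into the program, whereas the counting scanner removes the whole outer comment. The counter costs very little in practice, which is why the languages listed above can afford it.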