Why do many languages not support nested block comments?

In most of the languages I use, you simply cannot nest block comments, because the first occurrence of the "close" comment syntax closes the comment, even if it was only meant to close an "inner" comment.

For example, in HTML:

<!-- outer comment
<p>hello</p><!-- inner comment <p>world</p> -->
<p>this should BE commented</p>
-->

In this case, the outer comment ends at the first --> instead of the matching last one, so the last <p> is rendered when it shouldn't be.

The same happens in languages that use /* */ for block comments, such as Java, PHP, CSS, JavaScript, etc.

But my question is: WHY is it that way? Why is it disallowed by design? I say "by design" because I really doubt it is due to parsing problems. I assume parsers are perfectly capable of keeping track of opening /*s and closing each comment at its corresponding */, but the language designers somehow decided it is not a good idea.

I already know that a workaround is to alter the inner closing markers so that they no longer close the comment, leaving only the last one intact, e.g. changing inner -->s and */s to - ->s and * /s. But that is obviously inconvenient, and hard to do when you only want to disable blocks of code for debugging purposes. (Another technique is to wrap everything in if(false){} blocks, but that is not the point here.)

So, what I'd like to know is WHY nested comments are generally not allowed in so many modern languages. There must be a good reason other than "others don't do it, so we won't either", right?

And as a plus, are there any other (not so obscure) languages that DO allow nested block comments?

Solution

The reason is historical and has to do with the architecture of compilers.

For the sake of efficiency, most compilers traditionally parse the source code in two stages: the lexical analysis and the actual parsing of a token stream (that was produced by said lexical analysis). The lexical analysis is the part that recognises individual tokens, such as keywords, strings, number literals – and comments.

Again for reasons of efficiency, lexical analysis is traditionally implemented via a finite-state machine. Finite-state machines recognise (= handle) exactly the regular languages, which fits the above-mentioned tokens perfectly. However, such a machine cannot recognise arbitrarily nested constructs, because it has no way to remember how many openers are still unmatched; that requires a more powerful machine (one augmented by a stack).

Not allowing nested comments was thus simply a decision that traded off convenience for performance, and subsequent languages have by and large adopted the convention.
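To make the difference concrete, here is a minimal sketch in Python of both behaviours for /* */ comments. The function name strip_block_comments and its nested flag are made up for illustration and are not taken from any real lexer. The sketch also ignores string literals, which a real lexer would have to handle. Because there is only one kind of delimiter pair, a plain depth counter is enough here in place of a full stack; but that counter is unbounded, which is exactly what a finite-state machine cannot provide.

```python
def strip_block_comments(src: str, nested: bool = False) -> str:
    """Remove /* ... */ comments from src (illustrative sketch only).

    nested=False: the first */ closes the comment, as in C, Java or CSS.
    nested=True:  every /* must be matched by its own */.
    """
    out = []
    depth = 0  # number of comments currently open; 0 means "in code"
    i = 0
    while i < len(src):
        pair = src[i:i + 2]
        if pair == "/*" and (depth == 0 or nested):
            # Open a comment. Without nesting, an opener found inside a
            # comment is just ordinary comment text and falls through.
            depth += 1
            i += 2
        elif pair == "*/" and depth > 0:
            # Close the innermost open comment.
            depth -= 1
            i += 2
        else:
            # Ordinary character: keep it only when outside any comment.
            if depth == 0:
                out.append(src[i])
            i += 1
    if depth > 0:
        raise ValueError("unterminated block comment")
    return "".join(out)


source = "a /* outer /* inner */ still outer */ b"
print(repr(strip_block_comments(source)))               # non-nested rule
print(repr(strip_block_comments(source, nested=True)))  # nested rule
```

With the non-nested rule, the text after the first */ leaks back into the output (including a stray */), which is the same effect as the HTML example in the question. With nested=True the whole outer comment is removed.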

And as a plus, are there any other (not so obscure) languages that DO allow nested block comments?

There are some. The comments already mentioned Haskell and Pascal. Others include D and F#.