Search code examples
programming-languagesnlp

Is similarity to "natural language" a convincing selling point for a programming language?


Look, for example at AppleScript (and there are plenty of others, some admittedly quite good) which advertise their use of the natural language metaphor. Code is apparently more readable because it can be/is intended to be constructed in English-like sentences, says they. I'm sure there are people who would like nothing better than to program using only English sentences. However, I have doubts about the viability of a language that takes that paradigm too far (excepting niche cases).

So, after a certain reasonable point, is natural-languaginess a benefit or a misfeature? What if the concept is carried to an extreme -- will code necessarily be more readable? Or might it be unnecessarily long, difficult to work with, and just as capable of producing hilarity on the scale of obfuscated Perl, obfuscated C, and eye-twisting Bash script logorrhea?

I am aware of some specialty cases like "Inform" that are almost pure English, but these have a niche that they're not likely to venture out from. I hear and read about how great it would be for code to read more like English sentences, but are there discussions of the possible disadvantages? If everyday language is so clear, simple, clean, lovely, concise, understandable, why did we invent mathematical notation in the first place?

Is it really easier to describe complex instructions accurately and precisely to a machine in natural language, or isn't something closer to mathematical markup a much better choice? Where should that line be drawn? And finally, are you attracted to languages that are touted as resembling English sentences? Should this whole question have just been a one liner:

naturalLanguage > computerishLanguage ? booAndHiss : cheerLoudly;

Solution

  • My answer to this would be that the ideal programming language lies somewhere between a natural language and a very formal language.

    On the one extreme, there's the formal, minimal, mathematical languages. Take for example Brainfuck:

    ,>++++++[<-------->-],[<+>-]<.    // according to Wikipedia, this means addition 
    

    Or, what's somewhat preferable to the above mess, any type of lambda calculus.

    λfxy.x
    λfxy.y
    

    This is one possible way of expressing the Boolean truth values in lambda calculus. Doesn't look very neat, especially when you build logical operators (such as AND being e.g. λpq.pqp) around them.

    I claim that most people could not write production code in such a minimalistic, hard-to-grasp language.


    The problem on the other end of the spectrum, namely natural languages as they are spoken by humans, is that languages with too much complexity and flexibility allows the programmer to express vague and indefinite things that can mean nothing to today's computers. Let's take this sample program:

    MAYBE IT WILL RAIN CATS AND DOGS LATER ON. WOULD YOU LIKE THIS, DEAR COMPUTER?
    IF SO, PRINT "HELLO" ON THE SCREEN.
    IF YOU HATE RAIN MORE THAN GEORGE DOES, PRINT SOME VAGUE GARBAGE INSTEAD.
    (IN THE LATTER CASE, IT IS UP TO YOU WHERE YOU OUTPUT THAT GARBAGE.)
    

    Now this is an obvious case of vagueness. But sometimes you would get things wrong with more reasonable natural language programs, such as:

    READ AN INTEGER NUMBER FROM THE TERMINAL.
    READ ANOTHER INTEGER NUMBER FROM THE TERMINAL.
    IF IT IS LARGER THAN ZERO, PRINT AN ERROR.
    

    Which number is IT referring to? And what kind of error should be printed (you forgot to specify it.) — You would have to be really careful to be extremely explicit about what you mean.

    It's already too easy to mis-understand other humans. How do you expect a computer to do better?

    Thus, a computer language's syntax and grammar has to be strict enough so that it doesn't allow ambiguity. A statement must evaluate in a deterministic way. (There are maybe corner cases; I'm talking about the general case here.)


    I personally prefer languages with a very limited set of keywords. You can quickly learn such a language, and you don't have to choose between 10,000 ways of achieving one goal simply because there's 10,000 keywords for doing the same thing (as in: GO/WALK/RUN/TROD/SLEEPWALK/etc. TO THE FRIDGE AND GET ME A BEER!). It means if you need to think about 10,000 different ways of doing something, it won't be due to the language, but due to the fact that there are 9,999 stupid ways to do it, and 1 elegant solution that just shines more than all the others.

    Note that I wrote all natural language examples in upper-case. That's because I sort of had good old GW-BASIC and COBOL in mind while I wrote this. There've been some examples of programming languages that lean on natural language, and I think history has shown that they are, in general, somewhat less widespread than e.g. terse C-style languages.