Stata programming language without syntax?

I recently got into Stata coming from a procedural/OO/functional background, and am having trouble understanding the basic elements of the language.

For example, I discovered that there is a syntax command which "allows programs to interpret the arguments the user types according to a grammar, such as standard Stata syntax". I infer this is the reason why some commands require a list of variables given as arguments to be separated by whitespace while others require a comma-separated list. But the idea of a program defining its own syntax, instead of the (parameter) syntax being enforced by the language, seems plain weird.

Another quite interesting construct is the syntax for macro definition and expansion (`macro') and the apparent absence of local variables as known in other languages.

Is there something like a "Stata for Java developers" document explaining the basic concepts of the language to people with my background?

PS: Apologies if this question seems unclear. Unfortunately, I can't formulate more concrete/clear questions at this point :(

Solution

I'm not exactly sure what you are looking for, but here are a few related points. Writing Stata code is a lot like writing a Unix shell script or a Windows batch file: each line executes a command, and the first word is the command name. By convention, most commands have the following structure:

command [varlist] [=exp] [if expression] [in range] [weight] [using filename] [, options]

Brackets [ ] mean that an element is optional (or unavailable, depending on the command). Some commands can also be prefixed, for example with by:, xi:, or svy:. The syntax of commands written by StataCorp and by experienced users is fairly consistent. But because ordinary Stata users also write commands, you occasionally see things that are wacky.
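A few commands run against the auto dataset that ships with Stata illustrate that general structure. The variable names come from that dataset; the particular commands and options are just one illustrative choice among many.

```stata
sysuse auto, clear

* command   varlist     if-qualifier      , options
summarize   price mpg   if foreign == 1   , detail

* [weight]: analytic weights taken from the variable named weight
summarize price [aweight = weight]

* =exp and [in range]: generate builds a new variable from an expression,
* here only for observations 1 to 10 (the other 64 are set to missing)
generate price_k = price / 1000 in 1/10

* A prefix: bysort sorts by foreign, then runs summarize once per group
bysort foreign: summarize mpg if rep78 >= 3
```

Note that the comma is what separates the "main" part of the command from its options; it is not a separator between variables.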

When Stata users write commands, they save them in .ado files (not .do files) and define them with the program command. (See help program and the "Ado files" section of the manual.) Writing a command is akin to writing a function in other languages (e.g., MATLAB).
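Below is a minimal sketch of such a user-written command. The name hello and what it does are invented for illustration. Saved on its own as hello.ado in a directory on Stata's ado-path (see help adopath), the program block would be loaded automatically the first time hello is typed; defining it inside a do-file, as here, works the same way while experimenting. It uses args, the simplest way to pick up positional arguments.

```stata
* Hypothetical command "hello". In practice the program block would be saved
* on its own as hello.ado; the file name must match the program name.
capture program drop hello            // allow redefinition when re-running
program define hello, rclass
    version 14                        // interpret the code as Stata 14 would
    args name                         // first argument typed after "hello"
    display "Hello, `name'!"
    return local greeting "Hello, `name'!"   // hand a result back in r()
end

hello world
display "`r(greeting)'"
```

Declaring the program rclass lets it pass results back to the caller through r(), much like a return value.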

The syntax command is used to help you write your own command. When you execute a command, everything following the command's name (command above) is passed to the program in the local macro `0'. The syntax command parses this local macro, so that you can reference `varlist' or `if' and so on. In theory, you could parse `0' yourself, but the syntax command makes it much easier for you and your users (as long as you are following the conventional syntax). I put an example at the bottom.
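To see what syntax saves you, here is a sketch of a hypothetical command, countabove, that parses `0' by hand with gettoken and does all of its own checking. The command name and its two-argument grammar (a variable, then a cutoff) are made up for this example.

```stata
* Hypothetical command "countabove": counts observations where a numeric
* variable exceeds a cutoff, parsing its arguments by hand.
capture program drop countabove
program define countabove, rclass
    version 14
    gettoken var 0 : 0            // first token of `0' -> `var', rest stays in `0'
    gettoken cutoff 0 : 0         // next token -> `cutoff'
    // Checks that syntax would otherwise perform automatically
    if `"`0'"' != "" {
        display as error "too many arguments"
        exit 198
    }
    confirm numeric variable `var'
    confirm number `cutoff'
    quietly count if `var' > `cutoff' & !missing(`var')
    return scalar N = r(N)
    return scalar cutoff = `cutoff'
    return local var "`var'"
end

sysuse auto, clear
countabove price 5000
display r(N)
```

Even this tiny grammar needs several lines of checking; syntax would additionally give users variable abbreviations, if/in qualifiers, and standard error messages.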

I don't know exactly what you mean by "apparent absence of local variables as known in other languages." A macro stores a piece of text, such as a single string, a number, or a list of variable names. Here's a comment I wrote about Stata's local/global macros. They are indeed a distinctive feature of Stata's programming language. As their names imply, "local" macros are only available within a specific program (command) or .do file, while "global" macros are available throughout a Stata session.
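The following do-file sketch shows a local macro used as a variable list, the difference between storing text and storing an evaluated result, a global macro, and the scoping rule just described. The program name scopedemo and all macro names are illustrative only.

```stata
sysuse auto, clear

* A local macro holding a list of variable names
local vars price mpg weight
summarize `vars'

* Without "=", the text is stored as typed (here the string 2 + 2)
local text 2 + 2
* With "=", the expression is evaluated first (here the result 4)
local total = 2 + 2
display "`text' equals `total'"

* A global macro lives for the whole session and is expanded with $
global outcome price

capture program drop scopedemo
program define scopedemo
    // this local belongs to scopedemo only; the do-file's own vars is untouched
    local vars "a different local"
    display "inside:  `vars'"
    display "global:  $outcome"
end

scopedemo
display "outside: `vars'"
```

In practice, locals play the role that local variables play in other languages; globals are usually best kept to a minimum.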

I found that, once I got used to macros in Stata, I started to miss them in other languages. They are pretty handy. In addition to (local/global) macros and the main data set, you can also store "things" in memory with the scalar and matrix commands (and one or two other obscure things).
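A short sketch of the scalar and matrix commands, again using the bundled auto data. The names meanprice and coefs are arbitrary.

```stata
sysuse auto, clear

* Most commands leave their results in r() or e(); copy them before the
* next command overwrites them
quietly summarize price
scalar meanprice = r(mean)       // a scalar holds one number in double precision
display meanprice

quietly regress price mpg weight
matrix coefs = e(b)              // 1 x 3 row vector: mpg, weight, _cons
matrix list coefs
```

Unlike a macro, a scalar keeps the number itself rather than its text representation, so no precision is lost when it is reused in later calculations.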

I hope that helps. Here's a list of resources that might help.

Example:

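capture program drop myprogram    // allow the program to be redefined
program define myprogram
    // required varlist, optional if-qualifier, and two optional options:
    // hello() takes a string, yes is an on/off switch
    syntax varlist [if] [, hello(string) yes]
    // show the raw arguments and the locals that syntax created
    macro list _0 _varlist _if _hello _yes
    summarize `varlist' `if'
    display "Here's the string in my hello option: `hello'"
    if !missing("`yes'") display "Yes is on"
    else                 display "Yes is off"
end

sysuse auto, clear
myprogram rep78 headroom if price > 5000, hello("world") yes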