Rules of thumb for function arguments ordering in Clojure

What (if any) are the rules for deciding the order of function parameters in Clojure core?

Functions like map and filter expect a data structure as the last argument.
Functions like assoc and select-keys expect a data structure as the first argument.
Functions like map and filter expect a function as the first argument.
Functions like update-in expect a function as the last argument.

This can be painful when using the threading macros (I know I can use as->), so what is the reasoning behind these decisions? Knowing it would also help my own functions conform as closely as possible to those written by the great man.
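To make the friction concrete, here is a small sketch of a pipeline that mixes both argument conventions. The names total-as and total-nested are made up for illustration; the functions called are all from clojure.core.

```clojure
;; assoc wants the map FIRST, map/reduce want the sequence LAST,
;; so neither -> nor ->> alone can thread this pipeline.

;; Option 1: as-> binds the threaded value to a name (v) that can be
;; placed in any argument position.
(def total-as
  (as-> {:a 1 :b 2} v
    (assoc v :c 3)    ; collection function: v goes first
    (vals v)
    (map inc v)       ; sequence function: v goes last
    (reduce + v)))

;; Option 2: start with -> and switch to ->> for the sequence steps.
;; -> inserts the threaded value as the first argument of the ->> form,
;; which is exactly where ->> expects its starting value.
(def total-nested
  (-> {:a 1 :b 2}
      (assoc :c 3)
      vals
      (->> (map inc)
           (reduce +))))
```

Both forms compute the same sum; which one reads better is largely a matter of taste.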

Solution

Rather than have this be a link-only answer, I'll paste a quote of Rich Hickey's response to the Usenet question "Argument order rules of thumb":

One way to think about sequences is that they are read from the left, and fed from the right:

<- [1 2 3 4]

Most of the sequence functions consume and produce sequences. So one way to visualize that is as a chain:

map<- filter<-[1 2 3 4]

and one way to think about many of the seq functions is that they are parameterized in some way:

(map f)<-(filter pred)<-[1 2 3 4]

So, sequence functions take their source(s) last, and any other parameters before them, and partial allows for direct parameterization as above. There is a tradition of this in functional languages and Lisps.
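(An illustration of that chain, added here and not part of the quoted post. The names inc-all, keep-even, chained and threaded are hypothetical.)

```clojure
;; Because the source comes last, partial can fix the leading parameter
;; and leave a function that is still waiting for its sequence.
(def inc-all   (partial map inc))      ; (map f)
(def keep-even (partial filter even?)) ; (filter pred)

;; comp applies right to left, mirroring  (map f)<-(filter pred)<-[1 2 3 4]
(def chained ((comp inc-all keep-even) [1 2 3 4]))

;; The same chain written with ->>, which feeds each result in as the
;; last argument of the next form.
(def threaded
  (->> [1 2 3 4]
       (filter even?)
       (map inc)))
```

Both chained and threaded evaluate to the sequence (3 5).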

Note that this is not the same as taking the primary operand last. Some sequence functions have more than one source (concat, interleave). When sequence functions are variadic, it is usually in their sources.
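(Again an added illustration rather than quoted text; the def names are made up.) A few core sequence functions that accept several sources in trailing position:

```clojure
;; concat and interleave take any number of source sequences.
(def joined (concat [1 2] [3 4] [5]))
(def woven  (interleave [1 2 3] [:a :b :c]))

;; map is variadic in its sources too: f receives one element from each.
(def sums (map + [1 2 3] [10 20 30]))
```

Here joined is (1 2 3 4 5), woven is (1 :a 2 :b 3 :c), and sums is (11 22 33). In each case the non-source parameter, if any, comes before the sources.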

I don't think variable arg lists should be a criterion for where the primary operand goes. Yes, they must come last, but as the evolution of assoc/dissoc shows, sometimes variable args are added later.

Ditto partial. Every library eventually ends up with a more order-independent partial binding method. For Clojure, it's #().
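(A short added sketch of the difference, not from the quoted post. The names add-ten, halve and marked are hypothetical.)

```clojure
;; partial can only fix arguments from the left.
(def add-ten (partial + 10))

;; #() puts the "hole" (%) wherever it is needed, so the position of the
;; primary operand stops mattering.
(def halve #(/ % 2))                 ; hole in the first position

;; A collection-first function (assoc) used inside a source-last one (map):
(def marked (map #(assoc % :seen true) [{:id 1} {:id 2}]))
```

With these definitions, (add-ten 5) is 15, (halve 10) is 5, and marked is a sequence of the two maps with :seen true added to each.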

What then is the general rule?

Primary collection operands come first. That way one can write -> and its ilk, and their position is independent of whether or not they have variable arity parameters. There is a tradition of this in OO languages and CL (CL's slot-value, aref, elt - in fact the one that trips me up most often in CL is gethash, which is inconsistent with those).
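(An added sketch of the collection-first rule, not part of the quoted post; the name result is made up.)

```clojure
;; The map stays in first position even though assoc and dissoc take a
;; variable number of trailing arguments, so -> threads through cleanly.
(def result
  (-> {:a 1 :b 2}
      (assoc :c 3 :d 4)      ; variadic key/value pairs after the map
      (dissoc :a :b)         ; variadic keys after the map
      (update-in [:c] inc))) ; function argument comes last
```

Here result is {:c 4, :d 4}. Lookup functions follow the same shape: (get m k) and (nth v i) both take the collection first.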

So, in the end there are 2 rules, but it's not a free-for-all. Sequence functions take their sources last and collection functions take their primary operand (collection) first. Not that there aren't a few kinks here and there that I need to iron out (e.g. set/select).

I hope that helps make it seem less spurious,

Rich

Now, how one distinguishes between a "sequence function" and a "collection function" is not obvious to me. Perhaps others can explain this.