Tags: haskell, functional-programming, ocaml, ml

Differences between data/type constructors and functions?


Could anyone explain the differences between data/type constructors and functions? Haskell mixes them and gives us a uniform interface (everything looks like a function; in particular, we can partially apply them), while the ML-family languages distinguish them.


Solution

  • Your dichotomy "Haskell vs. the ML world" is wrong¹. SML also promotes constructors to functions, and Caml Light used to. I'm not sure why this was removed in OCaml; I think the designers thought there was no real need for it.

    There are deep reasons why constructors are slightly particular. They can be used in patterns, while general functions cannot -- I guess this can be explained in terms of polarized logic and focusing². Also, a constructor applied to values can itself be considered a value, which is not true of general function applications. It is common in call-by-value languages to restrict recursive value definitions to a subclass of expressions that allows constructor application but not general function application.
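    To make both points concrete, here is a small OCaml sketch (the type and value names are mine, for illustration):

        (* Constructors can appear in patterns; ordinary functions cannot. *)
        type 'a tree = Leaf | Node of 'a tree * 'a * 'a tree

        let is_leaf t =
          match t with
          | Leaf -> true             (* matching against a constructor *)
          | Node (_, _, _) -> false

        (* A constructor applied to values is itself a value, so it is
           accepted on the right-hand side of a recursive definition: *)
        let rec ones = 1 :: ones     (* a cyclic list; (::) is a constructor *)

        (* General function application is not a value, so the compiler
           rejects a definition such as:
             let rec xs = List.map succ xs *)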

    I agree that the "semantic sugar" of promoting all constructors to functions is nice. I think, however, that it would not be so useful if we had nice syntactic sugar for short abstractions, such as Scala's Some(_). Whether constructors should be curried (your "partial application" remark) is a different and orthogonal question, in my opinion.
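    For comparison, this is what the explicit abstraction looks like in OCaml, where constructors are not functions (a hedged sketch; Scala's Some(_) plays the same role more tersely):

        (* Rejected: a constructor is not a first-class function here.
             let bad = List.map Some [1; 2; 3]
           The usual workaround is a short eta-expansion: *)
        let options = List.map (fun x -> Some x) [1; 2; 3]
        (* options : int option list = [Some 1; Some 2; Some 3] *)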

    ¹: besides being a wrong dichotomy, the tone of your question has a certain taste of "Haskellers and MLers, here is the ring, please fight!". It may not be intentional, but you should probably avoid such formulations anyway. Language design is made of compromises, and when two languages make different choices, assuming that one of them is right and the other wrong is not a good approach.

    PS: the question has since been edited and is now much more neutral. Thanks for editing.




    ²: as requested by petebu, here is a little more (actually a lot more) information on "focusing and polarity". I would like to point out that I'm really not an expert on this topic (hence the "I guess").

    I would recommend starting with the paper Focusing on Pattern Matching (PDF) by Neelakantan Krishnaswami, 2009. It contains an introduction for people not familiar with polarized logic and focusing (though you need to be at least familiar with sequent calculus). As usual in sequent calculus, types/propositions are introduced "on the right" and eliminated/deconstructed "on the left". But here types are separated by "polarity": products and sums are positive, and functions are negative (my intuition is that products/sums are data while functions are computations, but the separation is motivated by very natural considerations of their behaviour in sequent calculus). The paper shows that left-elimination of arrow types corresponds to function application, while left-elimination of sum/product types corresponds to pattern matching. There is a case construct, behaving similarly to pattern matching (with some differences), that eliminates positives and therefore cannot be applied to functions.
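    As a rough illustration (my own simplified, unfocused rendering, not the paper's exact rules), compare the sequent-calculus left rules for a positive sum and a negative arrow:

        \[
        % Eliminating a positive A + B on the left is case analysis,
        % i.e. pattern matching:
        \frac{\Gamma, A \vdash C \qquad \Gamma, B \vdash C}
             {\Gamma,\, A + B \vdash C}\;(+\mathrm{L})
        \qquad
        % Eliminating a negative A -> B on the left is function
        % application: produce an A, then use the resulting B.
        \frac{\Gamma \vdash A \qquad \Gamma, B \vdash C}
             {\Gamma,\, A \to B \vdash C}\;({\to}\mathrm{L})
        \]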

    The other important reference is Focusing on Binding and Computation, by Dan Licata, Noam Zeilberger and Robert Harper, 2008 (note that if you're planning to submit a paper on these topics, "Focusing on ..." is becoming a bit of a cliché). It emphasizes the link with ML-style pattern matching much less, but introduces the very nice idea that while the computational arrow is "negative on the left" (easily seen from the classical equivalence A→B ≡ ¬A∨B), one may introduce a different arrow, "positive on the left" and therefore positively polarized, that can be pattern-matched. It turns out that this arrow is a good match for representing variable bindings (if you're familiar with Higher-Order Abstract Syntax, the idea is that the left polarity rules out the "exotic terms" that try to compute on their variables), so that terms with variable bindings become data structures that can be pattern-matched like sums or products.
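    To make the "exotic terms" remark concrete, here is a hedged OCaml sketch of naive Higher-Order Abstract Syntax, in which binders reuse the ordinary (computational, hence negative) arrow; the type and value names are mine:

        type term =
          | App of term * term
          | Lam of (term -> term)   (* binder = OCaml function *)

        (* A well-behaved term: the body uses its variable uniformly. *)
        let id_term = Lam (fun x -> x)

        (* An "exotic" term: the body computes on its argument, so it
           represents no syntactic lambda-term. An arrow that is positive
           on the left, usable only by pattern matching, would rule such
           inhabitants out. *)
        let exotic =
          Lam (fun x -> match x with App _ -> x | t -> App (t, t))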
    I found their paper a bit hard to read, so I would start with Dan Licata's slides. Finally, Robert Harper has made other slides that provide a different intuition in terms of inductive judgements/derivations: the positive arrow represents derivability (could you build a derivation of C under the hypotheses A and B?) while the negative arrow represents admissibility (given derivations of A and B, how would you rewrite/explore/manipulate them to build a derivation of C?). Very interesting stuff.