'type family' vs 'data family', in brief?

I'm confused about how to choose between data family and type family. The wiki page on TypeFamilies goes into a lot of detail. Occasionally it informally refers to Haskell's data family as a "type family" in prose, but of course there is also type family in Haskell.

There is a simple example that shows where two versions of code are shown, differing only on whether a data family or a type family is being declared:

-- BAD: f is too ambiguous, due to non-injectivity
-- type family F a
 
-- OK
data family F a 

f :: F a -> F a
f = undefined
 
g :: F Int -> F Int
g x = f x

type and data here have the same meaning, but the type family version fails to type-check, while the data family version is fine, because data family "creates new types and are therefore injective" (says the wiki page).

My takeaway from all of this is "try data family for simple cases, and, if it isn't powerful enough, then try type family". Which is fine, but I'd like to understand it better. Is there a Venn diagram or a decision tree I can follow to distinguish when to use which?

Solution

I don't think any decision tree or Venn diagram will exist because the applications for type and data families are pretty wide.

Generally you've already highlighted the key design differences and I would agree with your takeaway to first see if you can get away with data family.

For me the key point is that each instance of a data family creates a new type, which does substantially limit the power because you can't do what is often the most natural thing and make an existing type be the instance.

For example the GMapKey example on the Haskell wiki page on "indexed types" is a reasonably natural fit for data families:

class GMapKey k where
  data GMap k :: * -> *
  empty       :: GMap k v
  lookup      :: k -> GMap k v -> Maybe v
  insert      :: k -> v -> GMap k v -> GMap k v

The key type of the map k is the argument to the data family and the actual map type is the result of the data family (GMap k) . As a user of a GMapKey instance you're probably quite happy for the GMap k type to be abstract to you and just manipulate it via the generic map operations in the type class.

In contrast the Collects example on the same wiki page is sort of the opposite:

class Collects ce where
  type Elem ce
  empty  :: ce
  insert :: Elem ce -> ce -> ce
  member :: Elem ce -> ce -> Bool
  toList :: ce -> [Elem ce]

The argument type is the collection and the result type is the element of the collection. In general a user is going to want to operate on those elements directly using the normal operations on that type. For example the collection might be IntSet and the element might be Int. Having the Int be wrapped up in some other type would be quite inconvenient.

Note - these two examples are with type classes and therefore don't need the family keyword as declaring a type inside a type class implies it must be a family. Exactly the same considerations apply as for standalone families though, it's just a question of how the abstraction is organised.