Search code examples
haskellpattern-matchingoverlapping-matches

Meaning of overlapping pattern in Haskell


My current understanding of pattern overlapping in Haskell is that 2 patterns are considered to be overlapping if some argument values passed to a function can be matched by multiple patterns.

Given:

last :: [a] -> a
last [x] = x
last (_ : xs) = last xs

passing the argument value [1] would match both the first pattern [x] and the 2nd pattern (_ : xs) - so that would mean the function has overlapping patterns even though both patterns can be matched.

What makes this confusing is that although the patterns are (by the definition above) overlapping, GHC does not show any warning about them being overlapping.

Reverting the 2 pattern matches in the last function does show the overlapping warning:

last :: [a] -> a
last (_ : xs) = last xs
last [x] = x

Warning:

src\OverlappingPatterns.hs:6:1: Warning:
    Pattern match(es) are overlapped
    In an equation for `last': last [x] = ...

It is almost as though GHC consideres the patterns overlapping if a previous pattern makes it impossible to match a pattern which occurs later.

What is the correct way to determine if a function has overlapping patterns or not?


Update

I am looking for the overlapping pattern definition used in fp101x course.

According to the definition used in fp101x the following function has overlapping patterns:

last :: [a] -> a
last [x] = x
last (_ : xs) = last xs

This is in contradiction with GHC definition of overlapping pattern which does not consider it to have any overlapping patterns.

Without a proper definition of what overlapping pattern means in the fp101x course context, it is impossible to solve that exercise. And the definition used there is not the GHC one.


Solution

  • The updated question clarifies the OP wants a formal definition of overlapping patterns. Here "overlapping" is meant in the sense used by GHC when it emits its warnings: that is, when it detects that a case branch is unreachable because its pattern does not match with anything which is not already handled by earlier branch.

    A possible formal definition can indeed follow that intuition. That is, for any pattern p one can first define the set of values (denotations) [[p]] matching with p. (For this, it is important to know the type of the variables involved in p -- [[p]] depends on a type environment Gamma.) Then, one can say that in the sequence of patterns

    q0 q1 ... qn p
    

    the pattern p is overlapping iff [[p]], as a set, is included in [[q0]] union ... union [[qn]].

    The above definition is hardly operative, though -- it does not immediately lead to an algorithm for checking overlaps. Indeed, computing [[p]] is unfeasible since it is an infinite set, in general.

    To define an algorithm, I'd try to define a representation for the set of terms "not yet matched" by any pattern q0 .. qn. As an example, suppose we work with lists of booleans:

    Remaining: _   (that is, any list)
    
    q0 = []
    Remaining: _:_  (any non empty list)
    
    q1 = (True:xs)
    Remaining: False:_
    
    p = (True:False:ys)
    Remaining: False:_
    

    Here, the "remaining" set did not change, so the last pattern is overlapping.

    As another example:

    Remaining: _
    
    q0 = True:[]
    Remaining: [] , False:_ , True:_:_
    
    q1 = False:xs
    Remaining: [], True:_:_
    
    q2 = True:False:xs
    Remaining: [], True:True:_
    
    q3 = []
    Remaining: True:True:_
    
    p = True:xs
    Remaining: nothing  -- not overlapping (and exhaustive as well!)
    

    As you can see, in each step we match each of the "remaining" samples with the pattern at hand. This generates a new set of remaining samples (possibly none). The collection of all these samples forms the new remaining set.

    For this, note that it is important to know the list of constructors for each type. This is because when matching with True, you must know there's another False case remaining. Similarly, if you match against [], there's another _:_ case remaining. Roughly, when matching against constructor K, all other constructors of the same type remain.

    The above examples are not yet an algorithm, but they can get you started, hopefully.

    All of this of course ignores case guards (which make the overlap undecidable), pattern guards, GADTs (which can further refine the remaining set in quite subtle ways).