Search code examples
haskellinstancetypeclass

How to specify a typeclass instance?


I have a (fairly) legitimate case where there are two type instance implementations, and I want to specify a default one. After noting that doing modular arithmetic with Int types resulted in lots of hash collisions, I want to try GHC's Int64. I have the following code:

class Hashable64 a where
    hash64 :: a -> Int64

instance Hashable64 a => Hashable a where
    hash = fromInteger . toInteger . hash64

instance Hashable64 a => Hashable64 [a] where
    hash64 = foldl1 (\x y -> x + 22636946317 * y) . map hash64

and an instance Hashable64 Char, which thus results in two implementations for Hashable String, namely:

  • The one defined in Data.Hashable.
  • Noting that it is an Hashable64 instance, then converting to a regular Int for an instance of Data.Hashable.

The second code path may be better because it performs hashing with Int64s. Can I specify to use this derivation of the instance Hashable String?

Edit 1

Sorry, I forgot to add I have already tried the overlapping instances thing; perhaps I'm just not implementing it correctly? The documentation for overlapping instances says it works when one instance is more specific. But when I try to add a specific instance for Hashable String, the situation doesn't improve. Full code at [ http://pastebin.com/9fP6LUX2 ] (sorry for the superfluous default header).

instance Hashable String where
    hash x = hash (hash64 x)

I get

Matching instances:
  instance (Hashable a) => Hashable [a] -- Defined in Data.Hashable
  instance [overlap ok] Hashable String
    -- Defined at Hashable64.hs:70:9-23

Edit 2

Any other solutions to this specific problem are welcome. A good solution might provide insight into this overlapping instances problem.


Solution

  • This sort of situation is handled by GHC's OverlappingInstances extension. Roughly speaking, this extension allows instances to coexist despite the existence of some type(s) to which both could apply. For such types, GHC will select the "most specific" instance, which is a little fuzzy in some cases but usually does what you'd want it to.

    This sort of situation, where you have one or more specialized instances and a single catch-all instance Foo a as the "default", usually works pretty well.

    The main stumbling blocks to be aware of with overlapping instances are:

    • If something forces GHC to select an instance on a polymorphic type that's ambiguous, it will refuse with potentially cryptic compiler errors

    • The context of an instance is ignored until after it's been selected, so don't try to distinguish between instances that way; there are workarounds but they're annoying.

    The latter point would be relevant here if, for example, you have a list of some type that's not an instance of Hashable64; GHC will select the more specific second instance, then fail because of the context, even if the full type (the list, not the element type) is an instance of Hashable64 and thus would work with the first, generic instance.


    Edit: Oh, I see, I misinterpreted the situation slightly, regarding where the instances are coming from. Quoth the GHC User's Guide:

    The willingness to be overlapped or incoherent is a property of the instance declaration itself (...). Neither flag is required in a module that imports and uses the instance declaration.

    (...)

    These rules make it possible for a library author to design a library that relies on overlapping instances without the library client having to know.

    If an instance declaration is compiled without -XOverlappingInstances, then that instance can never be overlapped. This could perhaps be inconvenient. (...) We are interested to receive feedback on these points.

    ...in other words, overlapping is only allowed if the less specific instance was built with OverlappingInstances enabled, which the instance for Hashable [a] was not. So your instance for Hashable a is allowed, but one for Hashable [Char] fails, as observed.

    This is a tidy illustration of why the User's Guide finds the current rules unsatisfactory (other rules would have their own problems, so it's not clear what the best approach, if any, would be).

    Back in the here-and-now, you have multiple options, which are slightly less convenient than what you'd hoped for. Off the top of my head:

    • Alternate class: Define your equivalent of the Hashable class, write the overlapped instances you want, and use generic instances with Hashable in the context to fall back to the original as needed. This is problematic if you're using another library that expects Hashable instances, rather than pre-hashed values or an explicit hash function.

    • Type wrapper: newtype wrappers are something of a "standard" way to disambiguate instances (c.f. Monoid). By using such a wrapper around your values, you'll be able to write whatever instances you please because none of the pre-defined instances will match. This becomes problematic if you have a lot of functions that would need to wrap/unwrap the newtype, though keep in mind that you can define other instances (e.g., Num, Show, etc.) for the wrapper easily and there's no overhead at run time.

    There are other, more arcane, workarounds, but I can't offer too much explicit guidance because which is the least awkward tends to be very situation-dependent.

    It's worth noting that you're definitely pushing the edges of what can be sensibly expressed with type classes, so it's not surprising that things are awkward. It's not a very satisfying situation, but there's little you can do when constrained to adding instances for a class defined elsewhere.