Search code examples
haskellrecursive-datastructures

Implement `Eq`/`Ord` for infinitely recursive data type


I want to build NFAs that match strings. I have:

data State = State Char State | Split State State | Final

Note that State can be infinitely recursive on purpose, because I want to be build NFAs like the one below:

-- NFA for regex `1*`
match1 = State '1' loop; loop = Split match1 Final

When executing such an NFA against a string I have to keep track of already visited states in order to break cycles. I do this with a Set which in turn requires State to implement Ord. However the derived Ord instance's compare of course has the same problem that it keeps recursing.

How would I go about solving this? I've looked at StableName, but makeStableName uses the IO monad and thus cannot be used when implementing compare (AFAIK).

The only thing I can think of right now is to add some kind of id to a State for tracking:

data State = State Int Char State | Split Int State State | Final

But that feels rather unelegant especially considering that client code would have to ensure its uniqueness.


Solution

  • TL;DR: don't add States to a Set, but implement instance Hashable State using StableName State and add to a HashSet instead.

    As pointed out in the comments this problem is not easily solved without tagging each state with some kind of unique id.

    My first attempt was to just add an Int to the State and Split ctor, but this didn't go well with the way the NFA is build. Since this is not done by client code directly, but by a builder (which in turn is used by a parser) which then would have to auto-generate unique ids.

    It seemed possible with a state monad, but the problem was that the implemention would keep generating new unique ids for states that have already been "id'ed" due to cycles in the NFA.

    What ultimately pushed me in the right direction was the mentioning of Data.Reify in a (now deleted) comment.

    Data.Reify relies on StableNames and a HashMap to detect cycles so I figured I could do the same by implementing Hashable for State with the help of StableName.

    The following is what I ended up with.

    Preamble:

    import qualified Control.Monad.State.Lazy as M
    import Data.Hashable
    import qualified Data.HashMap.Lazy as Map (empty, insert, lookup)
    import Data.HashMap.Lazy (HashMap)
    import System.IO.Unsafe (unsafePerformIO)
    import System.Mem.StableName (eqStableName, makeStableName)
    

    The relevant types again (because I changed them by now):

    data State = State {accepts :: Match, next :: State}
               | Split {left :: State, right :: State}
               | Final
    
    data Match = AnyChar
               | LiteralChar Char
               | CharRange (Char, Char)
               deriving (Eq, Ord, Show)
    

    Implementation of Hashable (requires Eq);

    instance Eq State where
      x == y = unsafePerformIO $
        M.liftM2 eqStableName (makeStableName x) (makeStableName y)
    
    instance Hashable State where
      hashWithSalt salt state = unsafePerformIO $
        hashWithSalt salt <$> makeStableName state
    

    I see three possbile issues with this implemention:

    1. The same State may have a different StableName after evaluation. This is known StableName behavior and non-issue during NFA execution, because in the worst case a Split will be followed twice when figuring out the next possible states reachable without consuming input.
    2. By the same token (==) might return False for the same State. I have no clue of the implications as of yet 🤷‍♂️
    3. I have no clue how safe unsafePerformIO is in this scenario. But I've seen other code doing something very similar and the Data.Reify docs say something vague about that it should be safe.

    The implementation of the NFA execution works for now in the presence of loops as does the implementation of instance Show State.