I want to build NFAs that match strings. I have:
data State = State Char State | Split State State | Final
Note that State
can be infinitely recursive on purpose, because I want to be build NFAs like the one below:
-- NFA for regex `1*`
match1 = State '1' loop; loop = Split match1 Final
When executing such an NFA against a string I have to keep track of already visited states in order to break cycles. I do this with a Set
which in turn requires State
to implement Ord
. However the derived Ord
instance's compare
of course has the same problem that it keeps recursing.
How would I go about solving this? I've looked at StableName
, but makeStableName
uses the IO
monad and thus cannot be used when implementing compare
(AFAIK).
The only thing I can think of right now is to add some kind of id to a State
for tracking:
data State = State Int Char State | Split Int State State | Final
But that feels rather unelegant especially considering that client code would have to ensure its uniqueness.
TL;DR: don't add State
s to a Set
, but implement instance Hashable State
using StableName State
and add to a HashSet
instead.
As pointed out in the comments this problem is not easily solved without tagging each state with some kind of unique id.
My first attempt was to just add an Int
to the State
and Split
ctor, but this didn't go well with the way the NFA is build. Since this is not done by client code directly, but by a builder (which in turn is used by a parser) which then would have to auto-generate unique ids.
It seemed possible with a state monad, but the problem was that the implemention would keep generating new unique ids for states that have already been "id'ed" due to cycles in the NFA.
What ultimately pushed me in the right direction was the mentioning of Data.Reify in a (now deleted) comment.
Data.Reify relies on StableName
s and a HashMap
to detect cycles so I figured I could do the same by implementing Hashable
for State
with the help of StableName
.
The following is what I ended up with.
Preamble:
import qualified Control.Monad.State.Lazy as M
import Data.Hashable
import qualified Data.HashMap.Lazy as Map (empty, insert, lookup)
import Data.HashMap.Lazy (HashMap)
import System.IO.Unsafe (unsafePerformIO)
import System.Mem.StableName (eqStableName, makeStableName)
The relevant types again (because I changed them by now):
data State = State {accepts :: Match, next :: State}
| Split {left :: State, right :: State}
| Final
data Match = AnyChar
| LiteralChar Char
| CharRange (Char, Char)
deriving (Eq, Ord, Show)
Implementation of Hashable
(requires Eq
);
instance Eq State where
x == y = unsafePerformIO $
M.liftM2 eqStableName (makeStableName x) (makeStableName y)
instance Hashable State where
hashWithSalt salt state = unsafePerformIO $
hashWithSalt salt <$> makeStableName state
I see three possbile issues with this implemention:
State
may have a different StableName
after evaluation. This is known StableName
behavior and non-issue during NFA execution, because in the worst case a Split
will be followed twice when figuring out the next possible states reachable without consuming input.(==)
might return False
for the same State
. I have no clue of the implications as of yet 🤷♂️unsafePerformIO
is in this scenario. But I've seen other code doing something very similar and the Data.Reify docs say something vague about that it should be safe.The implementation of the NFA execution works for now in the presence of loops as does the implementation of instance Show State
.