Search code examples
regexcomputationnfanon-deterministicepsilon

How do I concatenate adjacent Kleene star symbols from an alphabet?


I came across a situation where I need to convert regular expressions to NFA diagrams from the language {1,0}. Within the regex, I found that there are two concatenated symbols with Kleene stars, 1*0*. Basically this means that the string has any number of 1's followed by any number of 0's.

Whilst converting into an NFA, I got confused mainly because there are two transactions pointing outwards of the first symbol's (1*) accept state: an epsilon transaction back to the initial state (because it has a Kleene star), and an epsilon transaction to the initial state of 0*.

I am not sure whether 1) I can have two transactions leaving the same state when converting to an NFA and if so, 2) how to simplify this transaction.

Any help here would be appreciated!


Solution

  • You can definitely have multiple epsilon transitions from the same state.

    Using https://en.wikipedia.org/wiki/Thompson%27s_construction,

    Concatenation of s and t: the initial state of s is the new initial state, the accepting state of t is the new accepting state. The accepting state of s becomes the initial state of t.

    Kleene closure of s: introduce a new initial state and a new accepting state. Add a epsilon transition from the initial state to the final state. Add an epsilon transition from the new initial to the original initial, and an epsilon transition from original accepting to new accepting, and an epsilon transition from original accepting to original initial.

    So, our expression 1*0* breaks into: 1* concatenated with 0*.

    1 on its own is just q --1--> f. Going through the Kleene conversion to NFA yields

    /--------------e--------------\
    |                             V
    q --e--> q1 --1--> q1f --e--> f
              ^        |
              \---e----/
    

    With a similar construction for 0*. To concatenate them, take the accepting state from the first, and define it to be the starting state of the second:

    /---------------e-------------\  /-------------e---------------\
    |                             V  |                             V
    q --e--> q1a --1--> q1f --e--> q0 --e--> q0a --0--> q0f --e--> f
              ^          |                    ^          |
              \-----e----/                    \-----e----/
    

    To simplify, you can convert it to an NFA or a DFA using their corresponding conversion algorithms.