Search code examples
pythondfa

designing and implementing a DFA in python


I have a language L, which simply consists of strings that are URLs, and I need to design and implement a DFA that recognizes L. (e.g: www.test.com). My issue right now is, once you read in everything up to the 'www.', how would you know when to stop reading for the ".com"?

My code so far:

s = input("Would you like to input a string? y/n")
if(s == 'n'):
    exit
dfa = {'':{'w':'ww'}, 'w': {'w': 'ww'}, 'ww': {'w': 'www'},'www': {'.': 'www.'},"}}
def accepts(transitions,initial,accepting,s):
    state = initial
    for c in s:
        state = transitions[state][c]
    return state in accepting
accepts(dfa,0,{0},"www.hi.com")

Any help is appreciated! (Note that I'm temporarily borrowing a function from here just so I can understand the concepts at play.


Solution

  • A DFA is basically defined by a transition table. This transition table maps each (valid) combination of current state and current input to the corresponding successor state. Such a table can be modelled as a dictionary of dictionaries. For example: The outer dict contains the states as keys and dictionaries as values, those dictionaries in turn each have the valid inputs as keys and the successor state as value.

    EDIT: Your chosen example is not ideal, in such that it has a fairly large alphabet (i.e. all possible input characters) of at least [a-zA-Z0-9], the linked answer limited itself to [01] for a reason ;-) Never the less here is how I would start out:

    {
    # in state '' we have not yet processed/consumed any input
    # it is the start state
    # the only valid input is a 'w'
    '': {'w': 'w'},    
    
    # in state 'w' we a have already consumed a 'w'
    # the only valid input is another 'w'   
    'w': {'w': 'ww'},
    
    # in state 'ww' we have previously consumed 'ww'
    # the only valid input is still only a 'w'  
    'ww': {'w': 'www'},
    
    # now the only valid input is a '.'
    'www': {'.': 'www.'},
    
    # this is where your example becomes unpractical:
    # we have to add a transition for every valid input
    # (you could get around this by using a defaultdict and some kind of special signal value, but im not quite sure you are up to that)
    'www.': {'a': 'www.*', 'b': 'www.*', ...},
    
    # I used the star in this state name to symbolize multiple (at least one) valid character
    # we only leave this state if we encounter a '.' 
    'www.*': {'.': 'www.*.', 'a': 'www.*', 'b': 'www.*', ...},
    
    # it should be obvious how to continue from here 
    'www.*.': ...
    }
    

    EDIT2: Implementation after chat.

    from collections import defaultdict
    
    dfa =  {
      'initial': defaultdict(lambda: 'invalid', w='w'),
      'w': defaultdict(lambda: 'invalid', w='ww'),
      'ww': defaultdict(lambda: 'invalid', w='www'),
      'www': defaultdict(lambda: 'invalid', [('.','www.')]),
      'www.': defaultdict(lambda: 'www.*', [('.','invalid')]),
      'www.*': defaultdict(lambda: 'www.*', [('.','www.*.')]),
      'www.*.': defaultdict(lambda: 'www.*', [('c','www.*.c')]),
      'www.*.c': defaultdict(lambda: 'www.*', [('o','www.*.co')]),
      'www.*.co': defaultdict(lambda: 'www.*', [('m','www.*.com'), ('.','www.*.')]),
      'www.*.com': defaultdict(lambda: 'www.*', [('.','www.*.')]),
      'invalid': defaultdict(lambda: 'invalid')
    }
    def accepts(transitions,initial,accepting,s):
        state = initial
        for c in s:
            state = transitions[state][c]
            print(c, '->', state)
        return state in accepting
    print(accepts(dfa,'initial',{'www.*.com', 'www.*.co'},"www.hi.com"))