Tags: parsing, antlr, yacc, lex, ragel

How to create a parser which tokenizes a list of words taken from a file?


I am trying to write a syntax checker for text for my compilers class. The idea is: I have some rules that are inherent to the language (in my case, Portuguese), like "A valid phrase is SUBJECT VERB ADJECTIVE", as in "Ruby is great".

OK, so first I have to tokenize the input "Ruby is great". I have a text file "verbs" with a lot of verbs, one per line. Then I have another text file "adjectives", another "pronouns", etc.

I am trying to use Ragel to create a parser, but I don't know how I could do something like:

%%{
  machine test;
  subject = <open-the-subjects-file-and-accept-each-one-of-them>;
  verb = <open-the-verbs-file-and-accept-each-one-of-them>;
  adjective = <open-the-adjective-file-and-accept-each-one-of-them>;
  main = subject verb adjective @ { print "Valid phrase!" } ;
}%%

I looked at ANTLR, Lex/Yacc, Ragel, etc., but couldn't find one that seemed to solve this problem. The only way I could think of was to preprocess Ragel's input file, so that my program reads the word files and writes their contents at the right place. But I don't like this solution either.

Does anyone know how I could do this? It doesn't have to be Ragel; I just want to solve this problem. I would like to use Ruby or Python, but that's not really necessary either.

Thanks.


Solution

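  • If you want to read the files at compile time, write each word list as a Ragel definition, for example:

    subject =
        'ruby' |
        'python' |
        'c++';


    then pull it in with Ragel's 'include' statement. (The 'import' statement only scrapes simple name = value definitions, so 'include' looks like the one that fits here, but check the manual for the exact syntax.)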
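For the compile-time route, the definition file does not have to be written by hand. Here is a minimal Python sketch that turns a one-word-per-line list into a definition of that shape. The helper names (`ragel_literal`, `ragel_definition`, `words_from_file`) are made up for illustration, and the escaping rule is my assumption about how Ragel's single-quoted literals behave, so verify it against the manual before relying on it.

```python
from pathlib import Path


def ragel_literal(word: str) -> str:
    """Quote a word as a single-quoted literal (escaping rule is an assumption)."""
    escaped = word.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def ragel_definition(name: str, words: list[str]) -> str:
    """Build `name = 'w1' | 'w2' | ...;` with one alternative per line."""
    literals = [ragel_literal(w) for w in words]
    return f"{name} =\n    " + " |\n    ".join(literals) + ";\n"


def words_from_file(path: str) -> list[str]:
    """Read one word per line, skipping blank lines."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    # Hypothetical word list; in practice this would come from words_from_file().
    print(ragel_definition("subject", ["ruby", "python", "c++"]))
```

This is the "preprocess the input" idea from the question, but it only generates the small definition file, so the main grammar file stays untouched.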


    If you want to check the list of subjects at run time, you could have Ragel just read three words, with an action attached to each one. Each action reads the corresponding text file and checks whether the word that was just scanned appears in it.

    %%{
    machine test;

    # Action bodies are Python-like pseudocode.
    action startWord {
        lastWordStart = p;
    }
    action checkSubject {
        # A leaving action runs with p just past the word, so slice up to p.
        word = input[lastWordStart:p]
        for possible in open('subjects.txt'):
            # Strip the trailing newline before comparing.
            if possible.strip() == word:
                fgoto verb;
        # If we get here do whatever ragel does to go to an error or just raise a python exception
        raise Exception("Invalid subject '%s'" % word)
    }
    action checkVerb { .. exercise for reader .. ;) }
    action checkAdjective { .. put adjective checking code here .. }

    # 'space' is Ragel's built-in whitespace class; alnum+ requires a non-empty word.
    subject = space* . (alnum+) >startWord %checkSubject;
    verb := space* . (alnum+) >startWord %checkVerb;
    adjective := space* . (alnum+) >startWord %checkAdjective;
    main := subject;
    }%%
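
Since the run-time version only needs "split into three words, then look each one up", the same check can also be sketched in plain Python with no Ragel at all. This swaps the state machine for set lookups, so it is an alternative to the approach above and not a translation of it. The function names and the lowercasing step are assumptions for illustration.

```python
from pathlib import Path


def load_words(path):
    """Load one word per line into a set, lowercased, skipping blank lines."""
    text = Path(path).read_text(encoding="utf-8")
    return {line.strip().lower() for line in text.splitlines() if line.strip()}


def is_valid_phrase(phrase, subjects, verbs, adjectives):
    """Accept exactly SUBJECT VERB ADJECTIVE, the rule from the question."""
    words = phrase.lower().split()
    if len(words) != 3:
        return False
    subject, verb, adjective = words
    return subject in subjects and verb in verbs and adjective in adjectives


if __name__ == "__main__":
    # Inline sets stand in for load_words("subjects.txt") and friends.
    subjects = {"ruby", "python"}
    verbs = {"is"}
    adjectives = {"great"}
    print(is_valid_phrase("Ruby is great", subjects, verbs, adjectives))
```

Loading each file once into a set also avoids re-reading the file for every word, which the action above does on each call.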