java regex drools syntax-checking

Java based Syntax Check / Rule based toolset


We have been given a project in which we have to accept a set of large text files and check them against very specific requirements, roughly 150-200 rules. Each rule can pass, fail, or be not applicable. Pass or fail can depend on the presence or absence of a regex match. Some rules are multi-line (e.g., if "X" exists, then the following three lines must also exist and must contain 1, 2, and 3).

Although the entire thing could be written as very hard-to-read regex code, with the entire file re-read for each rule, I figured I would ask the community whether there is a better option.

I have looked at OpenRules, Drools, etc., and none of them seems to make this any easier than writing a huge list of regexes and applying each one to the text file.


Solution

  • I don't see any way you can entirely avoid writing regular expressions and applying them to the lines of those text files. (There's no indication of an overall grammar defining the configuration file data. Writing a parser according to that grammar would - probably - be a cinch. No chance?)

    I see two problems you have to solve. One is the recognition of certain keywords (such as 'hostname'), the other one is the presence or absence of certain patterns depending on one or more previous lines.

    To solve the first problem, I would (use Java code to) break lines into space-separated tokens, so that each line becomes a List<String>.
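
    A minimal sketch of that tokenizing step follows. The rules further down refer to a Line fact with number and tokens fields; the class shape, the getter names and the LineTokenizer helper are my assumptions for illustration, not something given in the question.

    ```java
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    // Hypothetical fact class: one instance per input line.
    class Line {
        private final int number;          // 1-based line number
        private final List<String> tokens; // whitespace-separated tokens

        Line(int number, List<String> tokens) {
            this.number = number;
            this.tokens = tokens;
        }

        public int getNumber() { return number; }
        public List<String> getTokens() { return tokens; }
    }

    // Hypothetical helper that turns raw lines into Line facts.
    class LineTokenizer {
        // Splits each line on runs of whitespace; a blank line yields an empty token list.
        static List<Line> tokenize(List<String> rawLines) {
            List<Line> result = new ArrayList<>();
            int number = 1;
            for (String raw : rawLines) {
                String trimmed = raw.trim();
                List<String> tokens = trimmed.isEmpty()
                        ? new ArrayList<>()
                        : new ArrayList<>(Arrays.asList(trimmed.split("\\s+")));
                result.add(new Line(number++, tokens));
            }
            return result;
        }
    }
    ```

    Each Line would then be inserted into the session as a fact, so the file is read only once no matter how many rules there are.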

    The second problem can be attacked using rules.

    rule "hostname"
    when
      Line( $n: number, $tok: tokens contains "hostname" )
      eval( $tok.get( $tok.indexOf( "hostname" ) + 1 ).length() > 4 ) // incomplete
    then
      insert( new Correct( $n, "hostname" ) );
    end
    

    (Note that the boolean expression would have to guard against $tok ending with "hostname".) Inserting facts for correct data is easier than writing rules for all failing situations. At the end, another set of rules checks that all required Correct facts are in the Working Memory. It may also be necessary to check for duplicate "hostname" definitions, which can be done easily using the Correct fact.
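
    To illustrate the logic of that final check (not how it has to be done; it could just as well be written as rules), here is a plain-Java sketch that works on Correct facts once they have been collected from the session. The shape of Correct (a line number plus a key) is guessed from the constructor call above, and CompletenessCheck and the set of required keys are hypothetical.

    ```java
    import java.util.ArrayList;
    import java.util.Collection;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;
    import java.util.TreeSet;

    // Assumed shape of the Correct fact: where it was found and what it confirms.
    class Correct {
        final int line;
        final String key;

        Correct(int line, String key) {
            this.line = line;
            this.key = key;
        }
    }

    // Hypothetical helper for the end-of-run checks.
    class CompletenessCheck {
        // Required keys for which no Correct fact exists, in sorted order.
        static List<String> missing(Collection<Correct> facts, Set<String> required) {
            Set<String> seen = new HashSet<>();
            for (Correct c : facts) {
                seen.add(c.key);
            }
            Set<String> result = new TreeSet<>(required);
            result.removeAll(seen);
            return new ArrayList<>(result);
        }

        // Keys confirmed by more than one Correct fact, in sorted order.
        static List<String> duplicates(Collection<Correct> facts) {
            Set<String> seen = new HashSet<>();
            Set<String> dups = new TreeSet<>();
            for (Correct c : facts) {
                if (!seen.add(c.key)) {
                    dups.add(c.key);
                }
            }
            return new ArrayList<>(dups);
        }
    }
    ```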

    Let's look at the other example as well.

    rule "interface"
    when
        Line( $n1: number, $tok: tokens contains "interface" )
        Line( number == $n1 + 1, tokens not contains "disabled" )
        Line( number == $n1 + 2,
           tokens not contains "parameter" ||
           tokens contains "parameter" && tokens.indexOf( "parameter" ) < tokens.size() - 1 )
    then
        insert( new Error( $n1, "interface configuration error" ) );
    end
    

    Could be that, for that third line, indexOf( "parameter" ) == 1 and size() == 2 is what is actually required, but I don't know the exact nature of those requirements. Here I'm inserting a negative result, again to be collected at the end, sorted by line number, etc.
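
    Collecting the negative results could look roughly like this once the facts have been pulled out of the session. RuleError is a hypothetical stand-in for the Error fact used in the rule (renamed here only to avoid clashing with java.lang.Error), and the report format is made up.

    ```java
    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.List;

    // Hypothetical stand-in for the Error fact: a line number and a message.
    class RuleError {
        final int line;
        final String message;

        RuleError(int line, String message) {
            this.line = line;
            this.message = message;
        }
    }

    // Hypothetical helper that turns collected errors into report lines.
    class ErrorReport {
        // Returns one report line per error, ordered by line number.
        static List<String> format(List<RuleError> errors) {
            List<RuleError> sorted = new ArrayList<>(errors); // leave the input untouched
            sorted.sort(Comparator.comparingInt((RuleError e) -> e.line));
            List<String> out = new ArrayList<>();
            for (RuleError e : sorted) {
                out.add("line " + e.line + ": " + e.message);
            }
            return out;
        }
    }
    ```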

    A final note: I have the feeling that the wording of these validation requirements is much too hazy, unless you are confident that a more stringent processor is capable of dealing with poor syntax, or the specs actually tolerate weird phrasing such as "hostname saturn without his rings ;-)". Would this be a correct line? It passes the test according to your rule...