Hosts file ANTLR grammar


Is there an existing, working hosts file grammar on the web?

I checked the grammar list at http://www.antlr.org/grammar/list, but I didn't find one there.

I also checked the hosts file entry in Wikipedia, and it referenced RFC 952, but I don't think that is the same format used by /windows/system32/drivers/etc/hosts.

Any grammar format is better than none, but I would prefer one in ANTLR format. This is the first time I've used any grammar generators, and I want to keep my learning curve low. I'm already planning to use ANTLR for consuming other files.


Solution

  • From a Microsoft page:

    The HOSTS file format is the same as the format for host tables in the Version 4.3 Berkeley Software Distribution (BSD) UNIX /etc/hosts file.

    And the /etc/hosts file is described here.

    An example file:

    #
    # Table of IP addresses and hostnames
    #
    172.16.12.2     peanut.nuts.com peanut
    127.0.0.1       localhost
    172.16.12.1     almond.nuts.com almond loghost
    172.16.12.4     walnut.nuts.com walnut
    172.16.12.3     pecan.nuts.com pecan
    172.16.1.2      filbert.nuts.com filbert
    172.16.6.4      salt.plant.nuts.com salt.plant salt
    

    A hosts file looks to be formatted like this:

    • each table entry in /etc/hosts contains an IP address, separated by whitespace from a list of hostnames associated with that address
    • the first hostname in a table entry can optionally be followed by one or more aliases
    • comments begin with # and run to the end of the line
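
To make the line format above concrete, here is a minimal plain-Java sketch that splits one line into an address, a host name, and aliases without using ANTLR. The class name HostsLineSketch and its Entry type are hypothetical names introduced for illustration only, and the sketch assumes fields are separated by spaces or tabs and that a # anywhere on the line starts a comment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class HostsLineSketch {

    /** One table entry: an address, a host name, and zero or more aliases. */
    public static final class Entry {
        public final String address;
        public final String hostName;
        public final List<String> aliases;

        Entry(String address, String hostName, List<String> aliases) {
            this.address = address;
            this.hostName = hostName;
            this.aliases = aliases;
        }
    }

    /** Returns the entry on this line, or null for blank and comment-only lines. */
    public static Entry parseLine(String line) {
        // Assumption: everything from '#' to the end of the line is a comment.
        int hash = line.indexOf('#');
        String content = (hash >= 0 ? line.substring(0, hash) : line).trim();
        if (content.isEmpty()) {
            return null;
        }
        // Fields are separated by one or more whitespace characters.
        String[] fields = content.split("\\s+");
        if (fields.length < 2) {
            throw new IllegalArgumentException("expected an address and a host name: " + line);
        }
        // Anything after the host name is treated as an alias.
        List<String> aliases = new ArrayList<>(Arrays.asList(fields).subList(2, fields.length));
        return new Entry(fields[0], fields[1], aliases);
    }

    public static void main(String[] args) {
        Entry e = parseLine("172.16.12.1     almond.nuts.com almond loghost");
        System.out.println(e.address + " -> " + e.hostName + " " + e.aliases);
    }
}
```

This sketch does not validate the address or the names; it only shows how the three parts of an entry relate to each other.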

    The key terms in that description (table entry, address, hostname, alias, comment) become the rules in the ANTLR grammar, which may look like this:

    grammar Hosts;
    
    parse
      :  tableEntry* EOF
      ;
    
    tableEntry
      :  address hostName aliases?
         {
           System.out.println("\n== Entry ==");
           System.out.println("  address  : " + $address.text);
           System.out.println("  hostName : " + $hostName.text);
           System.out.println("  aliases  : " + $aliases.text);
         }
      ;
    
    address
      :  Octet '.' Octet '.' Octet '.' Octet
      ;
    
    hostName
      :  Name
      ;
    
    aliases
      :  Name+
      ;
    
    Name
      :  Letter+ ('.' Letter+)*
      ;
    
    Comment
      :  '#' ~('\r' | '\n')* {$channel=HIDDEN;}
      ;
    
    Space
      :  (' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;}
      ;
    
    Octet
      :  Digit Digit Digit
      |  Digit Digit
      |  Digit
      ;
    
    fragment Letter
      :  'a'..'z'
      |  'A'..'Z'
      ;
    
    fragment Digit
      :  '0'..'9'
      ;
    

    which can be tested with the following class (it uses the ANTLR 3 runtime):

    import org.antlr.runtime.*;
    
    public class Main {
      public static void main(String[] args) throws Exception {
        String source = 
            "#                                                   \n" +
            "# Table of IP addresses and Hostnames               \n" +
            "#                                                   \n" +
            "172.16.12.2     peanut.nuts.com peanut              \n" +
            "127.0.0.1       localhost                           \n" +
            "172.16.12.1     almond.nuts.com almond loghost      \n" +
            "172.16.12.4     walnut.nuts.com walnut              \n" +
            "172.16.12.3     pecan.nuts.com pecan                \n" +
            "172.16.1.2      filbert.nuts.com filbert            \n" +
            "172.16.6.4      salt.plant.nuts.com salt.plant salt   ";
        ANTLRStringStream in = new ANTLRStringStream(source);
        HostsLexer lexer = new HostsLexer(in);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        HostsParser parser = new HostsParser(tokens);
        parser.parse();
      }
    }
    

    Generating the lexer and parser, compiling everything, and running the class produces the following output:

    bart@hades:~/Programming/ANTLR/Demos/Hosts$ java -cp antlr-3.3.jar org.antlr.Tool Hosts.g
    bart@hades:~/Programming/ANTLR/Demos/Hosts$ javac -cp antlr-3.3.jar *.java
    bart@hades:~/Programming/ANTLR/Demos/Hosts$ java -cp .:antlr-3.3.jar Main
    
    == Entry ==
      address  : 172.16.12.2
      hostName : peanut.nuts.com
      aliases  : peanut
    
    == Entry ==
      address  : 127.0.0.1
      hostName : localhost
      aliases  : null
    
    == Entry ==
      address  : 172.16.12.1
      hostName : almond.nuts.com
      aliases  : almond loghost
    
    == Entry ==
      address  : 172.16.12.4
      hostName : walnut.nuts.com
      aliases  : walnut
    
    == Entry ==
      address  : 172.16.12.3
      hostName : pecan.nuts.com
      aliases  : pecan
    
    == Entry ==
      address  : 172.16.1.2
      hostName : filbert.nuts.com
      aliases  : filbert
    
    == Entry ==
      address  : 172.16.6.4
      hostName : salt.plant.nuts.com
      aliases  : salt.plant salt
    

    Note that this is just a quick demo: host names can contain characters other than the letters and dots the Name rule accepts (digits and hyphens, for example), to name just one shortcoming.
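
To illustrate that shortcoming, the sketch below expresses the grammar's Name rule as a Java regular expression and compares it with a broader pattern. The class name HostNameCheck and the broader pattern are assumptions made for this illustration; the broader pattern simply allows letters, digits, and hyphens in each label and is not a complete implementation of any host name specification.

```java
import java.util.regex.Pattern;

public class HostNameCheck {

    // Mirrors the demo grammar's Name rule: Letter+ ('.' Letter+)*
    static final Pattern DEMO_NAME = Pattern.compile("[A-Za-z]+(\\.[A-Za-z]+)*");

    // Hypothetical broader rule: each label may hold letters, digits, and hyphens.
    static final Pattern BROADER_NAME = Pattern.compile("[A-Za-z0-9-]+(\\.[A-Za-z0-9-]+)*");

    public static boolean matchesDemo(String name) {
        return DEMO_NAME.matcher(name).matches();
    }

    public static boolean matchesBroader(String name) {
        return BROADER_NAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        String[] samples = { "peanut.nuts.com", "web-01.nuts.com", "host2" };
        for (String s : samples) {
            System.out.println(s + "  demo=" + matchesDemo(s) + "  broader=" + matchesBroader(s));
        }
    }
}
```

Note that the broader pattern also matches strings such as 127.0.0.1, so a lexer rule written this way would overlap with Octet; the grammar would likely need restructuring rather than just a wider Name rule.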