Writing compilation mode regexps


The final working solution that takes into account line and column ranges:

(csharp
 "^ *\\(?:[0-9]+>\\)*\\(\\(?:[a-zA-Z]:\\)?[^:(\t\n]+\\)(\\([0-9]+\\),\\([0-9]+\\),\\([0-9]+\\),\\([0-9]+\\)) *\: \\(error\\|warning\\) *CS[0-9]+:"
 1 (2 . 4) (3 . 5))

Both answers below were incredibly helpful; I understand the system a lot better now.


Summary: my regexps work to match the output strings, but don't work in the compilation-error-regexp-alist-alist to match errors in my compilation output.

I'm finding the compilation mode regexps a bit confusing. I've written a regexp that I know matches my error string, using re-builder and the existing regexps in compile.el as a guide.

40>f:\Projects\dev\source\Helper.cs(37,22,37,45): error CS1061: 'foo.bar' does not contain a definition for 'function' and no extension method 'method' accepting a first argument of type 'foo.bar' could be found (are you missing a using directive or an assembly reference?)

And here's my regexp:

(pushnew '(csharp
 "^ *\\(?:[0-9]+>\\)*\\(\\(?:[a-zA-Z]:\\)?[^:(\t\n]+\\)(\\([0-9]+\\),\\([0-9]+\\),[0-9]+,[0-9]+) *\: \\(?:error *CS[0-9]+:\\)"
 2 3)
     compilation-error-regexp-alist-alist)

Obviously, I'm just trying to get to the first line/column pair that's output. (I'm surprised that the compiler is outputting 4 numbers instead of two, but whatever.)

If we look at the edg-1 regexp in compile.el:

(edg-1
 "^\\([^ \n]+\\)(\\([0-9]+\\)): \\(?:error\\|warnin\\(g\\)\\|remar\\(k\\)\\)"
 1 2 nil (3 . 4))

So I guess where I'm confused is how the arguments are passed. In edg-1, where are 3 and 4 coming from? I guess they don't correspond to the capture groups? If I run the edg-1 regexp through re-builder on a well-formed error message and enter subexpression mode, 0 matches the whole matching string, 1 matches the file name and path, and 2 matches the line number. From the documentation (M-x describe-variable), it appears to care only about the position of each subexpression in the main expression. Either way, I'm clearly misunderstanding something.

I've also tried modifying the official csharp.el regexp to handle the extra two numbers, but with no luck.

(Edit, fixed the example slightly, updated the csharp regexp)


Solution

  • Found some info on this.

    This page has a simplified explanation:
    http://praveen.kumar.in/2011/03/09/making-gnu-emacs-detect-custom-error-messages-a-maven-example/

    Quote from page -

    "Each elt has the form (REGEXP FILE [LINE COLUMN TYPE HYPERLINK
    HIGHLIGHT...]).  If REGEXP matches, the FILE'th subexpression
    gives the file name, and the LINE'th subexpression gives the line
    number.  The COLUMN'th subexpression gives the column number on
    that line"
    

    So it looks like the format is something like this:

    (REGEXP FILE [LINE COLUMN TYPE HYPERLINK HIGHLIGHT...])
    

    Looking at the regex again, it looks like a modified BRE: grouping and alternation are written with backslashes, as \( \) and \|.

     ^                   # Beginning of line
     \( [^ \n]+ \)       # Group 1
    
     (                   # Literal '('
     \( [0-9]+ \)        # Group 2
     )                   # Literal ')'
    
     : [ ] 
    
     \(?:
          error
       \|
          warnin\(g\)    # Group 3
       \|
          remar\(k\)     # Group 4
     \)
    

    Here is the edg-1

    (edg-1
     "^\\([^ \n]+\\)(\\([0-9]+\\)): \\(?:error\\|warnin\\(g\\)\\|remar\\(k\\)\\)"
     1 2 nil (3 . 4))
    

    Where

    "^\\([^ \n]+\\)(\\([0-9]+\\)): \\(?:error\\|warnin\\(g\\)\\|remar\\(k\\)\\)"
    REGEXP ^^^^^^^^
    
     1     2    nil    (3 . 4)
     ^     ^     ^      ^^^^^
    FILE LINE  COLUMN   TYPE
    

    "TYPE is 2 or nil for a real error or 1 for warning or 0 for info.
    TYPE can also be of the form (WARNING . INFO).  In that case this
    will be equivalent to 1 if the WARNING'th subexpression matched
    or else equivalent to 0 if the INFO'th subexpression matched."
    

    So, TYPE is of this form (WARNING . INFO)

    In the regex,
    if capture group 3 matched (i.e. warnin\(g\) ), the message is treated as a warning.
    If capture group 4 matched (i.e. remar\(k\) ), it is treated as info.
    If neither matched (the plain "error" alternative), it is treated as a real error.
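
    To make the (WARNING . INFO) rule concrete, here is a minimal sketch of how a TYPE of (3 . 4) could be interpreted for edg-1. The function name my-edg-1-type and the three sample lines are made up for illustration; compile.el does this work internally, and this is not its actual code.

    ```elisp
    ;; Hypothetical helper, not part of compile.el: it only mimics how
    ;; TYPE = (3 . 4) is read for the edg-1 regexp.
    (defun my-edg-1-type (line)
      "Return 1 for a warning, 0 for info, 2 for an error, or nil if LINE does not match."
      (when (string-match
             "^\\([^ \n]+\\)(\\([0-9]+\\)): \\(?:error\\|warnin\\(g\\)\\|remar\\(k\\)\\)"
             line)
        (cond ((match-beginning 3) 1)   ; group 3 took part in the match: warning
              ((match-beginning 4) 0)   ; group 4 took part in the match: info
              (t 2))))                  ; neither group matched: real error

    ;; Made-up sample lines in the edg-1 shape "FILE(LINE): KIND ..."
    (my-edg-1-type "foo.c(12): warning #177: variable unused")   ; => 1
    (my-edg-1-type "foo.c(12): remark #981: some remark")        ; => 0
    (my-edg-1-type "foo.c(12): error: identifier undefined")     ; => 2
    ```

    The numbers 3 and 4 are therefore capture-group indices too; they are only used to ask "did this group take part in the match?" rather than to extract text.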
    

    csharp element info

    Looking at your csharp element

    "^ *\\(?:[0-9]+>\\)?\\(\\(?:[a-zA-Z]:\\)?[^:(\t\n]+\\)(\\([0-9]+\\),\\([0-9]+\\),[0-9]+,[0-9]+) *\: \\(?:error *CS[0-9]+:\\)"
    2 3 4
    

    And your regex (below) actually doesn't have capture group 4 in it.
    So, your FILE LINE COLUMN of 2 3 4
    probably should be 1 2 3
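
    As a sketch of what the corrected entry might look like when registered, assuming the regexp quoted above is kept as-is and only the group numbers change: the entry goes into compilation-error-regexp-alist-alist, and the symbol is also added to compilation-error-regexp-alist, which is the list compilation mode actually consults. That second step is an assumption about your setup; it is needed when the entry is added after compile.el has already been loaded.

    ```elisp
    (require 'compile)

    ;; FILE LINE COLUMN = capture groups 1 2 3 of the regexp.
    ;; The "\:" from the original is written as a plain ":" here; both
    ;; read as the same character inside an Emacs Lisp string.
    (add-to-list 'compilation-error-regexp-alist-alist
                 '(csharp
                   "^ *\\(?:[0-9]+>\\)?\\(\\(?:[a-zA-Z]:\\)?[^:(\t\n]+\\)(\\([0-9]+\\),\\([0-9]+\\),[0-9]+,[0-9]+) *: \\(?:error *CS[0-9]+:\\)"
                   1 2 3))

    ;; Enable the entry by name; without this the alist-alist entry
    ;; may never be tried.
    (add-to-list 'compilation-error-regexp-alist 'csharp)
    ```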

    Here is your regex as its engine sees it -

     ^ 
     [ ]* 
     \(?:
          [0-9]+ > 
     \)?
     \(                            # Group 1
          \(?:
                [a-zA-Z] : 
          \)?
          [^:(\t\n]+ 
     \)
     (                             # Literal '('
          \( [0-9]+ \)                # Group 2
          ,
          \( [0-9]+ \)                # Group 3
          ,
          [0-9]+
          ,
          [0-9]+ 
     )                             # Literal ')'
     [ ]* \: [ ] 
     \(?:
          error [ ]* CS [0-9]+ :
     \)
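
    The group numbering in the breakdown can be checked outside compilation mode with string-match. This is only a sketch: the error line is the one from the question, shortened after the error code, and the regexp is the one quoted in this answer with "\:" written as ":".

    ```elisp
    (let ((re "^ *\\(?:[0-9]+>\\)?\\(\\(?:[a-zA-Z]:\\)?[^:(\t\n]+\\)(\\([0-9]+\\),\\([0-9]+\\),[0-9]+,[0-9]+) *: \\(?:error *CS[0-9]+:\\)")
          ;; Backslashes in the Windows path are doubled for the Lisp reader.
          (line "40>f:\\Projects\\dev\\source\\Helper.cs(37,22,37,45): error CS1061: 'foo.bar' does not contain a definition for 'function'"))
      (when (string-match re line)
        (list (match-string 1 line)     ; FILE
              (match-string 2 line)     ; LINE
              (match-string 3 line))))  ; COLUMN
    ;; => ("f:\\Projects\\dev\\source\\Helper.cs" "37" "22")
    ```

    Evaluating this in the scratch buffer (C-x C-e after the closing paren) shows that the file name is group 1, not group 2, which is why FILE LINE COLUMN should be 1 2 3.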