Lex: match while ignoring leading spaces


I need to recognize hex numbers. My problem is how to ignore leading spaces while not allowing any other character before the number.
Like this:

0x7f6e ----> match, and print "0x7f6e"
    0X2146 ----> match, and print "0X2146"
acns0x8972 ----> no match

My current attempt:

hex     \s*0[X|x][0-9a-fA-f]{1,4}(^.)*(\n)

{hex}   { ECHO;}
.|\n    {}

It prints:

0x7f6e
    0X2146 

How can I print it without the leading spaces? Like this:

0x7f6e
0X2146 

Solution

  • I got a working version which should do what you expect:

    %{
    #include <ctype.h>
    #include <stdio.h>
    %}
    
    %%
    
    ^[ \t]*0[Xx][0-9a-fA-F]{1,4}(.*)$ {
      /* skip spaces at begin of line */
      const char *bol = yytext;
      while (isspace((unsigned char)*bol)) ++bol;
      /* echo rest of line */
      puts(bol);
    }
    
    .|\n { }
    
    %%
    
    int main(int argc, char **argv) { return yylex(); }
    
    int yywrap() { return 1; }
    

    Notes:

    1. \s seems to be unsupported (at least in my flex version, 2.6.3), so I replaced it with [ \t]. Besides, \s usually also matches carriage return, newline, and form feed, which is not intended here.

    2. I replaced (^.)* with (.*). (I didn't understand the intention of the original. A mistake?)

    3. I added a ^ at the beginning of the first pattern so that it is anchored to the beginning of a line.

    4. I replaced the \n at the end of the hex pattern with $. The puts() function appends a newline to its output. (Newlines are always matched by the second rule and thus skipped.)

    5. I replaced ECHO; with some C code that first skips the spaces at the beginning of the line and then writes the rest of the line to standard output.
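
    To make notes 1, 3 and 5 more concrete, here is a rough hand-written C counterpart of the first rule. It is only a sketch: match_hex_line is a made-up helper that is not part of the scanner, and it assumes that "hex digit" means 0-9, a-f and A-F. Because the pattern ends in (.*), anything may follow the first hex digit, so the helper only checks for the prefix and one digit.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper mirroring the flex rule by hand.
 * Returns a pointer to the "0x..." text if the line consists of optional
 * blanks/tabs followed by 0x or 0X and at least one hex digit,
 * otherwise NULL.
 */
const char *match_hex_line(const char *line)
{
    const char *p = line;

    /* [ \t]* : skip blanks and tabs only, never newlines */
    while (*p == ' ' || *p == '\t')
        ++p;

    /* 0[Xx] : the prefix must come right after the leading blanks */
    if (p[0] != '0' || (p[1] != 'x' && p[1] != 'X'))
        return NULL;

    /* at least one hex digit must follow the prefix */
    if (!isxdigit((unsigned char)p[2]))
        return NULL;

    return p;
}
```

    Passing the returned pointer to puts() would then correspond to the action in the rule; a line such as acns0x8972 yields NULL because the prefix does not directly follow the leading blanks.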

    Compiled and tested in cygwin on Windows 10 (64 bit):

    $ flex --version
    flex 2.6.3
    
    $ flex -o test-hex.c test-hex.l ; gcc -o test-hex test-hex.c
    
    $ echo "
    0x7f6e                                              
        0X2146
    acns0x8972
    " | ./test-hex
    0x7f6e
    0X2146
    
    $
    

    Note: I used echo to feed your sample data through a pipe into the standard input of test-hex.
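
    One more detail: the hex class in the question's pattern is written as [0-9a-fA-f], which ends with the range A-f rather than A-F. In ASCII that range also covers G to Z and the six punctuation characters between Z and a, so A-F is presumably what was meant. The following small C check illustrates this; it assumes an ASCII-compatible character set, and the function name is made up for the example.

```c
#include <ctype.h>

/*
 * Counts the characters in the range 'A'..'f' that are not hex digits.
 * Assumes an ASCII-compatible character set, where the range holds
 * 38 characters and only A-F and a-f are hex digits.
 */
int count_non_hex_in_A_to_f(void)
{
    int count = 0;

    for (int c = 'A'; c <= 'f'; ++c) {
        if (!isxdigit(c))
            ++count;
    }
    return count;
}
```

    With ASCII this returns 26: the twenty letters G to Z plus the characters [ \ ] ^ _ and the backquote.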