xml, regex, parsing, cdata, lex

What is the regular expression for CDATA?


Hi, I have an example CDATA section here:

<![CDATA[asd[f]]]>

and

<tag1><![CDATA[asd[f]]]></tag1><tag2><![CDATA[asd[f]]]></tag2>

The CDATA regex I have does not recognize either of these:

"<![CDATA["([^\]]|"]"[^\]]|"]]"[^>])*"]]>"

This one does not work either:

"<![CDATA["[^\]]*[\]]{2,}([^\]>][^\]]*[\]]{2,})*">"

Could someone please give me a regex that matches <![CDATA[asd[f]]]>? I need to use it in Lex/Flex.

Edit: I have answered this question myself.


Solution

  • This is the solution. It uses an exclusive start condition (%x CDATA) so that whatever lies between <![CDATA[ and ]]> is not matched against the scanner's other rules.

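    %{
    #include <stdio.h>
    %}
    %option noyywrap
    %x CDATA
    
    %%
    "<![CDATA["     { BEGIN CDATA; printf("Entering CDATA\n"); }
    <CDATA>"]]>"    { printf("End of CDATA\n"); BEGIN INITIAL; }
    <CDATA>[^\]]+   { /* run of non-']' characters, newlines included */
                      printf("In CDATA: %s\n", yytext); }
    <CDATA>"]"      { /* a ']' that does not begin "]]>" */
                      printf("In CDATA: %s\n", yytext); }
    
    %%
    int main(void)
    {
        yylex();
        return 0;
    }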
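
    For anyone who wants to see what the start condition is doing, here is a rough hand-written C sketch of the same two-state idea. It is not Flex output, and the names extract_cdata, scan_state, ST_INITIAL and ST_CDATA are made up for this illustration. It assumes the input is a NUL-terminated string and simply drops content that does not fit in the output buffer.

```c
#include <stddef.h>
#include <string.h>

/* Two states, mirroring INITIAL and the exclusive CDATA start condition. */
enum scan_state { ST_INITIAL, ST_CDATA };

/* Hypothetical helper (not part of Flex): copies the content of each CDATA
 * section in `in` to `out`, one section per line, and returns the number of
 * properly closed sections. `out` is NUL-terminated whenever cap > 0;
 * content that does not fit is dropped. */
int extract_cdata(const char *in, char *out, size_t cap)
{
    static const char open_tag[]  = "<![CDATA[";
    static const char close_tag[] = "]]>";
    enum scan_state state = ST_INITIAL;
    size_t used = 0;
    int sections = 0;

    while (*in != '\0') {
        if (state == ST_INITIAL) {
            if (strncmp(in, open_tag, sizeof open_tag - 1) == 0) {
                state = ST_CDATA;              /* like BEGIN CDATA */
                in += sizeof open_tag - 1;
            } else {
                in++;                          /* ordinary markup: skip it */
            }
        } else if (strncmp(in, close_tag, sizeof close_tag - 1) == 0) {
            if (used + 1 < cap)
                out[used++] = '\n';            /* end of this section */
            sections++;
            state = ST_INITIAL;                /* like BEGIN INITIAL */
            in += sizeof close_tag - 1;
        } else {
            if (used + 1 < cap)
                out[used++] = *in;             /* any other char, lone ']' included */
            in++;
        }
    }
    if (cap > 0)
        out[used] = '\0';
    return sections;
}
```

    Called on <![CDATA[asd[f]]]> with a large enough buffer, this returns 1 and leaves "asd[f]\n" in the buffer: the first ']' of "]]]>" is treated as content because the three characters starting there are "]]]", not "]]>". That is the same decision the Flex scanner makes by longest match.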