regex, apache-pig, grunt-shell

Escaping a forward slash in the Pig Grunt shell to match a regular expression on every line


inputData = load '/user/admin/logs/chat_miss' as line:chararray;
filteredData = filter inputData by line matches '([\\d\/]+)\/([\\d:]+)\\s+([\\w\\d]+)\\s+([\\w\\W]+):\\s+([\\w]+)\\W+([\\w]+)\\s+([\\w\/-]+.\\w+)';

Above is my two-line sample, where I load a file and want to match each line against this regular expression. I found that every metacharacter needs an extra backslash. The problem is with the forward slash, a special character that I want to escape.

This is the error:

ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: <line 7, column 55>  Unexpected character '/'

This was the sample line I was expecting to match:

13/06/2016/19:15:32 imagecache1 varnishd[8412]:  MISS        :  chat     /cloud/chatContens-139/2111400434/3646261465820934391.jpg

Solution

  • I just found that the backslash used to escape the forward slash also needs an extra backslash, i.e.:

    filteredData = filter inputData by line matches '([\\d\\/]+)\\/([\\d:]+)\\s+([\\w\\d]+)\\s+([\\w\\W]+):\\s+([\\w]+)\\W+([\\w]+)\\s+([\\w\\/-]+.\\w+)';
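
Since Pig's `matches` operator uses Java regular expressions, the corrected pattern can be sanity-checked outside Grunt with a few lines of plain Java. This is only a sketch: the class and method names (`SlashEscapeDemo`, `matchesLogLine`) are made up for illustration, and it assumes that Java string-literal escaping behaves like the quoted string in the Pig script, which the error above suggests but which I have not verified against Pig's parser. It also shows that, in Java regex, the forward slash is not a metacharacter, so the pattern matches the same line with the slashes left unescaped.

```java
import java.util.regex.Pattern;

class SlashEscapeDemo {
    // Same pattern text as in the corrected Pig statement: every regex
    // backslash is doubled, including the one in front of each '/'.
    static final String ESCAPED =
        "([\\d\\/]+)\\/([\\d:]+)\\s+([\\w\\d]+)\\s+([\\w\\W]+):\\s+([\\w]+)\\W+([\\w]+)\\s+([\\w\\/-]+.\\w+)";

    // Variant with the forward slashes left unescaped ('/' has no special
    // meaning in java.util.regex).
    static final String UNESCAPED =
        "([\\d/]+)/([\\d:]+)\\s+([\\w\\d]+)\\s+([\\w\\W]+):\\s+([\\w]+)\\W+([\\w]+)\\s+([\\w/-]+.\\w+)";

    // The sample log line from the question.
    static final String SAMPLE =
        "13/06/2016/19:15:32 imagecache1 varnishd[8412]:  MISS        :  chat     /cloud/chatContens-139/2111400434/3646261465820934391.jpg";

    // Whole-string match, like String.matches in Java.
    static boolean matchesLogLine(String pattern, String line) {
        return Pattern.matches(pattern, line);
    }

    public static void main(String[] args) {
        System.out.println(matchesLogLine(ESCAPED, SAMPLE));
        System.out.println(matchesLogLine(UNESCAPED, SAMPLE));
    }
}
```

Both calls should report a match for the sample line. Whether the unescaped form is also accepted inside a Pig script is something to confirm in Grunt before relying on it.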