Tags: regex, sed, grep, emoticons

Filtering out emoticons using sed


I have a grep expression that I run with Cygwin grep on Windows.

grep -a "\\,,/\|\\m/\|\\m/\\>\.</\\m/\|:u" all_fbs.txt > rockon_fbs.txt

Once I have identified the lines containing this class of emoticons, I want to strip the emoticons out of the data. However, the same regexp inside a sed command results in a syntax error. (Yes, I realize I could use /d instead of //g, but that makes no difference; I still get the error.)

sed "s/\(\\,,/\|\\m/\|\\m/\\>\.</\\m/\|:u\)*//g"

The full line is:

grep -a "\\,,/\|\\m/\|\\m/\\>\.</\\m/\|:u" all_fbs.txt | sed "s/\(\\,,/\|\\m/\|\\m/\\>\.</\\m/\|:u\)*//g" | sed "s/^/ROCKON\t/" > rockon_fbs.txt

The result is:

sed: -e expression #1, char 14: unknown option to `s'

I know the error comes from the sed regexp I'm asking about, because if I remove that portion of the full line I get no error (but, of course, the emoticons are not filtered out).

Thanks in advance,

Steve


Solution

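  • You need to escape every / inside the pattern. Otherwise the first unescaped / terminates the pattern early, and sed parses what follows as the replacement and then as flags, which is what produces the "unknown option to `s'" error.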

    s/\(\\,,/\|\\m/\|\\m/\\>\.</\\m/\|:u\)*//g
            ^     ^     ^      ^   ^
              These need escaping.
    

    You should also use single-quoted strings instead of double-quoted strings, so that the shell does not interpret the backslashes (inside double quotes, \\ collapses to a single \ before sed ever sees it):

    $ echo "\\,"
    \,
    $ echo '\\,'
    \\,
    

    So try this:

    $ echo 'foo \m/ bar \,,/ baz' | sed 's/\(\\,,\/\|\\m\/\|\\m\/\\>\.<\/\\m\/\|:u\)*//g'
    foo  bar  baz
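
For the full pipeline from the question, a minimal sketch along the same lines might look like the following. It assumes GNU grep and GNU sed (as in Cygwin), and the sample lines are invented here to stand in for all_fbs.txt. It also makes two choices that are not in the original commands: it uses # as the s delimiter, so the / characters need no escaping at all, and it drops the trailing * on the group, on the assumption that the g flag already takes care of repeated matches.

```shell
# Sketch only: assumes GNU grep and GNU sed (as shipped with Cygwin).
# The sample lines below are made up to stand in for all_fbs.txt.
tmpdir=$(mktemp -d)
printf '%s\n' \
  'great \m/ show' \
  'nothing to strip here' \
  'rock on \,,/ tonight' \
  'see :u later' > "$tmpdir/all_fbs.txt"

# Single quotes keep every backslash intact for grep and sed.
# Using # as the s delimiter means the / characters need no escaping.
grep -a '\\,,/\|\\m/\|\\m/\\>\.</\\m/\|:u' "$tmpdir/all_fbs.txt" \
  | sed 's#\(\\,,/\|\\m/\|\\m/\\>\.</\\m/\|:u\)##g' \
  | sed 's/^/ROCKON\t/' > "$tmpdir/rockon_fbs.txt"

# Keep the output in a variable so it can be inspected after cleanup.
result=$(cat "$tmpdir/rockon_fbs.txt")
printf '%s\n' "$result"
rm -rf "$tmpdir"
```

Any character that does not occur in the pattern can serve as the delimiter. Avoid | and , here, since the pattern already uses \| for alternation and contains literal commas.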