Tags: regex, character-encoding, sed, eshell

Why am I getting strange results with sed in eshell?


I was having trouble piping the results of a 'find' to sed. I reduced it to the simplest case that still breaks, and I got this:

echo 1234567890abcdefghijklmnopqrstuvwxyz | sed 's/[:digit:]*/X/g'

I expected to get:

Xabcdefghijklmnopqrstuvwxyz

The output I get from this is:

X1X2X3X4X5X6X7X8X9X0XaXbXcXeXfXhXjXkXlXmXnXoXpXqXrXsXuXvXwXxXyXzX

which is not what I was expecting. If I change my regex to:

echo 1234567890abcdefghijklmnopqrstuvwxyz | sed 's/[0-9]*/X/g'

I get:

XaXbXcXdXeXfXgXhXiXjXkXlXmXnXoXpXqXrXsXtXuXvXwXxXyXzX

which is closer to what I expected. I just realized I don't have this problem in a standard terminal, only in Aquamacs eshell... I assume it must be a character encoding issue? Maybe Unicode related? How do I determine this for sure, and how do I fix this problem?


Solution

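  • Remember that the regex operator '*' means 'match zero or more of the preceding item' (a bracket expression in this case). Because zero characters also counts as a match, the pattern matches the empty string at every position, which is why an X shows up between the letters.

    And as @SamHoice noted, you need '[[:digit:]]'. Written with a single pair of brackets, '[:digit:]' is an ordinary bracket expression that matches any one of the characters ':', 'd', 'i', 'g', or 't', which is why d, g, i, and t were replaced in your first output.

    So you can either reduce each run of digits to a single X: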

    echo 1234567890abcdefghijklmnopqrstuvwxyz | sed 's/[[:digit:]][[:digit:]]*/X/g'
    Xabcdefghijklmnopqrstuvwxyz
    

    Or substitute X for each digit:

    echo 1234567890abcdefghijklmnopqrstuvwxyz | sed 's/[[:digit:]]/X/g'
    XXXXXXXXXXabcdefghijklmnopqrstuvwxyz
    

    If neither of these works, please edit your question to include what you need as your output.

    I hope this helps.
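
To see the three behaviours side by side, here is a minimal sketch on a shorter input. The function names (collapse_runs, mask_digits, star_only) and the sample string are made up for illustration and are not part of the question. The unbracketed '[:digit:]' pattern is deliberately left out, because some sed implementations (GNU sed, for one) reject it with an error instead of running it.

```shell
#!/bin/sh
# Hypothetical helper names, chosen only for this illustration.

# One or more digits in a row become a single X.
collapse_runs() { sed 's/[[:digit:]][[:digit:]]*/X/g'; }

# Every individual digit becomes an X.
mask_digits() { sed 's/[[:digit:]]/X/g'; }

# Zero or more digits: also matches the empty string between letters,
# which reproduces the "X between every letter" effect from the question.
star_only() { sed 's/[0-9]*/X/g'; }

printf '%s\n' '12ab34' | collapse_runs   # prints XabX
printf '%s\n' '12ab34' | mask_digits     # prints XXabXX
printf '%s\n' '12ab'   | star_only       # prints XaXbX
```

The first two functions use the same expressions as the answer above. The third shows that the stray X characters come from '*' allowing empty matches, so they appear regardless of terminal or character encoding.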