Why does a POSIX expression such as [:space:] need to be enclosed in another pair of [ ]?
$ echo "a b c" | sed 's/[:space:]*/_/g'
_ _b_ _
$ echo "a b c" | sed 's/[[:space:]]*/_/g'
_a_b_c_
$ echo "a b c" | sed 's/[[:space:]][[:space:]]*/_/g'
a_b_c
From "Regular Expressions/POSIX Basic Regular Expressions", section "Character classes":
"The POSIX standard defines some classes or categories of characters as shown below. These classes are used within brackets."
I had not understood what character classes were. I assumed [:space:] was a special construct matching any whitespace character, so I believed 's/[:space:]/_/g' would match the space between "a b". However, I suppose '[:space:]' by itself does not match any character (please correct me if this is still wrong).
I suppose [:space:] is like ' \t\n\r\f\v' but has no effect by itself. With brackets, '[[:space:]]', it has the same effect as '[ \t\n\r\f\v]'.
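That supposition can be checked directly. Below is a minimal sketch, assuming GNU sed and the C locale. A bare [:space:] does not match "nothing": its outer [ and ] make it an ordinary bracket expression that matches any one of the characters :, s, p, a, c and e. Newer GNU tools may reject the bare form outright and suggest writing [[:space:]] instead, so the sketch spells out the equivalent set [aceps:]. The first command reproduces the question's _ _b_ _ output. The second shows that the letters of the word "spaces" are replaced while the actual space character is left alone.

```shell
# Sketch: assumes GNU sed; the C locale keeps character sets ASCII-only.
export LC_ALL=C

# [aceps:] is the same set of characters as a bare [:space:]
# (the characters : s p a c e), so this matches the question's first command:
echo "a b c" | sed 's/[aceps:]*/_/g'      # prints: _ _b_ _

# The letters s, p, a, c, e and the colon are replaced; the space is not:
echo "spaces: yes" | sed 's/[aceps:]/_/g'  # prints: _______ y__
```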
You need to understand the terminology:
A bracket expression is a set of characters enclosed in [ and ], and it can be used as such in a regexp. That set of characters can be represented by any combination of any of the following (plus an optional initial ^ negation character):
- a list of characters, e.g. abcd...z,
- a range, e.g. a-z, or
- a character class, e.g. [:lower:].
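The three forms, their combination, and the leading ^ can be compared side by side. This is a small sketch, assuming a POSIX sed run in the C locale so that a range such as a-z covers exactly the ASCII lowercase letters. The sample string and the particular sets are arbitrary illustrations.

```shell
# Sketch: assumes a POSIX sed and the C locale for predictable ranges.
export LC_ALL=C

s='abc XYZ 123'

# A list of individual characters:
echo "$s" | sed 's/[abcXY]/_/g'          # prints: ___ __Z 123
# A range:
echo "$s" | sed 's/[a-z]/_/g'            # prints: ___ XYZ 123
# A character class (only valid inside a bracket expression):
echo "$s" | sed 's/[[:lower:]]/_/g'      # prints: ___ XYZ 123
# Any combination of the three in one bracket expression:
echo "$s" | sed 's/[[:lower:]X-Y1]/_/g'  # prints: ___ __Z _23
# A leading ^ negates the whole set (here: not lowercase and not space):
echo "$s" | sed 's/[^[:lower:] ]/_/g'    # prints: abc ___ ___
```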
So [:space:] is a character class (representing all whitespace characters) that can be used within a bracket expression [...] in a regexp, just as if you had listed all the whitespace characters individually within that bracket expression. In other words, this:
[:space:]
is just a character class, while this:
[[:space:]]
is a bracket expression that includes all whitespace characters, and this:
[[:space:][:lower:]_#;A-D]
is a bracket expression that includes all whitespace characters plus all lowercase letters plus the characters _, #, and ; plus the letters in the range A through D (whatever those characters are in your locale).
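Both points can be checked with a short sketch, again assuming a POSIX sed and the C locale, and using printf to put a literal tab into the input. The first two commands show that [[:space:]] matches the same characters as a bracket expression that lists space and tab explicitly (the only whitespace in this input). The last one applies the combined bracket expression above to an arbitrary test string.

```shell
# Sketch: assumes a POSIX sed and the C locale (so A-D is just A, B, C, D).
export LC_ALL=C

tab=$(printf '\t')   # a literal tab character

# [[:space:]] versus an explicit list of the whitespace actually present:
printf 'a b\tc\n' | sed 's/[[:space:]]/_/g'   # prints: a_b_c
printf 'a b\tc\n' | sed "s/[ $tab]/_/g"       # prints: a_b_c

# Whitespace, lowercase letters, the characters _ # ; and the range A-D:
echo 'aB #;_E d' | sed 's/[[:space:][:lower:]_#;A-D]/./g'   # prints: ......E..
```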