I was surprised to find that the following cut command:
for n in {1..10}; do echo "[$(echo ' a    b c   de f ' | cut -d' ' -f$n)]"; done
returns:
[]
[a]
[]
[]
[]
[b]
[c]
[]
[]
[de]
While I could probably rig up an awk command to get the desired output (non-delimiter fields only), is there a way to use cut itself in a more intelligent manner?
I am looking for cut to output:
[a]
[b]
[c]
[de]
[f]
Update: I am getting answers that provide alternative ways (not using cut) to do this. That is not the aim of this post. For example, another way using awk is:
echo "[$(echo ' a b c de f ' | awk -F' ' '{print $3}')]"
[c]
cut is an excellent tool for jobs where the delimiter is a single, unchanging character. Files like /etc/passwd and /etc/group fall into this category. Consider these lines from /etc/passwd:
sshd:x:103:65534::/var/run/sshd:/usr/sbin/nologin
messagebus:x:104:106::/var/run/dbus:/bin/false
Note that (1) the separator in these files is always a colon, :, and never varies, and (2) two consecutive colons mean that there is an empty field. This is what cut was designed for.
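A minimal sketch of cut doing exactly that job on the two sample lines above. They are stored in a variable (the name passwd_lines is just for illustration) so the result does not depend on the local /etc/passwd:

```shell
# The two sample /etc/passwd lines quoted above.
passwd_lines='sshd:x:103:65534::/var/run/sshd:/usr/sbin/nologin
messagebus:x:104:106::/var/run/dbus:/bin/false'

# Field 1 is the user name and field 6 the home directory.
# cut joins the selected fields back together with the same delimiter.
printf '%s\n' "$passwd_lines" | cut -d: -f1,6
# sshd:/var/run/sshd
# messagebus:/var/run/dbus

# Field 5 is empty because of the "::" pair, so cut prints an empty
# line for each input line (sed brackets them to make that visible).
printf '%s\n' "$passwd_lines" | cut -d: -f5 | sed 's/.*/[&]/'
# []
# []
```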
By default, the separator that cut uses is a tab. One can optionally change the separator to any other single character, such as a space, with -d. But there is no way to tell cut that the separator can be either a tab or a space. There is also no way to tell cut to treat repeated separators as one: repeated separators are always interpreted as delimiting empty fields.
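Both limitations are easy to see with a couple of made-up sample strings (a minimal sketch):

```shell
# Two spaces between "a" and "b": with -d' ', cut sees three fields,
# "a", "" (empty) and "b".
printf 'a  b\n' | cut -d' ' -f2    # prints an empty line
printf 'a  b\n' | cut -d' ' -f3    # prints: b

# A tab and a space in the same line: cut honours only the one
# delimiter it was given.
printf 'a\tb c\n' | cut -d' ' -f1  # prints "a<TAB>b": the tab is not a separator here
printf 'a\tb c\n' | cut -f2        # default tab delimiter; prints: b c
```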
When the separators don't fit the above requirements, cut is the wrong tool.
When field separators require more flexibility, awk or the shell should be considered. By default, awk treats any sequence of whitespace as a field separator and ignores leading and trailing whitespace. This can be customized, even to the point of using a regex as the field separator, by changing the FS variable. The shell's default is likewise any sequence of whitespace; this can be changed to other characters, but not to regexes, using the IFS variable.
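A minimal sketch of both customizations, using made-up sample strings. Setting FS with -F to something longer than one character makes awk treat it as a regular expression; a non-whitespace IFS character, by contrast, behaves like cut's delimiter, so two in a row produce an empty field:

```shell
# FS as a regex: either ":" or ";" separates fields.
echo 'x:y;z' | awk -F'[:;]' '{print $2}'    # prints: y

# FS as a regex matching a run of commas, so ",,," counts as one separator.
echo 'p,,,q' | awk -F',+' '{print $2}'      # prints: q

# IFS set to "," (not whitespace) for this read only: each comma is a
# separator, so ",," yields an empty middle field, just as with cut.
IFS=, read -r f1 f2 f3 <<'EOF'
p,,q
EOF
printf '[%s] [%s] [%s]\n' "$f1" "$f2" "$f3" # prints: [p] [] [q]
```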
As an example, here is an awk solution:
$ echo ' a b c de f ' | awk '{for (i=1;i<=NF;i++) print "["$i"]"}'
[a]
[b]
[c]
[de]
[f]
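For comparison, a minimal pure-shell sketch of the same loop, assuming bash (read -a and the <<< here-string are bash features, not POSIX sh). The sample string deliberately contains runs of spaces to show that the default IFS collapses them:

```shell
line=' a    b c   de f '

# With the default IFS (space, tab, newline), read -a splits on runs of
# whitespace and drops leading/trailing whitespace.
read -r -a fields <<< "$line"

for f in "${fields[@]}"; do
  printf '[%s]\n' "$f"
done
# [a]
# [b]
# [c]
# [de]
# [f]
```

Unlike awk's $3, the bash array is zero-based, so the third field is ${fields[2]}.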
To transfer a shell variable to awk, it is simplest to use a -v variable assignment. For example, the following uses -v to assign the value of the shell variable n to an awk variable named m:
$ for n in {1..5}; do echo ' a b c de f ' | awk -v m=$n '{printf "[%s]\n", $m}'; done
[a]
[b]
[c]
[de]
[f]
Note that the awk code is all in single quotes. This means that the shell does not expand anything inside it. In the awk code, $m refers to the value of field number m; $m has nothing to do with any shell variable or shell substitution.
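A small sketch that makes the point concrete: a shell variable that happens to be named m is simply ignored, because the single-quoted program only sees the awk variable created by -v:

```shell
m=99   # a shell variable named m; awk never sees it
n=2

# -v m="$n" creates an awk variable m with value 2, so inside the
# single-quoted program $m means "field number 2".
echo ' a b c de f ' | awk -v m="$n" '{printf "[%s]\n", $m}'   # prints: [b]
```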