I saw something weird today in the behaviour of the Bash shell when globbing.
I ran an ls command with the following glob:
ls GM12878_Hs_InSitu_MboI_r[E1,E2,F,G1,G2,H]* | grep ":"
The result was as expected:
GM12878_Hs_InSitu_MboI_rE1_TagDirectory:
GM12878_Hs_InSitu_MboI_rE2_TagDirectory:
GM12878_Hs_InSitu_MboI_rF_TagDirectory:
GM12878_Hs_InSitu_MboI_rG1_TagDirectory:
GM12878_Hs_InSitu_MboI_rG2_TagDirectory:
GM12878_Hs_InSitu_MboI_rH_TagDirectory:
However, when I change the same glob by adding an underscore before the asterisk:
ls GM12878_Hs_InSitu_MboI_r[E1,E2,F,G1,G2,H]_* | grep ":"
I expect the complete set shown above, but what I get is a subset:
GM12878_Hs_InSitu_MboI_rF_TagDirectory:
GM12878_Hs_InSitu_MboI_rH_TagDirectory:
Can someone explain what's wrong in my logic when I introduce an underscore sign before the asterisk?
I am using Bash.
You misunderstand what your glob is doing.
You were expecting this:
GM12878_Hs_InSitu_MboI_r[E1,E2,F,G1,G2,H]*
to be a glob of files that have any of those comma-separated segments, but that's not what [] globbing does. [] globbing is a character class: it matches exactly one character from the set inside the brackets.
Compare:
$ echo GM12878_Hs_InSitu_MboI_r[E1,E2,F,G1,G2,H]
GM12878_Hs_InSitu_MboI_r[E1,E2,F,G1,G2,H]
to what you were trying to get (which is brace {} expansion):
$ echo GM12878_Hs_InSitu_MboI_r{E1,E2,F,G1,G2,H}
GM12878_Hs_InSitu_MboI_rE1 GM12878_Hs_InSitu_MboI_rE2 GM12878_Hs_InSitu_MboI_rF GM12878_Hs_InSitu_MboI_rG1 GM12878_Hs_InSitu_MboI_rG2 GM12878_Hs_InSitu_MboI_rH
You wanted that latter expansion.
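As a minimal sketch of what the brace version does when combined with the trailing _*, assuming the six directory names from the question and a throwaway scratch directory created with mktemp (the scratch directory and the variable names are illustrative, not part of the original setup):

```bash
#!/usr/bin/env bash
# Work in a scratch directory so nothing real is touched.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

# Recreate the six directories from the question.
for r in E1 E2 F G1 G2 H; do
    mkdir "GM12878_Hs_InSitu_MboI_r${r}_TagDirectory"
done

# Brace expansion runs first and yields six separate patterns;
# each pattern is then globbed against the file system on its own.
matches=(GM12878_Hs_InSitu_MboI_r{E1,E2,F,G1,G2,H}_*)
printf '%s\n' "${matches[@]}"
```

Here every one of the six patterns finds its directory, so all six names are printed. Note that brace expansion is a Bash feature and is not available in every POSIX sh.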
Your pattern uses a character class which matches any single one of the characters E-H, 1-2, and ,; it's identical to:
GM12878_Hs_InSitu_MboI_r[EFGH12,]_*
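which, as I expect you can now see, isn't going to match any two-character entries such as E1 or G2, because exactly one character from the class must sit between the r and the _ (the underscore-less version matches them only because the * absorbs the remaining characters).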
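The difference can be reproduced with a small sketch, again assuming the directory names from the question and a scratch directory from mktemp (the array names orig, strict and loose are made up for illustration):

```bash
#!/usr/bin/env bash
# Work in a scratch directory so nothing real is touched.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

for r in E1 E2 F G1 G2 H; do
    mkdir "GM12878_Hs_InSitu_MboI_r${r}_TagDirectory"
done

# The pattern from the question and its simplified equivalent:
# one class character, then a literal underscore.
orig=(GM12878_Hs_InSitu_MboI_r[E1,E2,F,G1,G2,H]_*)
strict=(GM12878_Hs_InSitu_MboI_r[EFGH12,]_*)

# Without the underscore, * absorbs whatever follows the one class character.
loose=(GM12878_Hs_InSitu_MboI_r[EFGH12,]*)

echo "with underscore:    ${#strict[@]} matches"   # 2 (rF and rH)
echo "without underscore: ${#loose[@]} matches"    # 6
printf '%s\n' "${strict[@]}"
```

Only rF and rH have a single character between the r and the underscore, which is why the underscore version returns just those two.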