regex linux bash shell glob

foo[E1,E2,...]* glob matches desired contents, but foo[E1,E2,...]_* does not?


I saw something weird today in the behaviour of the Bash shell when globbing.

I ran an ls command with the following glob:

ls GM12878_Hs_InSitu_MboI_r[E1,E2,F,G1,G2,H]* | grep ":"

The result was as expected:

GM12878_Hs_InSitu_MboI_rE1_TagDirectory:
GM12878_Hs_InSitu_MboI_rE2_TagDirectory:
GM12878_Hs_InSitu_MboI_rF_TagDirectory:
GM12878_Hs_InSitu_MboI_rG1_TagDirectory:
GM12878_Hs_InSitu_MboI_rG2_TagDirectory:
GM12878_Hs_InSitu_MboI_rH_TagDirectory:

However, when I change the same glob by adding an underscore before the asterisk:

ls GM12878_Hs_InSitu_MboI_r[E1,E2,F,G1,G2,H]_* | grep ":"

I expect the complete set shown above, but what I get is a subset:

GM12878_Hs_InSitu_MboI_rF_TagDirectory:
GM12878_Hs_InSitu_MboI_rH_TagDirectory:

Can someone explain what's wrong in my logic when I introduce an underscore sign before the asterisk?

I am using Bash.


Solution

  • You misunderstand what your glob is doing.

    You were expecting this:

    GM12878_Hs_InSitu_MboI_r[E1,E2,F,G1,G2,H]*
    

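    to be a glob that matches names containing any of those comma-separated segments, but that's not what [] does in a glob. [...] is a bracket expression (a character class): it matches exactly one character out of the set listed between the brackets, and the commas are just more members of that set.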

    Compare (the pattern is printed unchanged here because no file name matches it):

    $ echo GM12878_Hs_InSitu_MboI_r[E1,E2,F,G1,G2,H]
    GM12878_Hs_InSitu_MboI_r[E1,E2,F,G1,G2,H]
    

    to what you were trying to get (which is brace {} expansion):

    $ echo GM12878_Hs_InSitu_MboI_r{E1,E2,F,G1,G2,H}
    GM12878_Hs_InSitu_MboI_rE1 GM12878_Hs_InSitu_MboI_rE2 GM12878_Hs_InSitu_MboI_rF GM12878_Hs_InSitu_MboI_rG1 GM12878_Hs_InSitu_MboI_rG2 GM12878_Hs_InSitu_MboI_rH
    

    You wanted that latter expansion.

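    Your pattern uses a character class that matches any single one of the characters E, F, G, H, 1, 2, and the comma. Repeating a character inside the brackets changes nothing, so the underscore version is identical to: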

    GM12878_Hs_InSitu_MboI_r[EFGH12,]_*
    

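    which, as I expect you can now see, cannot match any of the two-character entries (E1, E2, G1, G2): exactly one character from the class has to sit between the r and the _. The underscore-less version does match them, because the class consumes the first character (E or G) and the * matches everything after it.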
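To see the difference for yourself, here is a minimal sketch that recreates the directory names from the question in a scratch directory and runs both patterns. It assumes Bash (brace expansion is not available in plain POSIX sh) and the default setting where nullglob is off. The variable names demo_dir, prefix, class_matches and brace_matches are made up for the illustration.

```shell
#!/usr/bin/env bash
# Recreate the directory names from the question in a scratch directory.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
prefix=GM12878_Hs_InSitu_MboI_r
for rep in E1 E2 F G1 G2 H; do
    mkdir "${prefix}${rep}_TagDirectory"
done

# Bracket expression: exactly ONE character out of E, F, G, H, 1, 2 or ",",
# followed by "_". Only the single-letter replicates F and H can match.
class_matches=$(printf '%s\n' "$prefix"[E1,E2,F,G1,G2,H]_*)

# Brace expansion: Bash first generates one word per alternative
# (rE1_*, rE2_*, rF_*, ...) and only then globs each word separately.
brace_matches=$(printf '%s\n' "$prefix"{E1,E2,F,G1,G2,H}_*)

echo "character class:"
echo "$class_matches"
echo "brace expansion:"
echo "$brace_matches"
```

The character class prints only the rF and rH directories, while the brace version prints all six. For the original command that means something like ls GM12878_Hs_InSitu_MboI_r{E1,E2,F,G1,G2,H}_* | grep ":" should give the complete set.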