bash awk set-theory

Bash: Set theory


I have the following tab-delimited table:

    A   B   C   D   E   F   G   H   I   J
ZO1     X1  X2  X3          X4      X5  X6
ZO2 X7  X8  X9  X10     X11 X12 X13 X14 X15
ZO3 X16 X17 X18 X19         X20     X21 X22
ZO4     X23 X24 X25         X26     X27 X28
ZO5     X29 X30                         
ZO6     X31 X32 X33 X34 X35 X36 X37 X38 X39
ZO7 X40 X41 X42 X43 X44 X45 X46 X47 X48 X49
ZO8     X50 X51 X52         X53     X54 X55

(X## is a random string)

I want to extract the values in column 1 for the rows that fulfill a certain condition. An example condition would be: retrieve all column-1 values whose rows have non-empty values in columns B, C, D, G, I, J and empty values in the remaining columns A, E, F, H.

So an example output would be:

ZO1
ZO4
ZO8

EDIT: Sorry for the poor input. Below is a semicolon-delimited version of the table; the real input is TAB-delimited.

;A;B;C;D;E;F;G;H;I;J
ZO1;;X1;X2;X3;;;X4;;X5;X6
ZO2;X7;X8;X9;X10;;X11;X12;X13;X14;X15
ZO3;X16;X17;X18;X19;;;X20;;X21;X22
ZO4;;X23;X24;X25;;;X26;;X27;X28
ZO5;;X29;X30;;;;;;;
ZO6;;X31;X32;X33;X34;X35;X36;X37;X38;X39
ZO7;X40;X41;X42;X43;X44;X45;X46;X47;X48;X49
ZO8;;X50;X51;X52;;;X53;;X54;X55

Solution

  • I like this one. It'll run if you copy and paste it whole into bash, comments and all (each backticked `# ...` is an empty command substitution, so it acts as an inline comment).

    tail -n +2 file             `# Grab the bit of the file you care about` \
    |  sed 's/;/|;/'            `# Protect the first column`                \
    |  sed 's/;[^;][^;]*/1/g'   `# Change all the filled values to 1`       \
    |  sed 's/;/0/g'            `# Change the empty values to 0`
    

    The output of that command looks like this:

     ZO1|0111001011
     ZO2|1111011111
     ZO3|1111001011
     ZO4|0111001011
     ZO5|0110000000
     ZO6|0111111111
     ZO7|1111111111
     ZO8|0111001011
    

    So now I can set the pattern I'm looking for.

    tail -n +2 file             `# Grab the bit of the file you care about` \
    |  sed 's/;/|;/'            `# Protect the first column`                \
    |  sed 's/;[^;][^;]*/1/g'   `# Change all the filled values to 1`       \
    |  sed 's/;/0/g'            `# Change the empty values to 0`            \
    |  grep "|0111001011"       `# Grab the match you want`                 \
    |  sed 's/|.*//'            `# Clear out the garbage`
    

    Then I'd generalize it with a function:

    >> function table_match () {
        cat                         `# Grab the stdin`                    \
        |  sed 's/;/|;/'            `# Protect the first column`          \
        |  sed 's/;[^;][^;]*/1/g'   `# Change all the filled values to 1` \
        |  sed 's/;/0/g'            `# Change the empty values to 0`      \
        |  grep "|${1}"             `# Grab the match you want`           \
        |  sed 's/|.*//'            `# Clear out the garbage`;
    }
    
    
    >> tail -n +2 file | table_match 0111001011
    ZO1
    ZO4
    ZO8
    

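    I can do other stuff too, because the argument is a grep pattern: the dot wildcard, the Kleene star ... nifty. Note that the pattern is pinned to the start of the mask by the `|` but is not anchored at the end.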

    >> tail -n +2 file | table_match .......011
    ZO1
    ZO3
    ZO4
    ZO8
    
    >> tail -n +2 file | table_match '01*'
    ZO1
    ZO4
    ZO5
    ZO6
    ZO8
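
The question says the real input is TAB-delimited and phrases the condition as a set of column letters. As a minimal sketch along the same lines (not part of the answer above), the two helpers below build the 0/1 mask from the letters and then match it against a tab-delimited file with awk. The names `mask_from_columns` and `table_match_tsv` are hypothetical, and the sketch assumes single-letter column names, a header row on line 1, and an exact match on the whole mask (no regex, unlike the grep version).

```shell
# Hypothetical helper: build a 0/1 mask over the header letters.
# $1 = all column letters in order, $2 = letters that must be filled;
# every other column must be empty. Assumes single-letter column names.
# The ( ... ) body runs in a subshell, so the variables stay local.
mask_from_columns () (
    header=$1 wanted=$2 mask=""
    while [ -n "$header" ]; do
        rest=${header#?}            # header without its first letter
        letter=${header%"$rest"}    # the first letter itself
        case $wanted in
            *"$letter"*) mask="${mask}1" ;;
            *)           mask="${mask}0" ;;
        esac
        header=$rest
    done
    printf '%s\n' "$mask"
)

# Hypothetical helper: read a TAB-delimited table on stdin and print
# column 1 of every data row whose filled/empty pattern equals the mask.
table_match_tsv () {
    awk -F'\t' -v mask="$1" '
        NR == 1 { next }                          # skip the header row
        {
            bits = ""
            # one digit per data column: 1 = filled, 0 = empty
            for (i = 2; i <= length(mask) + 1; i++)
                bits = bits (($i != "") ? "1" : "0")
            if (bits == mask) print $1            # exact string match
        }'
}
```

Usage would look like `table_match_tsv "$(mask_from_columns ABCDEFGHIJ BCDGIJ)" < file`, where `mask_from_columns ABCDEFGHIJ BCDGIJ` yields `0111001011`. Because awk splits on the tab directly, this variant does not depend on the data being free of semicolons, and rows whose trailing empty cells were stripped are still treated as empty.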