
Splitting a file based on a criterion


I have a file with the data below:

.domain bag
.set bag1
bag1
abc1
.set bag2
bag2
abc2
.domain cat
.set bag1:cat
bag1:cat
abc1:cat
.set bag2:cat
bag2:cat
abc2:cat

I want to split this file into two (bag1.txt and bag2.txt) based on the set value.

bag1.txt should look like :

.domain bag
.set bag1
bag1
abc1
.domain cat
.set bag1:cat
bag1:cat
abc1:cat

bag2.txt should look like :

.domain bag
.set bag2
bag2
abc2
.domain cat
.set bag2:cat
bag2:cat
abc2:cat

The .domain lines are common to both files.

I tried the command below, but it does not work:

nawk '{if($0~/.set/){split($2,a,":");filename=a[1]".text"}if(filename=".text"){print|"tee *.text"}else{print >filename}}' file.txt

Solution
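
Before the working script, a short note on why the attempt in the question misbehaves. The condition `filename=".text"` is an assignment rather than a comparison; it overwrites `filename` and yields a non-empty string, which awk treats as true, so every line is piped to `tee *.text` and the `else` branch is never reached. In addition, the files are named `.text` instead of the requested `.txt`, and the dot in `/.set/` is unescaped, so it matches any character followed by `set`. The sketch below isolates the `=` versus `==` pitfall; the variable `f` and the one-line input are only illustrative and are not part of the original command.

```shell
# "=" assigns: f becomes ".text", and the non-empty result counts as true.
printf 'x\n' | awk '{ f = "bag1.txt"; if (f = ".text") print "assigned:", f }'

# "==" compares: the test is false, so f keeps its original value.
printf 'x\n' | awk '{ f = "bag1.txt"; if (f == ".text") print "equal"; else print "kept:", f }'
```

The first command prints `assigned: .text` and the second prints `kept: bag1.txt`.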

  • One way:

    awk '
        BEGIN {
            ## Split fields with spaces and colon.
            FS = "[ :]+";
    
            ## Extension of output files.
            ext = ".txt";
        }
    
        ## Write lines that begin with ".domain" to all known output files (saved
        ## in "processed_bags"). Also save them in the "domain" array to copy them
        ## later to all files not processed yet.
        $1 == ".domain" {
    
            for ( b in processed_bags ) {
                print $0 >> sprintf( "%s%s", b, ext );
            }
    
            domain[ i++ ] = $0;
    
            next;
        }
    
        ## Select output file to write. If not found previously, copy all
        ## domains saved until now.
        $1 == ".set" {
            bag = $2;
            if ( ! (bag in processed_bags) ) {
                for ( j = 0; j < i; j++ ) {
                    print domain[j] >> sprintf( "%s%s", bag, ext );
                }
                processed_bags[ bag ] = 1;
            }
        }
    
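        ## Write the current line to the file selected by "bag". This covers
        ## the ".set" line itself and the data lines that follow it (".domain"
        ## lines were already handled above and skipped with "next").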
        bag {
            print $0 >> sprintf( "%s%s", bag, ext );
        }
    ' file.txt
    

    Run the following command to check the output:

    head bag[12].txt
    

    Output:

    ==> bag1.txt <==
    .domain bag
    .set bag1
    bag1
    abc1
    .domain cat
    .set bag1:cat
    bag1:cat
    bag1:cat
    abc1:cat
    
    ==> bag2.txt <==
    .domain bag
    .set bag2
    bag2
    abc2
    .domain cat
    .set bag2:cat
    bag2:cat
    abc2:cat
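
One caveat with the script above: it writes with `>>`, which appends to files left over from an earlier run, so running it twice duplicates the contents of `bag1.txt` and `bag2.txt`. A minimal re-runnable sketch of the same idea is shown below, assuming it is acceptable to overwrite existing output files. It uses `>` instead, which truncates each file the first time awk opens it and keeps appending for the rest of that run. The names `seen`, `out` and `n`, and the scratch directory created with `mktemp`, are choices made for this illustration, not part of the original answer.

```shell
#!/bin/sh
# Recreate the sample input from the question in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > file.txt <<'EOF'
.domain bag
.set bag1
bag1
abc1
.set bag2
bag2
abc2
.domain cat
.set bag1:cat
bag1:cat
abc1:cat
.set bag2:cat
bag2:cat
abc2:cat
EOF

awk '
    BEGIN { FS = "[ :]+"; ext = ".txt" }

    # Remember every ".domain" line and copy it to each file opened so far.
    $1 == ".domain" {
        domain[n++] = $0
        for (b in seen) print > (b ext)
        next
    }

    # A ".set" line selects the output file; a file seen for the first
    # time receives all ".domain" lines saved so far.
    $1 == ".set" {
        out = $2 ext
        if (!($2 in seen)) {
            for (j = 0; j < n; j++) print domain[j] > out
            seen[$2] = 1
        }
    }

    # Write the ".set" line itself and the data lines that follow it.
    out { print > out }
' file.txt

head bag1.txt bag2.txt
```

With the sample input this should produce the same two files as shown above, and running it again in the same directory leaves them unchanged rather than doubled. If the input contained a very large number of distinct sets, closing files with `close()` might also become necessary, depending on the awk implementation.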