macos unix terminal tail gzip

Combining a large number of files into one file using the terminal


I have 600 files that I want to combine/merge into one. I've done that using the following command in the Mac terminal:

  cat neutral_*.msOut.gz > neutral.msOut

Each file has the following format:

 // Initial random seed:
 1824618124544

 // RunInitializeCallbacks():
 initializeMutationRate(0);
 initializeMutationType(1, 0.5, "f", 0);
 initializeGenomicElementType(1, m1, 1);
 initializeGenomicElement(g1, 0, 1099999);
 initializeRecombinationRate(1e-08);

 // Starting run at generation <start>:
 1 

 #WARNING (Subpopulation::ExecuteMethod_outputXSample): outputMSSample() should probably not be called from an early() event in a WF model; the output will reflect state at the beginning of the generation, not the end.
 #OUT: 1 SM p3 208

 //
 segs: 3
 positions: 0.0012,0.19383,0.18383
 001
 110
 111

When merging these files, I want the top 15 lines (which are the same in each file) to appear only once in the final merged file. How can this be achieved in the Mac terminal?


Solution

  • The files must be decompressed first; only then can the first 15 lines be removed:

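    # Write the 15-line header once, taken from the first file.
    # gunzip -c is used because zcat on some macOS versions only looks for a .Z suffix.
    for i in neutral_*.msOut.gz
    do
      gunzip -c "$i" | head -n 15 > neutral.msOut
      break
    done

    # Append each file's contents, skipping its first 15 lines.
    for i in neutral_*.msOut.gz
    do
      gunzip -c "$i" | sed -e 1,15d >> neutral.msOut
    done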
    
    • The first loop extracts the first 15 lines from a single file, so that they appear only once in the result. If you know the name of one of the files, you can drop the loop and extract the header from that file directly. If you do not want the header in the output at all, remove the first loop.
    • The second loop appends everything except the first 15 lines of each file.
    • This does not depend on a particular version of tail (a deleted answer by @kabanus noted that tail has no -q option on macOS).
    • You may need to gzip neutral.msOut after the two loops.
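
The two loops above can also be wrapped in a small function, which makes the header length and the output name easy to change. This is only a sketch: the function name merge_ms_outputs and its argument order (output file, number of header lines, then the input files) are made up for illustration, and it assumes every input is a gzip file whose first lines are the shared header.

```shell
# Hypothetical helper: merge gzip-compressed files, keeping the header only once.
# Usage: merge_ms_outputs OUTPUT HEADER_LINES FILE.gz [FILE.gz ...]
merge_ms_outputs() {
  out=$1
  header_lines=$2
  shift 2

  # Take the header from the first input file only (this truncates OUTPUT).
  gunzip -c "$1" | head -n "$header_lines" > "$out"

  # Append the rest of every input file, skipping its header lines.
  for f in "$@"
  do
    gunzip -c "$f" | sed -e "1,${header_lines}d" >> "$out"
  done
}
```

For the files in the question, this would be called as `merge_ms_outputs neutral.msOut 15 neutral_*.msOut.gz`, followed by `gzip neutral.msOut` if the merged file should be compressed again.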