
Concatenate files without last lines of each one


I am concatenating a large number of files into a single one with the following command:

 $ cat num_*.dat > dataset.dat 

However, due to the structure of the files, I'd like to omit the first two and last two lines of each file. Those lines contain file information that is not important for my purposes.

I know about head and tail, but I don't know how to combine them in a UNIX command to solve my issue.


Solution

  • The head command has some odd parameter usage: with GNU head, a negative count such as -n-2 means "all but the last two lines".

    You can use the following to list all of the lines except the last two.

    $ cat num_*.dat | head -n-2  > dataset.dat
    

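    Next, run tail on the result to drop the first two lines. Write to a separate file (trimmed.dat is only a placeholder name), because reading from dataset.dat and appending to it in the same command is unsafe.

    $ tail -n+3 dataset.dat > trimmed.dat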
    

    I believe the following will work as one command.

    $ cat num_*.dat | head -n-2 | tail -n+3 > dataset.dat
    

    I tested on a file that had lines like the following:

    Line 1
    Line 2
    Line 3
    Line 4
    Line 5
    Line 6
    Line 7

    This one will get you started:

    cat test.txt | head -n-2 | tail -n+3
    

    From the file above it prints:

    Line 3
    Line 4
    Line 5

    The challenge is that cat num_*.dat concatenates all of the files first, and the pipeline then runs only once on the combined stream. The result is one large file with only the first two lines of the first file and the last two lines of the last file removed.
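
    To see the problem concretely, here is a small sketch. The scratch directory and the two 7-line sample files (num_1.dat, num_2.dat) are made up for illustration, and GNU head is assumed for the negative line count.

```shell
# Sketch: build two hypothetical 7-line sample files in a scratch directory.
workdir=$(mktemp -d)
printf 'a%s\n' 1 2 3 4 5 6 7 > "$workdir/num_1.dat"
printf 'b%s\n' 1 2 3 4 5 6 7 > "$workdir/num_2.dat"

# Trim AFTER concatenating: only the outer edges of the combined
# 14-line stream are removed (assumes GNU head for -n -2).
cat "$workdir"/num_*.dat | head -n -2 | tail -n +3 > "$workdir/combined.dat"

# Show what survived: a6, a7, b1 and b2 are still present, although
# they are the footer of the first file and the header of the second.
cat "$workdir/combined.dat"
```

    Ten of the fourteen lines remain, including the four header and footer lines at the boundary between the two files, which is why the trimming has to happen per file.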

    Final Answer - Need to Write a Bash Script

    I wrote a bash script that will do this for you. This one will iterate through each file in your directory and run the command. Notice that it appends (>>) to the dataset.dat file.

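    for file in num_*.dat; do
        # -f skips the literal pattern when no file matches the glob
        if [ -f "$file" ]; then
            # Drop the last two lines, then the first two, and append the rest
            head -n-2 "$file" | tail -n+3 >> dataset.dat
            echo "$file"
        fi
    done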
    

    I had two files that looked like the following:

    line 1
    line 2
    line 3
    line 4
    line 5
    line 6
    line 7
    2 line 1
    2 line 2
    2 line 3
    2 line 4
    2 line 5
    2 line 6
    2 line 7

    The final output was:

    line 3
    line 4
    line 5
    2 line 3
    2 line 4
    2 line 5
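
    Because the loop appends, running it a second time adds every line again. One possible variant, sketched below, redirects the output of the whole loop once instead. The helper name trim_edges and the scratch directory are hypothetical, and GNU head is again assumed.

```shell
# Hypothetical helper: print each file given as an argument without its
# first two and last two lines. Assumes GNU head for the negative count.
trim_edges() {
    for file in "$@"; do
        [ -f "$file" ] || continue      # skip an unmatched glob or a directory
        tail -n +3 "$file" | head -n -2
    done
}

# Usage sketch with two sample files in a scratch directory.
workdir=$(mktemp -d)
printf 'line %s\n' 1 2 3 4 5 6 7 > "$workdir/num_1.dat"
printf '2 line %s\n' 1 2 3 4 5 6 7 > "$workdir/num_2.dat"

# A single redirection for the whole call: dataset.dat is truncated once,
# so re-running does not append duplicate lines.
trim_edges "$workdir"/num_*.dat > "$workdir/dataset.dat"
cat "$workdir/dataset.dat"
```

    With these sample files the output matches the six lines shown above. The order of tail and head does not matter here, since each one trims a different end of the file.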