
How to split a file and keep the first line in each of the pieces?


Given: One big text-data file (e.g. CSV format) with a 'special' first line (e.g., field names).

Wanted: An equivalent of the coreutils split -l command, but with the additional requirement that the header line from the original file appear at the beginning of each of the resulting pieces.

I am guessing some concoction of split and head will do the trick?


Solution

  • This is robhruska's script cleaned up a bit:

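    # Split everything after the header line into 4-line pieces: split_aa, split_ab, ...
    tail -n +2 file.txt | split -l 4 - split_
    for file in split_*
    do
        # Rebuild each piece as: header line, then the piece's own lines.
        head -n 1 file.txt > tmp_file
        cat "$file" >> tmp_file
        mv -f tmp_file "$file"
    done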
    

    I removed wc, cut, ls and echo in the places where they're unnecessary. I changed some of the filenames to make them a little more meaningful. I broke it out onto multiple lines only to make it easier to read.

    If you want to get fancy, you could use mktemp (or the older, Debian-specific tempfile) to create a temporary filename instead of using a hard-coded one.
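
    The mktemp suggestion could look something like the sketch below. It is not part of the original script: sample.csv, the part_ prefix and the header_tmp template are made-up names, and the sample data is generated only so the snippet runs on its own. The temporary file is created in the current directory so that the final mv is a rename on the same filesystem.

    ```shell
    # Hypothetical sample input: one header line plus nine data rows.
    printf 'id,name\n' > sample.csv
    for i in 1 2 3 4 5 6 7 8 9
    do
        printf '%s,row%s\n' "$i" "$i" >> sample.csv
    done

    # Split everything after the header into 4-line pieces named part_aa, part_ab, ...
    tail -n +2 sample.csv | split -l 4 - part_

    for piece in part_*
    do
        # mktemp creates a uniquely named empty file and prints its name.
        tmp_file=$(mktemp ./header_tmp.XXXXXX) || break
        # Header first, then the piece's own rows.
        { head -n 1 sample.csv; cat "$piece"; } > "$tmp_file"
        mv -f "$tmp_file" "$piece"
    done
    ```

    One side effect worth knowing about: mktemp creates its file readable and writable by the owner only, so the pieces end up with those permissions after the mv. A chmod in the loop would restore whatever mode you prefer.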

    Edit

    Using GNU split it's possible to do this:

    split_filter () { { head -n 1 file.txt; cat; } > "$FILE"; }; export -f split_filter; tail -n +2 file.txt | split --lines=4 --filter=split_filter - split_
    

    Broken out for readability:

    split_filter () { { head -n 1 file.txt; cat; } > "$FILE"; }
    export -f split_filter
    tail -n +2 file.txt | split --lines=4 --filter=split_filter - split_
    

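    When --filter is specified, split does not write the output files itself. Instead, for each output file it runs the command (a function in this case, which must be exported), pipes that piece's lines to the command's standard input, and sets the variable FILE, in the command's environment, to the filename. The command is run through the shell named by $SHELL (or /bin/sh if that is unset), so the exported function is only visible when that shell is bash.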
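
    To see that mechanism in isolation, here is a small sketch that passes the filter as an inline command string rather than an exported function, so it should not depend on bash. The demo_ prefix and filter_log.txt are made-up names, and it assumes GNU split and seq are available.

    ```shell
    # Requires GNU split. No demo_* files are created: each chunk goes to the
    # filter's standard input, and this filter only reports what it was given.
    seq 1 10 | split --lines=4 \
        --filter='printf "%s got %s lines\n" "$FILE" "$(wc -l)"' \
        - demo_ > filter_log.txt
    cat filter_log.txt
    ```

    The log should contain one line per chunk, each starting with the name split assigned to it.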

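    A filter script or function could do any manipulation it wanted to the output contents or even the filename. An example of the latter might be to write to a fixed filename in a per-piece directory, using each generated name as the directory: > "$FILE/data.dat" for example. The directory has to exist before that redirection runs, so the filter would need to create it first.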
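
    A rough sketch of that per-directory idea, assuming GNU split. The names sample_dirs.csv, header_src and chunk_ are invented for the example, and the sample data is generated only so the snippet is self-contained.

    ```shell
    # Hypothetical sample input: one header line plus nine data rows.
    printf 'id,name\n' > sample_dirs.csv
    for i in 1 2 3 4 5 6 7 8 9
    do
        printf '%s,row%s\n' "$i" "$i" >> sample_dirs.csv
    done

    # The filter runs in a separate shell, so the input name is exported for it.
    header_src=sample_dirs.csv
    export header_src

    # Each generated name (chunk_aa, chunk_ab, ...) becomes a directory holding
    # a fixed-name file: the header line followed by that chunk's rows.
    tail -n +2 "$header_src" | split --lines=4 \
        --filter='mkdir -p "$FILE" && { head -n 1 "$header_src"; cat; } > "$FILE/data.dat"' \
        - chunk_
    ```

    Because split leaves file creation to the filter, mkdir -p "$FILE" does not collide with a file of the same name.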