
How do I get the total number of files produced by split on Linux?


I know that the Linux split command can split a large file by size as shown below, and that the resulting files are named with a numeric suffix:

split -b 1G -d filepath suffix

# result
suffix00  suffix01 ...

But I would like to include the total number of pieces in each file name. For example, if the file is split into five pieces, I would like the result to be:

suffix5-01  suffix5-02 suffix5-03 suffix5-04 suffix5-05

I could use another tool such as du to get the total file size and work out the count myself, but I don't know whether split uses the same size that du reports, and that is not an elegant way to do it.

Therefore, is there a clean way to achieve the desired result?
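
On the size question: split -b counts the bytes of the file's content, which is what wc -c reports, whereas du by default reports the disk space the file occupies, and the two can differ. A minimal sketch, assuming a POSIX shell; the file name data.bin, the piece. prefix, and the 1 MiB piece size are made up for illustration. It computes the expected number of pieces by ceiling division and compares it with what split actually produces:

```shell
#!/bin/sh
set -eu

# Work in a scratch directory so nothing existing is touched
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical test file: 10 MiB plus one byte, so the last piece is partial
dd if=/dev/zero of=data.bin bs=1024 count=10240 2>/dev/null
printf 'x' >> data.bin

chunk=$((1024 * 1024))                       # piece size in bytes
size=$(wc -c < data.bin | tr -d ' ')         # content size in bytes
expected=$(( (size + chunk - 1) / chunk ))   # ceiling division

# Split and count the pieces that were really written
split -b "$chunk" data.bin piece.
set -- piece.*
actual=$#

echo "bytes=$size expected=$expected actual=$actual"
```

If the two counts agree, the total can be computed before splitting and placed in the output prefix.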


Solution

  • You can do that with GNU Parallel.

    First make a 10MB file to work with:

    dd if=/dev/zero bs=10240 count=1024 > data.bin
    

    Now split it into 1MB chunks, naming each chunk suffix{TOTALCHUNKS}-{CHUNKNUMBER}:

    parallel --recend '' --plus --pipepart --block 1M cat \> suffix{##}-{#} :::: data.bin
    

    Result

    -rw-r--r--     1 mark  staff   1048576  9 Aug 16:57 suffix10-1
    -rw-r--r--     1 mark  staff   1048576  9 Aug 16:57 suffix10-2
    -rw-r--r--     1 mark  staff   1048576  9 Aug 16:57 suffix10-3
    -rw-r--r--     1 mark  staff   1048576  9 Aug 16:57 suffix10-4
    -rw-r--r--     1 mark  staff   1048576  9 Aug 16:57 suffix10-5
    -rw-r--r--     1 mark  staff   1048576  9 Aug 16:57 suffix10-6
    -rw-r--r--     1 mark  staff   1048576  9 Aug 16:57 suffix10-7
    -rw-r--r--     1 mark  staff   1048576  9 Aug 16:57 suffix10-8
    -rw-r--r--     1 mark  staff   1048576  9 Aug 16:57 suffix10-9
    -rw-r--r--     1 mark  staff   1048576  9 Aug 16:57 suffix10-10
    

    Notes:

    • You need --recend '' to stop GNU Parallel from trying to split your file on linefeeds

    • You need --plus so that {##} is set to the total number of jobs

    • You need --pipepart to make it faster on seekable files; if your file is not seekable, use --pipe instead

    • {##} means the total number of chunks

    • {#} means the current chunk number
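
If GNU Parallel is not available, a similar result can probably be had with split alone by splitting first and renaming afterwards. This is a sketch, not part of the answer above: it assumes a split that supports -d and the 1M size suffix (GNU coreutils does), and the temporary prefix part. is a name chosen here for illustration. It also zero-pads the chunk number to two digits, as in the question's desired output.

```shell
#!/bin/sh
set -eu

# Work in a scratch directory so nothing existing is touched
workdir=$(mktemp -d)
cd "$workdir"

# Same 10MB test file as above
dd if=/dev/zero of=data.bin bs=10240 count=1024 2>/dev/null

# Split into 1MB pieces with a temporary prefix: part.00, part.01, ...
split -b 1M -d data.bin part.

# The number of pieces is simply the number of files that match
set -- part.*
total=$#

# Rename each piece to suffix{TOTAL}-{NN}, numbering from 01
n=1
for f in "$@"; do
    mv "$f" "$(printf 'suffix%d-%02d' "$total" "$n")"
    n=$((n + 1))
done

ls suffix*
```

The glob expands in sorted order, so the pieces keep their original sequence when renamed.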