linux bash tail head wc

How to split a file into nbytes-sized pieces using head and tail


I'm trying to write a script that splits a file into pieces of nbytes each. I already wrote the script below, but I want to use head and tail instead of split, which is what I use now.

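#!/bin/sh

# Usage: script FILE NBYTES, e.g. script bigfile.bin 1024
# Require both arguments
if [ $# -lt 2 ];then
    exit 1
fi
# NBYTES must not be zero
if [ "$2" -eq 0 ];then
    exit 1
fi
# FILE must be an existing regular file
if [ ! -f "$1" ];then
    exit 1
fi

# Split FILE into NBYTES-byte pieces with numeric suffixes (.00, .01, ...)
split -d -b "$2" "$1" "$1."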

It gives this:

-rw-rw-r-- 1 madushan madushan 1024 déc.  10 17:34 bigfile.bin.00
-rw-rw-r-- 1 madushan madushan 1024 déc.  10 17:34 bigfile.bin.01
-rw-rw-r-- 1 madushan madushan 1024 déc.  10 17:34 bigfile.bin.02
-rw-rw-r-- 1 madushan madushan 1024 déc.  10 17:34 bigfile.bin.03
-rw-rw-r-- 1 madushan madushan 1024 déc.  10 17:34 bigfile.bin.04
-rw-rw-r-- 1 madushan madushan 1024 déc.  10 17:34 bigfile.bin.05
-rw-rw-r-- 1 madushan madushan 1024 déc.  10 17:34 bigfile.bin.06
-rw-rw-r-- 1 madushan madushan 1024 déc.  10 17:34 bigfile.bin.07
-rw-rw-r-- 1 madushan madushan 1024 déc.  10 17:34 bigfile.bin.08
-rw-rw-r-- 1 madushan madushan  784 déc.  10 17:34 bigfile.bin.09

Solution

  • You can implement a lightweight version of split in bash with head and tail. It will not be very efficient, though, because the file must be read N times, where N = totalsize/nbytes. For small files the overhead is small; for large files it is very expensive.

    nbytes=1024
    file=bigfile.bin
    i=0
    # tail --bytes=+K starts at byte K (counting from 1), so segment i starts at nbytes*i+1
    while tail --bytes=+$((nbytes*i+1)) < "$file" | head --bytes=$nbytes > "$file.work" ; do
        # Stop unless segment has data
        [ -s "$file.work" ] || break
        i=$((i+1))
        echo "Segment: $i"

        mv "$file.work" "$file.$i"
    done
    rm -f "$file.work"
    
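    The offset arithmetic can be checked on a tiny input. This is a minimal sketch: segment is a hypothetical helper name, not part of the original answer, and it assumes a head that accepts -c (GNU and BSD head both do).

    ```shell
    # segment FILE NBYTES INDEX: hypothetical helper, prints segment INDEX (0-based)
    segment() {
        # tail -c +K starts output at byte K, counting from 1,
        # so segment INDEX begins at byte NBYTES*INDEX + 1
        tail -c "+$(( $2 * $3 + 1 ))" "$1" | head -c "$2"
    }
    ```

    For a file containing abcdefgh, segment file 3 1 prints def, the second 3-byte piece.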

    If using only head is acceptable, the loop can be made more efficient for large files: the input is read only once, and nothing is re-read.

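    nbytes=1024
    file=bigfile.bin
    i=0
    (
        # Every head call shares the same stdin, so each one continues where
        # the previous one stopped. This relies on head consuming no more than
        # nbytes from stdin, as GNU head does for --bytes.
        while head --bytes=$nbytes > "$file.work" ; do
            # Stop unless segment has data
            [ -s "$file.work" ] || break
            i=$((i+1))
            mv "$file.work" "$file.$i"
        done
    ) < "$file"
    rm -f "$file.work"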
    
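    To check a split, the pieces can be concatenated in numeric order and compared with the original. This is a sketch that assumes the file.1, file.2, ... naming used above; join_segments is a hypothetical helper name. A plain glob such as file.* is avoided because it sorts file.10 before file.2.

    ```shell
    # join_segments FILE: hypothetical helper, prints FILE.1, FILE.2, ... in numeric order
    join_segments() {
        n=1
        # Stop at the first missing segment number
        while [ -f "$1.$n" ]; do
            cat "$1.$n"
            n=$((n + 1))
        done
    }
    ```

    For example, join_segments bigfile.bin | cmp - bigfile.bin should exit with status 0 and print nothing when the pieces reproduce the original.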

    Also consider using 'dd', which can copy a single block at a given offset (bs=, skip=, count=) and is therefore better suited to large files.
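    A minimal sketch of the dd approach, assuming a regular input file. The function name split_dd and the file.1, file.2, ... output naming are assumptions chosen to match the loops above, not something the original answer specifies.

    ```shell
    # split_dd FILE NBYTES: hypothetical helper, writes FILE.1, FILE.2, ...
    split_dd() {
        file=$1
        nbytes=$2
        # Reading from stdin makes wc print only the byte count
        size=$(wc -c < "$file")
        # Number of segments, rounded up
        count=$(( (size + nbytes - 1) / nbytes ))
        i=0
        while [ "$i" -lt "$count" ]; do
            # skip=$i skips i input blocks of bs bytes; count=1 copies one block
            dd if="$file" of="$file.$((i + 1))" bs="$nbytes" skip="$i" count=1 2>/dev/null
            i=$((i + 1))
        done
    }
    ```

    With the 10000-byte file from the question, split_dd bigfile.bin 1024 would write bigfile.bin.1 through bigfile.bin.10, the last one holding the remaining 784 bytes. Each dd call addresses its block by offset, so no temporary work file is needed.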