Tags: bash, sed, benchmarking, head, cat

Fastest way to print a single line in a file


I have to fetch one specific line out of a big file (1,500,000 lines), multiple times in a loop over multiple files, and I was asking myself what the best option would be in terms of performance. There are many ways to do this; I mainly use these two:

cat ${file} | head -1

or

cat ${file} | sed -n '1p'

I could not find an answer to this: do they both fetch only the first line, or does one of the two (or both) first open the whole file and then fetch row 1?


Solution

  • Drop the useless use of cat and do:

    $ sed -n '1{p;q}' file
    

    This will quit the sed script after the line has been printed.
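The same early-quit pattern generalizes to any line number, not just the first. A minimal sketch (the file and `n=5` are just for illustration):

```shell
# Print line n of a file and stop reading as soon as it is found.
n=5
seq 1 100 > /tmp/demo_file           # sample file containing lines "1".."100"

# sed: print line n, then quit so the rest of the file is never read
sed -n "${n}{p;q}" /tmp/demo_file

# head/tail combination: head stops after n lines, tail keeps the last one
head -n "$n" /tmp/demo_file | tail -n 1
```

Both commands print `5` here; the key point is that neither reads past line `n`.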


    Benchmarking script:

    #!/bin/bash
    
    TIMEFORMAT='%3R'
    n=25
    heading=('head -1 file' 'sed -n 1p file' "sed -n '1{p;q}' file" 'read line < file && echo $line')
    
    # files up to a hundred million lines (if you're on a slow machine, decrease!!)
    for (( j=1; j<=100000000; j=j*10 ))
    do
        echo "Lines in file: $j"
        # create file containing j lines
        seq 1 $j > file
        # initial read of file
        cat file > /dev/null
    
        for comm in {0..3}
        do
            avg=0
            echo
            echo ${heading[$comm]}    
            for (( i=1; i<=$n; i++ ))
            do
                case $comm in
                    0)
                        t=$( { time head -1 file > /dev/null; } 2>&1);;
                    1)
                        t=$( { time sed -n 1p file > /dev/null; } 2>&1);;
                    2)
                    t=$( { time sed -n '1{p;q}' file > /dev/null; } 2>&1);;
                    3)
                        t=$( { time read line < file && echo $line > /dev/null; } 2>&1);;
                esac
                avg=$avg+$t    # accumulate a bc expression like 0+.001+.002+...
            done
            echo "scale=3;($avg)/$n" | bc
        done
    done
    

    Just save as benchmark.sh and run bash benchmark.sh.
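The last variant relies on the shell builtin `read`, which spawns no external process and consumes only up to the first newline of its input, which is why its timing is essentially zero. A small sketch (file name and contents are just for demonstration):

```shell
# The read builtin stops at the first newline, so redirecting a large file
# to it costs almost nothing and forks no process.
printf 'first\nsecond\nthird\n' > /tmp/demo_read

IFS= read -r line < /tmp/demo_read   # -r and IFS= preserve the line verbatim
echo "$line"                         # prints: first
```

Note that `read -r` with `IFS=` is the robust form; the bare `read line` in the benchmark would strip leading/trailing whitespace and interpret backslashes.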

    Results:

    head -1 file
    .001
    
    sed -n 1p file
    .048
    
    sed -n '1{p;q}' file
    .002
    
    read line < file && echo $line
    0
    

    *Results from a file with 1,000,000 lines.*

    So the times for sed -n 1p will grow linearly with the length of the file but the timing for the other variations will be constant (and negligible) as they all quit after reading the first line:


    Note: timings are different from original post due to being on a faster Linux box.
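Applied to the original use case, a loop over multiple files picking out one line each might look like the sketch below (file names and contents here are hypothetical demo data; in the real case the files already exist):

```shell
# Demo files standing in for the questioner's input files
printf 'alpha\nbeta\n'  > /tmp/f1.txt
printf 'gamma\ndelta\n' > /tmp/f2.txt

# Fetch the first line of each file; sed quits immediately after printing,
# so large files are never read in full.
for file in /tmp/f1.txt /tmp/f2.txt; do
    first=$(sed -n '1{p;q}' "$file")
    echo "$file: $first"
done
# prints:
# /tmp/f1.txt: alpha
# /tmp/f2.txt: gamma
```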