I have a large file (1 GB) and I need to extract a few lines from it by record number. I wrote my script with sed and, since it was taking too long, I decided to investigate. It turns out that when I run something like sed -n '15689,15696p' filename the lines are printed quickly, but there is a delay afterwards before the command returns, and this makes my script really slow. With awk the delay is smaller, but it's still there! The awk command I used was: awk 'NR>=15689 && NR<=15696' filename
I tried printing just one line (sed -n '15689p' filename) and the same problem appears!
I'm wondering whether anyone has seen this before and knows how to get rid of this stupid delay. It seems like a big problem to me, because the delay occurs after the printing is done! I have already searched this and other forums and haven't found a question about this issue. Can someone help me? Thanks
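Avoid using sed -n '15689,15696p', as sed will keep reading through the entire file even after it has printed the lines you asked for; that is where the delay comes from. The fastest way I know is this (lines 15689 to 15696 are 8 lines, hence tail -n 8):
head -n 15696 filename | tail -n 8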
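If you need this for arbitrary ranges, the head/tail pipeline can be wrapped in a small function. This is only a sketch: the name print_lines is made up for illustration, and it assumes the file has at least as many lines as the end of the range (otherwise tail would return the wrong lines).

```shell
# print_lines FIRST LAST FILE
# Hypothetical helper: prints lines FIRST..LAST (inclusive) of FILE.
print_lines() {
    # head stops reading after LAST lines, so the rest of the file is never scanned;
    # tail then keeps only the final (LAST - FIRST + 1) lines of that output.
    head -n "$2" "$3" | tail -n "$(( $2 - $1 + 1 ))"
}
```

For the range in the question it would be called as print_lines 15689 15696 filename.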
I benchmarked it, and it runs way faster:
$ seq 1 100000000 > file
$ time (head -50000000 file | tail -10) > /dev/null
real 0m0.694s
user 0m0.830s
sys 0m0.333s
$ time (sed -n '49999991,50000000p' file) > /dev/null
real 0m6.018s
user 0m5.863s
sys 0m0.160s
$ time (sed -n '49999991,50000000p;50000000q' file) > /dev/null
real 0m3.197s
user 0m3.153s
sys 0m0.043s
$ time (awk 'NR>=49999991 && NR<=50000000' file) > /dev/null
real 0m12.665s
user 0m12.543s
sys 0m0.123s
$ time (awk 'NR>=49999991 && NR<=50000000{print} NR==50000001{exit}' file) > /dev/null
real 0m9.104s
user 0m9.010s
sys 0m0.100s
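If you would rather stay with sed or awk, the early-exit variants can be written as small functions. Again this is just a sketch, and the names sed_lines and awk_lines are hypothetical. One thing to watch with sed: under -n, the p has to come before the q, otherwise sed quits on the last line of the range before printing it.

```shell
# Hypothetical helpers; each prints lines $1..$2 (inclusive) of file $3
# and stops reading the file as soon as the last wanted line is reached.

# sed: print the range, then quit on its last line.
# p must precede q: with -n, a q that runs first exits without printing.
sed_lines() {
    sed -n "${1},${2}p;${2}q" "$3"
}

# awk: print from line s onwards and exit right after line e is printed.
awk_lines() {
    awk -v s="$1" -v e="$2" 'NR >= s { print } NR == e { exit }' "$3"
}
```

Both still have to scan every line up to the end of the range, so judging by the timings above I would expect them to remain slower than head piped into tail.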