I have a file, say "Line_File", with a list of file IDs and line start and end numbers:
F_a 1 108
F_b 109 1210
F_c 131 1190
I have another file, "Data_File", from which I need to fetch all the lines between the line numbers listed in Line_File.
The command in sed:
sed -n '1,108p' Data_File > F_a.txt
does the job, but I need to do this for all the values in columns 2 and 3 of Line_File and save each result under the file name given in column 1 of Line_File.
If $1, $2 and $3 are the three columns of Line_File, then I am looking for a command something like
sed -n '$2,$3p' Data_File > $1.txt
I can run the same using a Bash loop, but that will be very slow for a very large file, say 40GB.
I specifically want to do this because I am trying to use GNU Parallel to make it faster, and line-number-based slicing will keep the output files separate. I am trying to execute a command like this:
cat Data_File | parallel -j24 --pipe --block 1000M --cat LC_ALL=C sed -n '$2,$3p' > $1.txt
But I am not able to actually use the column assignments $1, $2 and $3 properly.
I tried the following command:
awk '{system("sed -n \""$2","$3"p\" Data_File > $1"NR)}' Line_File
But it doesn't work. Any idea where I am going wrong?
P.S. If my question is not clear, please point out what else I should share.
You may use xargs with the -P (parallel) option:
xargs -P 8 -L 1 bash -c 'sed -n "$2,$3p" Data_File > $1.txt' _ < Line_File
Explanation:
xargs takes Line_File as input through the < redirection.
-P 8 allows it to run up to 8 processes in parallel.
-L 1 makes xargs process one line at a time.
bash -c ... forks bash for each line in the input file.
The _ before < is passed as $0, and the 3 columns of each input line are passed as $1, $2 and $3.
sed -n runs the sed command for each line by forming a command line.

Or you may use GNU parallel like this:
parallel --colsep '[[:blank:]]' "sed -n '{2},{3}p' Data_File > {1}.txt" :::: Line_File
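
To check how the xargs form maps the columns of Line_File onto $1, $2 and $3, here is a small self-contained sketch that can be run on throwaway data. The 20-line Data_File, the line ranges and the scratch directory are made up for illustration and are not from the question. The extra ${3}q is an optional addition, not part of the command above: it tells sed to quit once the last wanted line has been printed, which may help on a very large file because the rest of the file is not read.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Work in a scratch directory so nothing in the current directory is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical sample data: line N of Data_File contains "line N".
seq 1 20 | sed 's/^/line /' > Data_File

# Hypothetical Line_File: output name, first line, last line.
printf '%s\n' 'F_a 1 5' 'F_b 6 12' 'F_c 8 10' > Line_File

# One bash per Line_File row: "_" fills $0, the three columns become $1, $2, $3.
# "$2,$3p" prints the wanted range; "${3}q" (an assumption-level optimisation)
# makes sed stop reading after the last wanted line.
xargs -P 4 -L 1 bash -c 'sed -n "$2,$3p;${3}q" Data_File > "$1.txt"' _ < Line_File

# Show how many lines ended up in each output file.
wc -l F_a.txt F_b.txt F_c.txt
```

Note that -L 1 treats a line ending in a blank as continued on the next line, so Line_File should not have trailing spaces.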