I have a large number of text files that I would like to loop through. While looping, I would like to find the lines that match a list of strings and extract each match to a separate folder. I have a variable "ij" that needs to be split into "i" and "j" to match two columns. For example, 2733 needs to be split into 27 and 33. The script searches each text file and extracts every line that has an i of 27 and a j of 33.
The problem here is that I have nearly 100 different strings, so it takes about 35 hours to get through all these strings.
Is there any way to extract all of the variables to separate files in just one loop? I am trying to loop through a text file, extract all the lines that match my list of strings, output them to their own folder, and then move on to the next text file.
I am currently using the "awk" command to accomplish this.
list="2741 2740 2739 2738 2737 2641 2640 2639 2638 2541 2540 2539 2538 2441 2440 2439 2438 2341 2340 2339 2241 2240 2141"
for string in $list
do
for i in ${string:0:2}
do
for j in ${string:2:2}
do
awk -v i=$i -v j=$j '$2==j && $3==i {print $0}' $datadir/*.txt >"${fileout}${i}_${j}_Output.txt"
done
done
done
So I did this:
# for each 4 digits in the list
# add "a[" and "];" before and after the four numbers
# so awk array is "a[2741]; a[2740]; a[2739]; ...."
awkarray=$(<<<"$list" sed -E 's/[0-9]{4}/a[&];/g')
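awk -v fileout="$fileout" '
BEGIN {'"$awkarray"'}
($3 $2) in a {
print $0 > (fileout $3 "_" $2 "_Output.txt")
}
' "$datadir"/*.txt
So first I transform the list into statements that load it as an array in awk. The array has only indexes; its elements have no values, so I can only check whether an index exists. Because the original loop tests $2==j and $3==i, the four-digit string "ij" corresponds to $3 followed by $2. I then check whether the concatenation of $3 and $2 exists in the array, and if it does, the line is redirected to the proper filename. The parentheses around the filename expression keep the redirection unambiguous across awk implementations.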
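To make the single-pass idea concrete, here is a minimal, self-contained sketch that can be run in a scratch directory. The sample file names, the sample rows, and the two-entry list are all made up for illustration, and it assumes the column layout from the question (j in column 2, i in column 3). It uses printf piped into sed instead of a here-string only so that it also runs under a plain POSIX shell.

```shell
# Scratch area with hypothetical sample data (names and rows are assumptions).
workdir=$(mktemp -d)
datadir="$workdir/data"
fileout="$workdir/out/"
mkdir -p "$datadir" "$fileout"

# Column 2 holds j and column 3 holds i, as in the question's awk test.
printf '%s\n' 'a 41 27 x' 'b 99 27 y' > "$datadir/one.txt"
printf '%s\n' 'c 40 27 z' 'd 41 27 w' > "$datadir/two.txt"

list="2741 2740"

# Turn "2741 2740" into "a[2741]; a[2740];" so awk creates the keys in BEGIN.
awkarray=$(printf '%s\n' "$list" | sed -E 's/[0-9]{4}/a[&];/g')

# One pass over all files: each line whose "ij" key is in the array
# is appended to the output file named after that key.
awk -v fileout="$fileout" '
BEGIN {'"$awkarray"'}
($3 $2) in a {
    out = fileout $3 "_" $2 "_Output.txt"
    print > out
}
' "$datadir"/*.txt

# Count the lines that landed in each output file.
matched_2741=$(wc -l < "${fileout}27_41_Output.txt" | tr -d ' ')
matched_2740=$(wc -l < "${fileout}27_40_Output.txt" | tr -d ' ')
echo "27_41: $matched_2741 line(s), 27_40: $matched_2740 line(s)"
```

Every input file is read once regardless of how many strings are in the list, which is where the time saving over the nested loop comes from. With around 100 keys awk keeps around 100 output files open at once; some awk implementations have a lower limit on open files than others, so that is worth checking on the system in use.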
Remember to quote your variables. $datadir/*.txt may not work when datadir contains spaces; use "$datadir"/*.txt instead. The newlines in the awk script above can be removed, so if you prefer a one-liner:
awk -v fileout="$fileout" 'BEGIN {'"$(<<<"$list" sed -E 's/[0-9]{4}/a[&];/g')"'} ($3 $2) in a { print $0 > (fileout $3 "_" $2 "_Output.txt") }' "$datadir"/*.txt
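The quoting advice can be checked with a small experiment. The sketch below assumes a hypothetical directory name containing a space and simply counts how many words the shell produces for each form of the glob. It also assumes the shell's default globbing behaviour, where a pattern that matches nothing is left as literal text.

```shell
# Hypothetical data directory whose name contains a space.
workdir=$(mktemp -d)
datadir="$workdir/my data"
mkdir -p "$datadir"
printf 'x 41 27\n' > "$datadir/sample.txt"

# Unquoted: the value is split at the space before globbing, giving two
# words, neither of which names an existing file.
set -- $datadir/*.txt
unquoted_words=$#

# Quoted: the directory stays one word and the glob matches the real file.
set -- "$datadir"/*.txt
quoted_words=$#
quoted_first=$1

echo "unquoted: $unquoted_words word(s), quoted: $quoted_words word(s)"
```

With the unquoted form, awk would be handed two paths that do not exist instead of the single sample file.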