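I'm not sure how to ask this question properly, but here's the use case: I have a large XML file (in.xml) containing entries of the form
<way id="foo">... </way>
and I want to delete every entry whose id is listed in a file called bad_ways.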
I could write a for loop and cycle through a bunch of sed statements like this:
sed -i.bu '/<way id="1_bad_way_entry".*/,/<\/way>/d' in.xml
but this requires ~250 cycles through an 18G file, plus the associated disk writes, and each cycle currently takes about 18 min on a spinning disk (I will fix that shortly by switching machines; update: an SSD improves this to about 6.5 min per cycle).
Is there any way to ask sed to match any entry in bad_ways and do this in one pass?
Or are there better tools for this than sed? Thanks in advance!
You can use command substitution to assemble the sed script on the fly.
(Note: in the following I use sed's -E option to save some backslashes; without it, you have to add the backslashes to the generated script as needed.)
For instance, assuming the bad_ways file is like this:
one
two
three
and that the huge_file is like this:
everything starts with a zero, then one is next, then two, then three, finally four
you can substitute every pattern listed in bad_ways with XXX using the following command:
sed -E 's/'"$(sed -zE 's/\n([^$])/|\1/g' bad_ways)"'/XXX/g' huge_file
The output is then
everything starts with a zero, then XXX is next, then XXX, then XXX, finally four
As you can see, the sed script that acts on huge_file is made up by concatenating three strings:
1. s/ which is single quoted (you should always prefer single quotes, unless you need double quotes, as in 2.);
2. $(sed -zE 's/\n([^$])/|\1/g' bad_ways), which is double quoted to allow command substitution, and which generates one|two|three;
3. /XXX/g which is single quoted again.
All this results in the string s/one|two|three/XXX/g.
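To check what the outer sed will actually receive, the assembled script can be printed before it is run. This is a minimal sketch, assuming GNU sed (the -z option is a GNU extension) and the three-line bad_ways sample above, which is recreated here in a scratch directory; the variable names are only illustrative.

```shell
# Recreate the sample bad_ways file in a scratch directory (illustrative only).
tmpdir=$(mktemp -d)
printf 'one\ntwo\nthree\n' > "$tmpdir/bad_ways"

# Inner command: with -z, GNU sed reads the whole file as a single record,
# so every newline that is followed by another character can become "|".
pattern=$(sed -zE 's/\n([^$])/|\1/g' "$tmpdir/bad_ways")

# Concatenate the three pieces exactly as in the one-liner and print the result.
script='s/'"$pattern"'/XXX/g'
printf '%s\n' "$script"
```

The last newline of bad_ways has nothing after it, so it is left alone and then stripped by the command substitution.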
This is clearly not the exact script you need, but I hope this answer shows how to use command substitution $(…) and appropriate quoting with ' and " to craft a command (sed, awk or whatever) dynamically.
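The same idea can be carried over to the range deletion from the question. The following is only a sketch under assumptions that the question does not confirm: bad_ways holds one id per line, the ids contain no slashes or regex metacharacters, and every opening <way …> tag and closing </way> tag sits on its own line. The sample files and the <data> wrapper element are made up for illustration.

```shell
# Hypothetical sample data standing in for the real files.
tmpdir=$(mktemp -d)
printf '%s\n' 1 3 > "$tmpdir/bad_ways"
cat > "$tmpdir/in.xml" <<'EOF'
<data>
<way id="1">
<nd ref="a"/>
</way>
<way id="2">
<nd ref="b"/>
</way>
<way id="3">
<nd ref="c"/>
</way>
</data>
EOF

# Join the ids into a single alternation, here: 1|3
ids=$(paste -sd'|' "$tmpdir/bad_ways")

# One pass: delete each range from a matching opening tag to the next </way>.
result=$(sed -E '/<way id="('"$ids"')"/,/<\/way>/d' "$tmpdir/in.xml")
printf '%s\n' "$result"
```

Because the closing double quote is part of the pattern, id 1 does not also match id 10. Once the output looks right, the -i.bu option from the question can be added to edit the file in place.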
In hindsight, this answer is based on the same "philosophy" as the answer linked in a comment. However, I'm not temporarily saving the script to a file. That difference is probably of minor importance if the script itself is small (and it is small, based on your description).
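For comparison, a file-based variant could look like the sketch below. The linked answer is not reproduced here, so this shows only the general approach, not that answer's code: one sed command is generated per line of bad_ways, written to a script file (script.sed is a hypothetical name), and run with sed -f. It assumes the patterns contain no slashes or other characters that are special to sed.

```shell
# Recreate the sample files from above in a scratch directory (illustrative only).
tmpdir=$(mktemp -d)
printf '%s\n' one two three > "$tmpdir/bad_ways"
printf '%s\n' 'everything starts with a zero, then one is next, then two, then three, finally four' > "$tmpdir/huge_file"

# Turn each line of bad_ways into its own substitution command, e.g. s/one/XXX/g
sed 's|.*|s/&/XXX/g|' "$tmpdir/bad_ways" > "$tmpdir/script.sed"

# Run all generated commands in a single pass over the input.
result=$(sed -f "$tmpdir/script.sed" "$tmpdir/huge_file")
printf '%s\n' "$result"
```

A script file mainly helps when the pattern list is too long for a comfortable command line, or when you want to inspect the generated commands before running them.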