I have a large data file which looks like:
//
ID 1.1.1.258
DE 6-hydroxyhexanoate dehydrogenase.
CA 6-hydroxyhexanoate + NAD(+) = 6-oxohexanoate + NADH.
CC -!- Involved in the cyclohexanol degradation pathway in Acinetobacter
CC NCIB 9871.
//
ID 1.1.1.259
DE 3-hydroxypimeloyl-CoA dehydrogenase.
CA 3-hydroxypimeloyl-CoA + NAD(+) = 3-oxopimeloyl-CoA + NADH.
CC -!- Involved in the anaerobic pathway of benzoate degradation in
CC bacteria.
//
ID 1.1.1.260
DE Sulcatone reductase.
CA Sulcatol + NAD(+) = sulcatone + NADH.
CC -!- Studies on the effects of growth-stage and nutrient supply on the
CC stereochemistry of sulcatone reduction in Clostridia pasteurianum,
CC C.tyrobutyricum and Lactobacillus brevis suggest that there may be at
CC least two sulcatone reductases with different stereospecificities.
//
I want to extract the sections of this file that contain the word anaerobic. I specifically want the ID line.
Is there a way to search the file between ID and // for anaerobic and print the output to a new file? If the whole section is printed, that is fine, as I figure I can grep the ID line out afterwards.
Expected output should be either
ID 1.1.1.259
or
ID 1.1.1.259
DE 3-hydroxypimeloyl-CoA dehydrogenase.
CA 3-hydroxypimeloyl-CoA + NAD(+) = 3-oxopimeloyl-CoA + NADH.
CC -!- Involved in the anaerobic pathway of benzoate degradation in
CC bacteria.
//
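It's simple with awk. Treat everything up to each // line as one record and print the records that contain anaerobic. This needs an awk that supports a multi-character RS, such as GNU awk; setting ORS to '//\n' puts the closing // back on its own line after each printed section. Redirect the output (for example with > out.txt) to write it to a new file.
awk '/anaerobic/' RS='//\n' ORS='//\n' ./file.txt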
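If only the ID line is wanted, a line-by-line variant may be easier to reason about, and it does not rely on a multi-character RS. The following is a minimal sketch, not a definitive solution: it assumes each entry ends with a line that is exactly // and that the ID line starts with "ID ". The file names enzyme_sample.txt and anaerobic_ids.txt are hypothetical, and the sample is a shortened copy of the data in the question.

```shell
# Build a small sample file with the same layout as the data in the question.
cat > enzyme_sample.txt <<'EOF'
//
ID 1.1.1.258
DE 6-hydroxyhexanoate dehydrogenase.
CC -!- Involved in the cyclohexanol degradation pathway in Acinetobacter
CC NCIB 9871.
//
ID 1.1.1.259
DE 3-hydroxypimeloyl-CoA dehydrogenase.
CC -!- Involved in the anaerobic pathway of benzoate degradation in
CC bacteria.
//
EOF

# Remember the ID line of the current entry, flag the entry when any line
# mentions "anaerobic", and print the remembered ID once the entry ends.
awk '
  $0 == "//"  { if (hit) print id; id = ""; hit = 0; next }  # end of an entry
  /^ID /      { id = $0 }                                    # remember the ID line
  /anaerobic/ { hit = 1 }                                    # entry matches
  END         { if (hit) print id }                          # last entry with no closing //
' enzyme_sample.txt > anaerobic_ids.txt

cat anaerobic_ids.txt
```

Because the ID is printed only when the entry closes, an entry that mentions anaerobic on several lines is still reported once.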