
How to extract multiple lines between a multi-line pattern and a second string pattern


The goal is to get the version of a source package in a reprepro-based deb repository.

Since tracking of source packages is still experimental in reprepro, the list command has issues with the --list-format option and thus cannot be used here.

Here is an excerpt of the output of the command that prints all information about tracked source packages:

...

Distribution: buster
Source: linux-latest
Version: 102
Files:
 pool/stable/l/linux-latest/linux-doc_4.19+102_all.deb a 2
 pool/stable/l/linux-latest/linux-headers-amd64_4.19+102_amd64.deb b 1
 pool/stable/l/linux-latest/linux-headers-cloud-amd64_4.19+102_amd64.deb b 1
 pool/stable/l/linux-latest/linux-headers-rt-amd64_4.19+102_amd64.deb b 1
 pool/stable/l/linux-latest/linux-image-amd64_4.19+102_amd64.deb b 1
 pool/stable/l/linux-latest/linux-image-amd64-dbg_4.19+102_amd64.deb b 1
 pool/stable/l/linux-latest/linux-image-cloud-amd64_4.19+102_amd64.deb b 1
 pool/stable/l/linux-latest/linux-image-cloud-amd64-dbg_4.19+102_amd64.deb b 1
 pool/stable/l/linux-latest/linux-image-rt-amd64_4.19+102_amd64.deb b 1
 pool/stable/l/linux-latest/linux-image-rt-amd64-dbg_4.19+102_amd64.deb b 1
 pool/stable/l/linux-latest/linux-perf_4.19+102_all.deb a 2
 pool/stable/l/linux-latest/linux-source_4.19+102_all.deb a 2

Distribution: buster
Source: linux-latest
Version: 103
Files:
 pool/stable/l/linux-latest/linux-doc_4.19+103_all.deb a 0
 pool/stable/l/linux-latest/linux-headers-amd64_4.19+103_amd64.deb b 1
 pool/stable/l/linux-latest/linux-headers-cloud-amd64_4.19+103_amd64.deb b 1
 pool/stable/l/linux-latest/linux-headers-rt-amd64_4.19+103_amd64.deb b 1
 pool/stable/l/linux-latest/linux-image-amd64_4.19+103_amd64.deb b 1
 pool/stable/l/linux-latest/linux-image-amd64-dbg_4.19+103_amd64.deb b 1
 pool/stable/l/linux-latest/linux-image-cloud-amd64_4.19+103_amd64.deb b 1
 pool/stable/l/linux-latest/linux-image-cloud-amd64-dbg_4.19+103_amd64.deb b 1
 pool/stable/l/linux-latest/linux-image-rt-amd64_4.19+103_amd64.deb b 1
 pool/stable/l/linux-latest/linux-image-rt-amd64-dbg_4.19+103_amd64.deb b 1
 pool/stable/l/linux-latest/linux-perf_4.19+103_all.deb a 2
 pool/stable/l/linux-latest/linux-source_4.19+103_all.deb a 2

...

The goal here is to get the version of a source package (for instance linux-latest) given the name of one of its binary packages (for example linux-source_4.19+103_all.deb), by extracting all lines between:

1) a multi-line pattern:

Distribution: buster
Source: linux-latest

2) a string pattern:

linux-source_4.19+103_all.deb

The distribution name, source package name and binary package names are variable, so the number of captured lines is variable, but the base layout remains constant.

For that same reason, it seems that pcre2grep --multiline cannot be used here.

I cannot see a way to use multi-line patterns with awk or sed, although there must be a way, at least with awk.

Other Stack Overflow answers don't seem to apply here.

Any suggestion?


Solution

  • It's not entirely clear what you're trying to do, but I think you're saying you want to print the version value when a specific string appears in the record. If so, that's just:

    $ awk -v str='linux-source_4.19+103_all.deb' -F': *' '{f[$1]=$2} index($0,str){print f["Version"]}' file
    103
    

    If you wanted to also test for the specific distribution and source that's just a tweak:

    $ awk -v str='linux-source_4.19+103_all.deb' -v dist='buster' -v src='linux-latest' -F': *' '
        { f[$1] = $2 }
        (f["Distribution"]==dist) && (f["Source"]==src) && index($0,str) { print f["Version"] }
    ' file
    103
    

    If you need something different then edit your question to clarify your requirements.
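
As a variation on the commands above, here is a minimal sketch that uses awk's paragraph mode (RS set to the empty string), so that each stanza is read as one record and each of its lines as one field. It assumes stanzas are separated by empty lines, as in the excerpt, and that every stanza carries Distribution, Source and Version lines. The function name source_version and its argument order are hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of reprepro): print the Version of the stanza
# whose Distribution and Source match and whose file list mentions the given
# binary package file name.
# Usage: source_version DIST SRC DEB_NAME FILE
source_version() {
    awk -v dist="$1" -v src="$2" -v str="$3" '
        BEGIN { RS = ""; FS = "\n" }   # paragraph mode: one stanza per record, one line per field
        {
            d = s = v = ""; found = 0
            for (i = 1; i <= NF; i++) {
                if      ($i ~ /^Distribution: /) d = substr($i, 15)   # text after "Distribution: "
                else if ($i ~ /^Source: /)       s = substr($i, 9)    # text after "Source: "
                else if ($i ~ /^Version: /)      v = substr($i, 10)   # text after "Version: "
                else if (index($i, str))         found = 1            # literal substring match
            }
            if (found && d == dist && s == src) print v
        }
    ' "$4"
}
```

With the excerpt from the question saved as file, calling source_version buster linux-latest 'linux-source_4.19+103_all.deb' file should print 103. Because each stanza is evaluated as a whole, nothing carries over from one stanza to the next, and index() compares the name literally, so the + and . characters need no escaping.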