bash awk sed grep pcregrep

How to remove multi-line blocks matching multiple patterns with awk, pcre2grep or sed


I have this text file

tittleofthis123
<Bunlde ver=5.0>
 <Packages>    
  <Package Type="app" FileName="Package_ARM64_beta.msix" Offset="79" Size="5791033">
   <Resources>
    rescode11
   </Resources>
   <b4:Dependencies>
     depcode12
   </b4:Dependencies>
  </Package>
  <Package Type="app" FileName="Package_x64_beta.msix" Offset="580113" Size="7195285">
   <Resources>
    rescode21
    rescode22
   </Resources>
  </Package>
  <Package Type="res" FileName="Package_lang-cy.msix" Offset="579" Size="15">
   <Resources>
    rescode31
   </Resources>
  </Package>
  <Package Type="res" FileName="Package_lang-af.msix" Offset="5791" Size="1578">
   <Resources>
    rescode41
   </Resources>
  </Package>
 </Packages>
</Bundle>

I need the output to be

tittleofthis123
<Bunlde ver=5.0>
 <Packages>    
  <Package Type="app" FileName="Package_x64_beta.msix" Offset="580113" Size="7195285">
   <Resources>
    rescode21
    rescode22
   </Resources>
  </Package>
  <Package Type="res" FileName="Package_lang-af.msix" Offset="5791" Size="1578">
   <Resources>
    rescode41
   </Resources>
  </Package>
 </Packages>
</Bundle>

I have tried this

pcre2grep -M -v 'ARM64.*(\n|.)*</Package>|lang-cy.*(\n|.)*</Package>' 123.txt

But of course the result isn't right: every package ends with the same </Package>, and the greedy (\n|.)* runs on to the last </Package> in the file, so instead of filtering out only the ARM64 package, it removes everything down to the bottom package. I also have more packages to exclude, so I probably shouldn't use the -v inversion, but then I have no idea how to retain the title, <Bundle>, and <Packages>.

I also tried this and this:

awk '/ARM64/,/<\/Package>/ {next} {print}' 123.txt

It actually works well. But I don't understand how to make it filter more than one package, e.g. both /ARM64/,/<\/Package>/ and /lang-cy/,/<\/Package>/. Also, I need to exclude a lot of packages, so maybe the {next} approach isn't right, and I still have no idea how to retain the title, <Bundle>, and <Packages>.
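The start condition of an awk range pattern can be any regular expression, so a single range can cover several packages by joining their distinguishing strings with `|`. Lines outside every matched range (the title, `<Bundle>`, `<Packages>` and the closing tags) fall through to `{ print }`, so they are already kept. A minimal sketch, wrapped in a hypothetical function `drop_packages`, assuming each excluded name appears only on its own `<Package ...>` line (a match on any other line would start a range there):

```shell
# drop_packages [FILE...]   (hypothetical helper name; reads stdin if no FILE)
# Skips every range that starts on a line matching ARM64 or lang-cy and
# ends on the next </Package> line; all other lines are printed unchanged.
drop_packages() {
    awk '/ARM64|lang-cy/, /<\/Package>/ { next }   # inside an excluded block
         { print }                                 # everything else
    ' "$@"
}
```

With this, `drop_packages 123.txt` prints the requested output. To exclude more packages, extend the alternation, e.g. `/ARM64|lang-cy|lang-de/` (`lang-de` is only an illustrative name).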

I think this is pretty close to what I need

sed -n '/<Package/{:a;N;/\n*<\/Package>/!ba; /x64/p}' 123.txt

It also works very well, but again I'm stuck: I don't know how to combine more filters, such as x64 and lang-af, and the same problem remains with the title, <Bundle>, and <Packages> (with -n, only the matched blocks are printed).
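The `sed -n` attempt can be adapted in two ways: print a kept block with `p` and then end the cycle with `d`, and add a final `p` so that every line outside a block is printed too. The start address also needs a trailing space, because `/<Package/` matches ` <Packages>` as well and would pull that line into the first collected block. A sketch assuming GNU sed (the `\|` alternation in a basic regex is a GNU extension), with the hypothetical function name `keep_packages_sed`:

```shell
# keep_packages_sed [FILE...]   (hypothetical helper name; needs GNU sed)
# Lines from "<Package " up to "</Package>" are collected into one block,
# which is printed only if it mentions _x64_ or _lang-af. Every other line
# reaches the final "p" and is printed unchanged.
keep_packages_sed() {
    sed -n '
        /<Package /{
            :a
            N
            /<\/Package>/!ba
            /_x64_\|_lang-af/p
            d
        }
        p
    ' "$@"
}
```

Usage: `keep_packages_sed 123.txt`. More packages to keep go into the same alternation, e.g. `/_x64_\|_lang-af\|_lang-de/p` (`_lang-de` is illustrative).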

Actually, this is pretty much the same case, but I don't understand the answer at all.


Solution

  • This might work for you (GNU sed):

    sed '/<Package Type/{:a;N;/<\/Package>/!ba;/_x64_\|_lang-af/!d}' file
    

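    Gather up the lines from <Package Type through </Package>, then delete the collected block unless it contains _x64_ or _lang-af. Lines outside any package block (the title, <Bundle>, <Packages> and the closing tags) are not touched, so sed prints them as usual.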
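Because many packages may need handling, it can be convenient to build the alternation from a list of arguments instead of editing the regex by hand. The sketch below swaps sed for awk (this is not the answer's method): it buffers each `<Package ...>` block until its `</Package>` line and prints the block only if it matches at least one pattern given on the command line. `keep_packages` is a hypothetical name, and the patterns are treated as awk extended regular expressions:

```shell
# keep_packages PATTERN...   (hypothetical helper; reads the text on stdin)
# Prints stdin unchanged, except that each "<Package ...>" ... "</Package>"
# block is dropped unless it matches at least one PATTERN (an awk ERE).
# With no PATTERN the joined regex is empty, so every block is kept.
keep_packages() {
    # Join the arguments into one alternation, e.g. "_x64_|_lang-af".
    keep=$(printf '%s|' "$@")
    keep=${keep%|}
    awk -v keep="$keep" '
        /<Package / { inblock = 1; buf = "" }      # a package block starts
        inblock {
            buf = buf $0 "\n"                        # collect the whole block
            if ($0 ~ /<\/Package>/) {                # the block is complete
                if (buf ~ keep) printf "%s", buf     # keep only matching blocks
                inblock = 0
            }
            next
        }
        { print }                                    # title, <Bundle>, <Packages>, ...
    '
}
```

For example, `keep_packages _x64_ _lang-af < 123.txt` should give the same output as the sed one-liner above, and adding a package to keep is just one more argument.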