xmlstarlet delete all elements except one from xml data feed


On my Debian VPS, I want to keep only the elements with CategoryName Mobile Phones and delete all other elements, which have category names such as Mobile Accessories, Laptops, etc. There are 20 different category names in total. The XML file is large: 800 MB.

xmlstarlet el -u sd.xml
Products
Products/Product
Products/Product/Brand
Products/Product/CategoryName
Products/Product/CategoryPathAsString

Here is a sample of the XML:

<Products>
<Product>
   <ProductID>92545172</ProductID>
   <ProductSKU>630348288360</ProductSKU>
   <ProductName>Self Snap Aux Connected Selfie Stick</ProductName>
   <ProductDescription>This product is charge free </ProductDescription>
   <ProductPrice>353.00</ProductPrice>
   <ProductPriceCurrency>INR</ProductPriceCurrency>
   <WasPrice>649.00</WasPrice>
   <DiscountedPrice>0.00</DiscountedPrice>
   <ProductURL>http://clk</ProductURL>
   <PID>8053</PID>
   <MID>159526</MID>
   <ProductImageLargeURL>http://</ProductImageLargeURL>
   <StockAvailability>in stock</StockAvailability>
   <Brand>Self Snap</Brand>
   <CategoryName>Camera Accessories</CategoryName>
   <CategoryPathAsString>Root|Cameras &amp; Accessories|Camera Accessories|</CategoryPathAsString>
</Product>
<Product>
   <ProductID>29911116</ProductID>
   <ProductSKU>647266238</ProductSKU>
   <ProductName>Philips 40PFL5059/V7 40 inches Full HD LED Television</ProductName>
   <ProductDescription>LED Display Resolution : 1920 x 1080</ProductDescription>
   <ProductPrice>30196.00</ProductPrice>
   <ProductPriceCurrency>INR</ProductPriceCurrency>
   <WasPrice>39800.00</WasPrice>
   <DiscountedPrice>0.00</DiscountedPrice>
   <ProductURL>http://clk</ProductURL>
   <PID>8053</PID>
   <MID>159526</MID>
   <ProductImageLargeURL>http://n1</ProductImageLargeURL>
   <StockAvailability>in stock</StockAvailability>
   <Brand>Philips</Brand>
   <CategoryName>Televisions</CategoryName>
   <CategoryPathAsString>Root|TVs, Audio &amp; Video|Televisions|</CategoryPathAsString>
</Product>
<Product>
   <ProductID>93959216</ProductID>
   <ProductSKU>683203029</ProductSKU>
   <ProductName>Micromax Canvas Beat A114R</ProductName>
   <ProductDescription>Type : MultiSim Sim : Dual SIM Os Version : Android </ProductDescription>
   <ProductPrice>7999.00</ProductPrice>
   <ProductPriceCurrency>INR</ProductPriceCurrency>
   <WasPrice>9990.00</WasPrice>
   <DiscountedPrice>0.00</DiscountedPrice>
   <ProductURL>http://clk</ProductURL>
   <PID>8053</PID>
   <MID>159526</MID>
   <ProductImageLargeURL>http://n1</ProductImageLargeURL>
   <StockAvailability>in stock</StockAvailability>
   <Brand>Micromax</Brand>
   <CategoryName>Mobile Phones</CategoryName>
   <CategoryPathAsString>Root|Mobiles &amp; Tablets|Mobile Phones|</CategoryPathAsString>
</Product>
</Products>

Solution

  • It isn't clear without sample XML and the expected result XML. Assuming you want to delete the elements named CategoryName whose inner text is not equal to "Mobile Phones", you can try this XPath:

    /Products/Product/CategoryName[. != 'Mobile Phones']
    

    It turned out that you want to delete each <Product> element whose child <CategoryName> element has a value not equal to "Mobile Phones". In that case, you can try the following XPath, for example as the argument to xmlstarlet ed -d:

    /Products/Product[CategoryName != 'Mobile Phones']
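
Since the feed is about 800 MB, loading the whole document into memory to apply the XPath may be slow or may not fit on a small VPS. As an alternative, here is a minimal streaming sketch in Python using the standard library's `xml.etree.ElementTree.iterparse`. It is not the xmlstarlet approach above, only an equivalent filter under the assumption that every `<Product>` is a direct child of `<Products>`, as in the sample. The function name `keep_category` and the output file name in the usage comment are hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def keep_category(source, dest, category="Mobile Phones"):
    """Copy only the <Product> elements whose CategoryName equals `category`.

    `source` is a file name or binary file object; `dest` is a binary
    file object. Returns the number of products written.
    """
    kept = 0
    dest.write(b"<Products>\n")
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # first event is the start of the <Products> root
    for event, elem in context:
        if event == "end" and elem.tag == "Product":
            if elem.findtext("CategoryName") == category:
                dest.write(ET.tostring(elem, encoding="utf-8"))
                kept += 1
            # Drop products already handled so memory use stays roughly flat.
            root.clear()
    dest.write(b"</Products>\n")
    return kept


# Usage with the file from the question (output name is an assumption):
#   with open("mobile_phones.xml", "wb") as out:
#       keep_category("sd.xml", out)
```

A product with no `<CategoryName>` child is dropped by this sketch, because `findtext` returns `None` for it. The XPath `CategoryName != 'Mobile Phones'` behaves differently in that case: it is false when there is no such child, so that product would not be deleted.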