
Notepad++ RegEx remove between tags when word matched


I had a similar question that dealt with numbers; this time I need to do the same for a keyword. Below is sample data from a KML file. I would like to remove every Placemark that contains the word footway.

    <Placemark>
        <styleUrl>#nothing</styleUrl>
        <ExtendedData>
            <SchemaData>
                <SimpleData>highway</SimpleData>
            </SchemaData>
        </ExtendedData>
        <LineString>
            <coordinates>0.0000,0.0000,0</coordinates>
        </LineString>
    </Placemark>     
    <Placemark>
        <styleUrl>#nothing</styleUrl>
        <ExtendedData>
            <SchemaData>
                <SimpleData>footway</SimpleData>
            </SchemaData>
        </ExtendedData>
        <LineString>
            <coordinates>0.0000,0.0000,0</coordinates>
        </LineString>
    </Placemark>

I tried the following, but it matches everything:

(?i)<Placemark>.*?footway.*?</Placemark>

Below are my Notepad++ settings:

Find what: (?i)<Placemark>.*?footway.*?</Placemark>
Replace with:
Wrap around: checked
Search Mode: Regular expression, with ". matches newline" checked
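
The over-match can be reproduced outside Notepad++. The following is a minimal Python sketch, not the Notepad++ engine itself: it assumes that re.DOTALL behaves like the ". matches newline" option for this pattern, and the kml string is a shortened, hypothetical stand-in for the sample above.

```python
import re

# Shortened, hypothetical stand-in for the KML sample in the question.
kml = (
    "<Placemark><SimpleData>highway</SimpleData></Placemark>\n"
    "<Placemark><SimpleData>footway</SimpleData></Placemark>\n"
)

# re.DOTALL is assumed to play the role of ". matches newline".
lazy = re.compile(r"(?i)<Placemark>.*?footway.*?</Placemark>", re.DOTALL)

match = lazy.search(kml)

# The match begins at the FIRST <Placemark>, because the lazy .*? is free to
# walk across the highway block until it finally reaches "footway".
print(match.group(0).count("<Placemark>"))
print("highway" in match.group(0))
```

The lazy quantifier only makes the match as short as possible from its starting point; it does not stop the match from starting at an earlier <Placemark>.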

Solution

  • Here is one way to do it:

    • Find what: <Placemark>(?:(?!<Placemark).)*footway(?:.(?!<Placemark))*</Placemark>
    • Replace with: (leave empty)

    This removes every <Placemark> block that contains footway, and only those blocks. Keep ". matches newline" checked so that . can cross line breaks.

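    (?!<Placemark) is a negative lookahead: it succeeds only at positions where <Placemark does not start. It guarantees that no other <Placemark> opening tag lies between the starting <Placemark> and footway, so when the file has many placemarks the regex matches a single <Placemark> block at a time.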

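    (?:(?!<Placemark).)* is a non-capturing group repeated 0 or more times; each repetition consumes one character, but only if <Placemark does not begin at that position. (?:.(?!<Placemark))* works the same way after footway: each character it consumes must not be immediately followed by <Placemark.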
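
    The same pattern can be checked with a short script. This is a hedged sketch in Python rather than Notepad++: it assumes re.DOTALL is an adequate substitute for ". matches newline", and the three-placemark kml string (including the cycleway entry) is made up for illustration.

```python
import re

# Hypothetical sample: three placemarks, only the middle one is a footway.
kml = """<Placemark>
    <SimpleData>highway</SimpleData>
</Placemark>
<Placemark>
    <SimpleData>footway</SimpleData>
</Placemark>
<Placemark>
    <SimpleData>cycleway</SimpleData>
</Placemark>
"""

# Same pattern as in "Find what"; re.DOTALL lets . match newlines.
tempered = re.compile(
    r"<Placemark>(?:(?!<Placemark).)*footway(?:.(?!<Placemark))*</Placemark>",
    re.DOTALL,
)

# Replace each matching block with nothing, as the empty "Replace with" does.
cleaned = tempered.sub("", kml)
print(cleaned)
```

    Only the footway block is removed; the line break that followed it stays behind, so an empty line is left where the block used to be.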