Best way to remove multiple entries from kml file


I have a very large KML file (over 20000 placemarks). They are named by numbers that go up in increments of 5, starting at about 7000 and ending at 27000.

<Placemark>
    <name>7750</name>
    <description><![CDATA[converted by:</br><a href="http://gridreferencefinder.com/">GridReferenceFinder.com</a></br>]]></description>
    <Point>
        <coordinates>-0.99153654,52.225002,0</coordinates>
    </Point>
</Placemark>

I would like to remove any placemark whose name doesn't end in 00 or 50. Having a placemark every 5 metres is slowing down some of the lower-end devices on site.

Is there some script, command or whatever that will check the name and, if it doesn't end in 00 or 50, delete everything from <Placemark> to </Placemark> for that entry?

You would literally be saving me 10 hours' work deleting them individually.


Solution

  • A Perl one-liner solution!

    I would like to remove any placemarker that doesn't end in 00 or 50
    First, a pattern for this: it matches any name in the range except those that end with 00 or 50.

    ^(?:[7-9]\d|[1-2]\d\d)(?!00|50)\d\d$   
    

    A quick test, using the positive lookahead (?=00|50) to list the names that would be kept:

    perl -le 'print for grep{ /^(?:[7-9]\d|[1-2]\d\d)(?=00|50)\d\d$/  } 7000..27000'
    

    Then read the entire file at once by undefining the input record separator:

    $/=undef;   
    

    Then go through all matches with a while loop:

    while/<Placemark>\s*?<name>(?:[7-9]\d|[1-2]\d\d)(?!00|50)\d\d.*?<\/Placemark>/sg  
    

    The s flag lets . match newlines as well (single-line mode), and g makes the search global.

    Then print each match ($&):

    perl -lne '$/=undef;print $& while/<Placemark>\s*?<name>(?:[7-9]\d|[1-2]\d\d)(?!00|50)\d\d.*?<\/Placemark>/sg' file
    

    The pattern used for the match:

    <Placemark>\s*?<name>(?:[7-9]\d|[1-2]\d\d)(?!00|50)\d\d.*?<\/Placemark>
    

    NOTE:
    The (?!00|50) part is a negative lookahead: it excludes names whose last two digits are 00 or 50. Replacing it with a positive lookahead, (?=00|50), inverts the match:

    ^(?:[7-9]\d|[1-2]\d\d)(?=00|50)\d\d$  
    

    This matches only names that end with 00 or 50.
    So you can use the lookahead to switch between the entries you want and the ones you do not want.

    Print all placemarks whose name does not end with 00 or 50:

    perl -lne '$/=undef;print $& while/<Placemark>\s*?<name>(?:[7-9]\d|[1-2]\d\d)(?!00|50)\d\d.*?<\/Placemark>/sg' file
    

    Print all placemarks whose name ends with 00 or 50:

    perl -lne '$/=undef;print $& while/<Placemark>\s*?<name>(?:[7-9]\d|[1-2]\d\d)(?=00|50)\d\d.*?<\/Placemark>/sg' file
    

    How to Substitute

    To replace the matches instead of printing them, use the substitution operator s/regex-match/substitute-string/:

    perl -pe '$/=undef;s/<Placemark>\s*?<name>(?:[7-9]\d|[1-2]\d\d)(?!00|50)\d\d.*?<\/Placemark>/==>DELETE<==/sg' file
    

    test:
    input:

    before...
    <Placemark>
        <name>7700</name>
        <description><![CDATA[converted by:</br><a href="http://gridreferencefinder.com/">GridReferenceFinder.com</a></br>]]></description>
        <Point>
            <coordinates>-0.99153654,52.225002,0</coordinates>
        </Point>
    </Placemark>
    after...
    ---------
    before...
    <Placemark>
        <name>7701</name>
        <description><![CDATA[converted by:</br><a href="http://gridreferencefinder.com/">GridReferenceFinder.com</a></br>]]></description>
        <Point>
            <coordinates>-0.99153654,52.225002,0</coordinates>
        </Point>
    </Placemark>
    after...
    --------
    before...
    <Placemark>
        <name>27650</name>
        <description><![CDATA[converted by:</br><a href="http://gridreferencefinder.com/">GridReferenceFinder.com</a></br>]]></description>
        <Point>
            <coordinates>-0.99153654,52.225002,0</coordinates>
        </Point>
    </Placemark>
    after...
    --------
    before...
    <Placemark>
        <name>27651</name>
        <description><![CDATA[converted by:</br><a href="http://gridreferencefinder.com/">GridReferenceFinder.com</a></br>]]></description>
        <Point>
            <coordinates>-0.99153654,52.225002,0</coordinates>
        </Point>
    </Placemark>
    after...
    
    end.
    

    the output:

    before...
    <Placemark>
        <name>7700</name>
        <description><![CDATA[converted by:</br><a href="http://gridreferencefinder.com/">GridReferenceFinder.com</a></br>]]></description>
        <Point>
            <coordinates>-0.99153654,52.225002,0</coordinates>
        </Point>
    </Placemark>
    after...
    ---------
    before...
    ==>DELETE<==
    after...
    --------
    before...
    <Placemark>
        <name>27650</name>
        <description><![CDATA[converted by:</br><a href="http://gridreferencefinder.com/">GridReferenceFinder.com</a></br>]]></description>
        <Point>
            <coordinates>-0.99153654,52.225002,0</coordinates>
        </Point>
    </Placemark>
    after...
    --------
    before...
    ==>DELETE<==
    after...
    
    end.
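
The test above replaces each match with a visible ==>DELETE<== marker. To actually delete the entries, the replacement can be left empty. The following is a sketch under some assumptions: the file name sample.kml and its four placemarks are invented for the demonstration, -0777 is used in place of `$/=undef` so the whole file is slurped in one read, and a trailing `\s*` is added to the pattern so the line break after each removed block goes too.

```shell
# Build a small sample file (hypothetical name) with four placemarks.
tmpdir=$(mktemp -d)
sample="$tmpdir/sample.kml"
for n in 7700 7705 27650 27655; do
  printf '<Placemark>\n    <name>%s</name>\n    <Point>\n        <coordinates>-0.99153654,52.225002,0</coordinates>\n    </Point>\n</Placemark>\n' "$n"
done > "$sample"

# Empty replacement deletes every placemark whose name does not end in 00 or 50;
# the trailing \s* also swallows the whitespace left behind by the removed block.
result=$(perl -0777 -pe 's/<Placemark>\s*?<name>(?:[7-9]\d|[1-2]\d\d)(?!00|50)\d\d.*?<\/Placemark>\s*//sg' "$sample")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

Only the 7700 and 27650 placemarks remain in the printed result; the sample file itself is left unchanged because -i is not used here.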
    

    NOTE.2:

    You can use -i to edit the file in place; with -i.bak, Perl keeps the original as a backup with a .bak suffix:

    perl -i.bak -pe ' ... the rest of the script ...' file  
    

    Perl 5.22 or later is recommended.