r xml openstreetmap

Iterating over the tags inside an OSM XML file


I am working on an OSM data file like the one below.

  ...     
       ...   
           ...

   <node id="4165094897" lat="41.0492396" lon="29.0260049" version="1">
    <tag k="name" v="Adnan Yeri"/>
    <tag k="amenity" v="cafe"/>
    <tag k="wheelchair" v="limited"/>
</node>
<node id="4165094899" lat="41.0492902" lon="29.0258856" version="1">
    <tag k="name" v="Piano Restaurant Cafe"/>
    <tag k="wheelchair" v="limited"/>
</node>
<node id="4165094900" lat="41.0493468" lon="29.0258547" version="1">
    <tag k="name" v="28 Black"/>
    <tag k="shop" v="yes"/>
    <tag k="amenity" v="restaurant"/>
</node>
<node id="4165094901" lat="41.0494034" lon="29.0258145" version="1">
    <tag k="name" v="Gratis"/>
    <tag k="shop" v="yes"/>
     ...
          ...
               ...

I am trying to get the id, lat, lon, amenity, and name values of the nodes that have an amenity tag among their child tags.

For instance, since the first node of the example data has an amenity tag, I want to get:

         id        lat        lon       name     amenity
    4165094897 41.0492396 29.0260049 Adnan Yeri    cafe

However, since the second node has no amenity tag, I want to skip it.

To achieve that, I found the nodes that contain an amenity tag using the osmar library, as below:

require(XML)
data <- xmlParse("/users/maydin/Desktop/Istanbul.osm")

library(osmar)
datam<- as_osmar(data)

ids_a <- find(datam, node(tags(k== "amenity")))

length(ids_a)
# 15212  (number of nodes that have an amenity tag)

After that, I used the XML package:

for(i in 1:length(ids_a)) {

  find1 <- paste0('//*/node[@id=\"',ids_a[i],'\"]')
  find2 <- paste0('//*/node[@id=\"',ids_a[i],'\"]/tag[@k=\"name\"]')
  find3 <- paste0('//*/node[@id=\"',ids_a[i],'\"]/tag[@k=\"amenity\"]')

  on1 <- xmlAttrs(data[find1][[1]])
  on2 <- xmlAttrs(data[find2][[1]])
  on3 <- xmlAttrs(data[find3][[1]])
 ...
    .... }

After some data frame operations, these lookups give the expected results. But a single iteration takes about 7.3 seconds. With 15212 nodes, that adds up to about 31 hours!

Then I also tried:

xpathSApply(data,"//*/node[@id=\"6554996802\"]")
# 6554996802 is just one of the ids out of 15212

And it gave:

   [[1]]
  <node id="6554996802" lat="40.9220973" lon="29.1279101" version="1">
   <tag k="name" v="Burcu Cafe"/>
   <tag k="amenity" v="cafe"/>
  </node> 

Since it makes just a single search through the data, it is relatively fast. However, I couldn't get any further from this point.

Any suggestions, please?


Solution

  • This should be pretty fast... XPath to the rescue :)

    library(xml2)
    library(magrittr) #for pipe-operator
    
    #read in xml (see section below for sample data)
    doc <- read_xml( "./test.xml" )
    
    #get the parent-node 'node' from a tag-node where the k-attribute = amenity
    nodes <- xml_find_all( doc, "//tag[@k='amenity']/parent::node" )
    
    #build data.frame
    data.frame( id  =     xml_attr( nodes, "id" )  %>% as.numeric(),
                lat =     xml_attr( nodes, "lat" ) %>% as.numeric(),
                lon =     xml_attr( nodes, "lon" ) %>% as.numeric(),
                name =    xml_find_first( nodes, ".//tag[@k='name']") %>% xml_attr("v"),
                amenity = xml_find_first( nodes, ".//tag[@k='amenity']") %>% xml_attr("v"),
                stringsAsFactors = FALSE
              )
    
    #           id      lat      lon       name    amenity
    # 1 4165094897 41.04924 29.02600 Adnan Yeri       cafe
    # 2 4165094900 41.04935 29.02585   28 Black restaurant
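
    A note on why the columns above stay aligned row by row: as far as I know, `xml_find_first()` applied to a node set returns one result per input node, and `xml_attr()` gives `NA` where there was no match. The sketch below is only an illustration under assumptions: it reads inline XML instead of a file, the node with id 1 is made up (an amenity but no name), and it uses the predicate form `//node[tag[@k='amenity']]` as an alternative way to select the same nodes.

    ```r
    library(xml2)

    # Inline sample; the node with id="1" is hypothetical (amenity, but no name tag)
    doc <- read_xml('<nodes>
      <node id="4165094897" lat="41.0492396" lon="29.0260049" version="1">
        <tag k="name" v="Adnan Yeri"/>
        <tag k="amenity" v="cafe"/>
      </node>
      <node id="4165094899" lat="41.0492902" lon="29.0258856" version="1">
        <tag k="name" v="Piano Restaurant Cafe"/>
      </node>
      <node id="1" lat="41.0" lon="29.0" version="1">
        <tag k="amenity" v="bench"/>
      </node>
    </nodes>')

    # Predicate form: select every node that has an amenity tag as a direct child
    nodes <- xml_find_all(doc, "//node[tag[@k='amenity']]")

    # One lookup result per selected node; a missing tag yields NA in that row
    res <- data.frame(
      id      = xml_attr(nodes, "id"),
      name    = xml_attr(xml_find_first(nodes, "./tag[@k='name']"), "v"),
      amenity = xml_attr(xml_find_first(nodes, "./tag[@k='amenity']"), "v"),
      stringsAsFactors = FALSE
    )
    print(res)
    ```

    Here the id column is kept as character on purpose, so large OSM ids are never printed in scientific notation; convert with `as.numeric()` if you need numbers.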
    

    sample data

    test.xml

    <nodes>
        <node id="4165094897" lat="41.0492396" lon="29.0260049" version="1">
            <tag k="name" v="Adnan Yeri"/>
            <tag k="amenity" v="cafe"/>
            <tag k="wheelchair" v="limited"/>
        </node>
        <node id="4165094899" lat="41.0492902" lon="29.0258856" version="1">
            <tag k="name" v="Piano Restaurant Cafe"/>
            <tag k="wheelchair" v="limited"/>
        </node>
        <node id="4165094900" lat="41.0493468" lon="29.0258547" version="1">
            <tag k="name" v="28 Black"/>
            <tag k="shop" v="yes"/>
            <tag k="amenity" v="restaurant"/>
        </node>
    </nodes>
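
    If you would rather stay with the XML package already used in the question, the same idea (one XPath query for all matching nodes, then relative lookups per node) can be sketched roughly as follows. This is not the answer's method, just a hedged equivalent; `osm_text` and the helper `tag_value()` are names invented for this example, and the inline data stands in for the real file.

    ```r
    library(XML)

    # Inline copy of the sample data (stands in for the real .osm file)
    osm_text <- '<nodes>
      <node id="4165094897" lat="41.0492396" lon="29.0260049" version="1">
        <tag k="name" v="Adnan Yeri"/>
        <tag k="amenity" v="cafe"/>
        <tag k="wheelchair" v="limited"/>
      </node>
      <node id="4165094899" lat="41.0492902" lon="29.0258856" version="1">
        <tag k="name" v="Piano Restaurant Cafe"/>
        <tag k="wheelchair" v="limited"/>
      </node>
      <node id="4165094900" lat="41.0493468" lon="29.0258547" version="1">
        <tag k="name" v="28 Black"/>
        <tag k="shop" v="yes"/>
        <tag k="amenity" v="restaurant"/>
      </node>
    </nodes>'
    doc <- xmlParse(osm_text, asText = TRUE)

    # A single pass over the document: every node with an amenity tag
    amenity_nodes <- getNodeSet(doc, "//node[tag[@k='amenity']]")

    # Hypothetical helper: value of the first child tag with the given key, or NA
    tag_value <- function(node, key) {
      hit <- getNodeSet(node, sprintf("./tag[@k='%s']", key))
      if (length(hit) == 0) NA_character_ else xmlGetAttr(hit[[1]], "v")
    }

    result <- data.frame(
      id      = vapply(amenity_nodes, xmlGetAttr, character(1), "id"),
      lat     = as.numeric(vapply(amenity_nodes, xmlGetAttr, character(1), "lat")),
      lon     = as.numeric(vapply(amenity_nodes, xmlGetAttr, character(1), "lon")),
      name    = vapply(amenity_nodes, tag_value, character(1), "name"),
      amenity = vapply(amenity_nodes, tag_value, character(1), "amenity"),
      stringsAsFactors = FALSE
    )
    print(result)
    ```

    The difference from the loop in the question is that the whole document is searched once, and each later lookup is relative to a node that has already been found.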