Tags: python, xml, beautifulsoup, nodes, openstreetmap

Nested tags and attributes in BeautifulSoup and OpenStreetMap XML


Please help me write working code for this task: for every "way" tag in an OpenStreetMap XML file, I need to count the "nd" tags it contains and give the id of the "way" tag that contains the largest number of "nd" tags. If several ids tie, give the first one in alphabetical order. It seems easy, but I cannot figure out how to do it. (My only idea is that a dictionary would be useful.) This is my code:

from urllib.request import urlopen, urlretrieve

from bs4 import BeautifulSoup

resp = urlopen('https://stepik.org/media/attachments/lesson/245681/map2.osm')
xml = resp.read().decode('utf8')
soup = BeautifulSoup(xml, 'xml') # make the soup using lxml

cnt = 0
names ={}

for way in soup.find_all('way'): # go through the nodes
    flag=False
    for nd in way('nd'):
        flag=True
        if nd['k'] == 'id':
            name=nd['v']
    if flag:
        if name not in names:
            names[name]=0
        names[name]+=1

print(sort(names))

Solution

  • You can use the built-in max() function to find the <way> tag with the largest number of <nd> tags.

    For example:

    import requests
    from bs4 import BeautifulSoup
    
    
    url = 'https://stepik.org/media/attachments/lesson/245681/map2.osm'
    soup = BeautifulSoup(requests.get(url).content, 'html.parser')
    
    num_way = len(soup.select('way'))
    w = max(sorted(soup.select('way:has(nd)'), reverse=True, key=lambda tag: int(tag['id'])), key=lambda tag: len(tag.select('nd')))
    
    print('number of <way>:', num_way)
    print('id:', w['id'])
    print('quantity of <nd>:', len(w.select('nd')))
    

    Prints:

    number of <way>: 3181
    id: 227140108
    quantity of <nd>: 249
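
The question asks for the first id in alphabetical order when several <way> tags tie, and mentions using a dictionary. The following is a minimal sketch of that tie-break only, not a verified solution for the real map2.osm file. It assumes the ids are compared as strings, swaps BeautifulSoup for xml.etree.ElementTree from the standard library, and runs on a small made-up document. SAMPLE_OSM and way_with_most_nd are hypothetical names introduced for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature OSM document, made up for illustration only.
SAMPLE_OSM = """<osm version="0.6">
  <node id="1" lat="0.0" lon="0.0"/>
  <way id="30">
    <nd ref="1"/>
    <nd ref="2"/>
  </way>
  <way id="200">
    <nd ref="1"/>
    <nd ref="2"/>
    <nd ref="3"/>
  </way>
  <way id="1000">
    <nd ref="2"/>
    <nd ref="3"/>
    <nd ref="4"/>
  </way>
  <way id="5"/>
</osm>
"""


def way_with_most_nd(xml_text):
    """Return (id, nd_count) of the <way> with the most <nd> children.

    Ties are broken by taking the id that sorts first as a string
    (assumption: "alphabetical order" means plain string comparison).
    """
    root = ET.fromstring(xml_text)

    # Dictionary mapping each way id to the number of its direct <nd> children.
    counts = {way.get("id"): len(way.findall("nd")) for way in root.iter("way")}

    # Smallest key under (-count, id): highest count first, then string order.
    best_id = min(counts, key=lambda way_id: (-counts[way_id], way_id))
    return best_id, counts[best_id]


if __name__ == "__main__":
    best_id, nd_count = way_with_most_nd(SAMPLE_OSM)
    print("id:", best_id)
    print("quantity of <nd>:", nd_count)
```

In the sample, ways "200" and "1000" both contain three <nd> tags. String comparison puts "1000" before "200", whereas sorting the ids as integers would prefer 200. The function raises ValueError if the document contains no <way> tags.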