python, xml, beautifulsoup, urllib2

Extract value from a particular xml node


How can I extract the source IP for the "main.mp3" mount point from the XML source below?

My current code returns the first source IP in the document, which happens to belong to the listen.mp3 mount point. I would like the extraction to be bound to a particular mount point instead.

Code that extracts the source IP:

import urllib2
from bs4 import BeautifulSoup

SERVER = 'http://localhost:8382/admin/stats.xml'
authinfo = urllib2.HTTPPasswordMgrWithDefaultRealm()
authinfo.add_password(None, SERVER, 'xxxxxx', 'xxxxxxx')
handler = urllib2.HTTPBasicAuthHandler(authinfo)
myopener = urllib2.build_opener(handler)
urllib2.install_opener(myopener)  # returns None, so there is nothing to assign
output = urllib2.urlopen(SERVER)
soup = BeautifulSoup(output.read(), "lxml")
# select() returns every <source_ip> in document order; [0] is just the first one
print "source ip: ", soup.select('source_ip')[0].text

XML returned by that URL:

<icestats>
<admin>[email protected]</admin>
<client_connections>580473</client_connections>
<clients>32</clients>
<connections>1217611</connections>
<file_connections>220</file_connections>
<host>localhost</host>
<listener_connections>374451</listener_connections>
<listeners>29</listeners>
<location>Australia</location>
<server_id>Icecast 2.4.2</server_id>
<server_start>Thu, 15 Feb 2018 21:17:23 +1100</server_start>
<server_start_iso8601>2018-02-15T21:17:23+1100</server_start_iso8601>
<source_client_connections>99</source_client_connections>
<source_relay_connections>0</source_relay_connections>
<source_total_connections>99</source_total_connections>
<sources>2</sources>
<stats>0</stats>
<stats_connections>0</stats_connections>
<source mount="/listen.mp3">
<audio_info>channels=2;samplerate=44100;bitrate=64</audio_info>
<channels>2</channels>
<genre>Islamic Talk</genre>
<listener_peak>52</listener_peak>
<listeners>29</listeners>
<listenurl>http://localhost:8382/listen.mp3</listenurl>
<max_listeners>unlimited</max_listeners>
<public>1</public>
<samplerate>44100</samplerate>
<server_description>Qkradio Station Australia</server_description>
<server_name>listen.mp3</server_name>
<server_type>audio/mpeg</server_type>
<slow_listeners>220</slow_listeners>
<source_ip>127.0.0.1</source_ip>
<stream_start>Mon, 19 Feb 2018 23:08:01 +1100</stream_start>
<stream_start_iso8601>2018-02-19T23:08:01+1100</stream_start_iso8601>
<title>ibtihal.mp3 - 1</title>
<total_bytes_read>634036021</total_bytes_read>
<total_bytes_sent>13637049457</total_bytes_sent>
<user_agent>Liquidsoap/1.3.3 (Unix; OCaml 4.02.3)</user_agent>
</source>
<source mount="/main.mp3">
<audio_info>bitrate=170</audio_info>
<bitrate>170</bitrate>
<genre>Islam</genre>
<listener_peak>2</listener_peak>
<listeners>1</listeners>
<listenurl>http://localhost:8382/main.mp3</listenurl>
<max_listeners>unlimited</max_listeners>
<public>1</public>
<server_description>Quran Kareem Radio</server_description>
<server_name>Quran Kareem Radio</server_name>
<server_type>audio/mpeg</server_type>
<server_url>http://qkradio.com.au</server_url>
<slow_listeners>1</slow_listeners>
<source_ip>60.241.175.9</source_ip>
<stream_start>Tue, 20 Feb 2018 00:56:23 +1100</stream_start>
<stream_start_iso8601>2018-02-20T00:56:23+1100</stream_start_iso8601>
<total_bytes_read>582030204</total_bytes_read>
<total_bytes_sent>588819584</total_bytes_sent>
<user_agent>instreamer</user_agent>
</source>
</icestats>

Solution

  • You could select the container 'source' tags and read the 'source_ip' tag inside each one, so that each IP is tied to its mount point.

    soup = BeautifulSoup(xml, "lxml")  # xml: the stats.xml text, e.g. output.read()
    for mount in soup.select('source[mount]'):
        print "mount point: ", mount['mount']
        print "source ip: ", mount.select_one('source_ip').text
    

    Or you could build a dictionary keyed by mount point and look up the IP by mount, e.g.:

    data = {
        mount['mount']: mount.select_one('source_ip').text 
        for mount in soup.select('source[mount]')
    }
    print data['/listen.mp3']
    

    If you only want the IP for one specific mount point, select the 'source_ip' tag under that 'source' node.

    ip = soup.select_one('source[mount="/main.mp3"] source_ip').text
    print "source ip:", ip
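
For readers on Python 3 without BeautifulSoup installed, the same lookup can be sketched with the standard library's xml.etree.ElementTree. This swaps in a different parser from the one used in the answer above. The helper name source_ip_for_mount and the trimmed SAMPLE_XML are assumptions made for illustration; in practice the XML text would come from the authenticated request in the question.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the stats.xml from the question (only the fields used here).
SAMPLE_XML = """\
<icestats>
  <source mount="/listen.mp3">
    <source_ip>127.0.0.1</source_ip>
  </source>
  <source mount="/main.mp3">
    <source_ip>60.241.175.9</source_ip>
  </source>
</icestats>
"""


def source_ip_for_mount(xml_text, mount):
    """Return the source_ip text for the given mount, or None if absent.

    Hypothetical helper for illustration; not part of Icecast or BeautifulSoup.
    """
    root = ET.fromstring(xml_text)
    # Compare the attribute in Python rather than inside a selector string,
    # so the mount name never needs quoting or escaping.
    for source in root.iter("source"):
        if source.get("mount") == mount:
            return source.findtext("source_ip")
    return None


if __name__ == "__main__":
    print("source ip:", source_ip_for_mount(SAMPLE_XML, "/main.mp3"))
```

Returning None for an unknown mount lets the caller decide how to handle a stream that is currently offline, instead of failing with an AttributeError on a missing node.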