
How to parse XML namespaces in Python 3 and Beautiful Soup 4?

I am trying to parse XML with BS4 in Python 3.

For some reason, I am not able to parse namespaces. I tried the answers to this question, but they don't work for me, and I don't get any error message either.

Why does the first part work, but the second does not?

from bs4 import BeautifulSoup

input = """
<?xml version="1.0" encoding="utf-8"?>
<wb:countries page="1" pages="6" per_page="50" total="299" xmlns:wb="">
  <wb:country id="ABW">
    <wb:region id="LCN" iso2code="ZJ">Latin America &amp; Caribbean </wb:region>
    <wb:adminregion id="" iso2code="" />
    <wb:incomeLevel id="HIC" iso2code="XD">High income</wb:incomeLevel>
    <wb:lendingType id="LNX" iso2code="XX">Not classified</wb:lendingType>
  </wb:country>
  <wb:country id="AFE">
    <wb:name>Africa Eastern and Southern</wb:name>
    <wb:region id="NA" iso2code="NA">Aggregates</wb:region>
    <wb:adminregion id="" iso2code="" />
    <wb:incomeLevel id="NA" iso2code="NA">Aggregates</wb:incomeLevel>
    <wb:lendingType id="" iso2code="">Aggregates</wb:lendingType>
    <wb:capitalCity />
    <wb:longitude />
    <wb:latitude />
  </wb:country>
</wb:countries>

<item>
  <title>Some string</title>
  <pubDate>Wed, 01 Sep 2022 12:45:00 +0000</pubDate>
  <guid isPermaLink="false">4574785</guid>
  <itunes:subtitle>A subtitle</itunes:subtitle>
  <enclosure length="0" type="audio/mpeg" url=""/>
  <itunes:image href=""/>
</item>
"""

soup = BeautifulSoup(input, 'xml')

# Working
for x in soup.find_all('wb:country'):
    print(x)

# Not working
for x in soup.find_all('item'):
    print(x)


  • This looks like non-conforming XML in which two documents are mixed together. In its strict mode, the XML parser expects every namespace prefix that is used (such as itunes:) to be declared. Use the more lenient 'lxml' parser (lxml's HTML parser) instead to get your expected result from this wild mix:

    soup = BeautifulSoup(xml_string, 'lxml')
    # Working
    for x in soup.find_all('wb:country'):
        print(x)
    # Also working
    for x in soup.find_all('item'):
        print(x)
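
To see what a strict parser objects to, it can help to run cut-down versions of the input through a well-formedness check. The sketch below uses the standard library's xml.etree.ElementTree rather than Beautiful Soup, purely as a strict checker; the helper name is_well_formed and the small sample strings are made up for illustration and are not part of the original code.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return (True, None) if a strict parser accepts xml_text, else (False, reason)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return False, str(exc)
    return True, None


# Hypothetical, cut-down samples modelled on the question's input.
samples = {
    # Two root elements in one string: not a single XML document.
    "mixed": "<countries><country id='ABW'/></countries><item><title>Some string</title></item>",
    # One root element each: every part is fine on its own.
    "first": "<countries><country id='ABW'/></countries>",
    "second": "<item><title>Some string</title></item>",
    # A prefix that is used but never declared with xmlns:itunes="...".
    "undeclared": "<item><itunes:subtitle>A subtitle</itunes:subtitle></item>",
}

for label, text in samples.items():
    ok, reason = is_well_formed(text)
    print(label, "accepted" if ok else "rejected: " + reason)
```

Under these assumptions, the strict parser accepts only the two single-root samples, which matches the two remedies in this answer: parse leniently, or separate the documents and declare the prefixes.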

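    Note: Avoid using the names of Python built-ins such as input for your own variables (input is a built-in function, not a reserved keyword); shadowing it can have unwanted effects elsewhere in your code. The snippet above uses xml_string instead.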
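
A minimal sketch of the kind of side effect meant here, assuming a script that later wants to prompt the user. The variable shadow_error and the sample string are hypothetical and only serve the demonstration.

```python
import builtins

# Rebinding the name hides the built-in input() function in this scope.
input = "<item><title>Some string</title></item>"

try:
    # This no longer prompts the user; the name now refers to a str.
    input("Press Enter to continue: ")
except TypeError as exc:
    shadow_error = str(exc)
    print("Call failed:", shadow_error)

# The real function is still reachable through the builtins module.
print(builtins.input is not input)
```

Choosing a neutral name such as xml_string avoids the problem altogether.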

    If you keep the second document separate, you can use the XML parser:

    from bs4 import BeautifulSoup

    xml_string = """
    <?xml version="1.0" encoding="utf-8"?>
    <item>
      <title>Some string</title>
      <pubDate>Wed, 01 Sep 2022 12:45:00 +0000</pubDate>
      <guid isPermaLink="false">4574785</guid>
      <itunes:subtitle>A subtitle</itunes:subtitle>
      <enclosure length="0" type="audio/mpeg" url=""/>
      <itunes:image href=""/>
    </item>
    """
    soup = BeautifulSoup(xml_string, 'xml')
    # Working
    for x in soup.find_all('item'):
        print(x)

    Otherwise, declare the namespace on your item, and you can still use the XML parser:

    <?xml version="1.0" encoding="utf-8"?>
    <item xmlns:itunes="">
      <title>Some string</title>
      <pubDate>Wed, 01 Sep 2022 12:45:00 +0000</pubDate>
      <guid isPermaLink="false">4574785</guid>
      <itunes:subtitle>A subtitle</itunes:subtitle>
    </item>

    When a namespace is defined for an element, all child elements with the same prefix are associated with the same namespace.
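
The effect of such a declaration can be made visible with the standard library's xml.etree.ElementTree, swapped in here for Beautiful Soup because it shows the resolved namespace in each tag name. This is only a sketch: the URI in ITUNES_NS is a placeholder assumed for illustration, since the declaration's value is not shown here, and should be replaced with whatever your feed actually declares.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI, assumed for this example only.
ITUNES_NS = "http://example.com/itunes"

xml_string = f"""<item xmlns:itunes="{ITUNES_NS}">
  <title>Some string</title>
  <itunes:subtitle>A subtitle</itunes:subtitle>
  <itunes:image href=""/>
</item>"""

root = ET.fromstring(xml_string)

# Prefixed children resolve to "{uri}localname"; unprefixed ones carry no namespace.
for child in root:
    print(child.tag)

# Look an element up by prefix through an explicit prefix-to-URI mapping.
namespaces = {"itunes": ITUNES_NS}
subtitle = root.find("itunes:subtitle", namespaces)
print(subtitle.text)
```

Both itunes: children end up in the same namespace because they share the prefix declared on the enclosing item, while title stays outside it.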