Tags: pandas, xml, dataframe, beautifulsoup

How to read an XML file in Python without a node


I am trying to read this file in Python:

https://www.europarl.europa.eu/meps/en/full-list/xml/a

I have used this code:

from bs4 import BeautifulSoup as bs
import requests
import pandas as pd

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36'
}
url = 'https://www.europarl.europa.eu/meps/en/full-list/xml/a'
soup = bs(requests.get(url, headers=headers).text, 'lxml')
df = pd.read_xml(str(soup))
print(df)

But the result looks wrong:

   meps
0   NaN

Can anyone help me please?


Solution

  • There is no need for intermediate libraries; read_xml can read directly from a URL:

    df = pd.read_xml('https://www.europarl.europa.eu/meps/en/full-list/xml/a')
    

    If you need to pass custom headers, use storage_options (for HTTP(S) URLs, pandas forwards these key-value pairs as request headers):

    url = 'https://www.europarl.europa.eu/meps/en/full-list/xml/a'
    
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36'
    }
    
    df = pd.read_xml(url, storage_options=headers)
    

    Output:

                  fullName   country                                     politicalGroup      id                         nationalPoliticalGroup
    0  Magdalena ADAMOWICZ    Poland  Group of the European People's Party (Christia...  197490                                    Independent
    1          Asim ADEMOV  Bulgaria  Group of the European People's Party (Christia...  189525  Citizens for European Development of Bulgaria
    2    Isabella ADINOLFI     Italy  Group of the European People's Party (Christia...  124831                                   Forza Italia
    3      Matteo ADINOLFI     Italy                       Identity and Democracy Group  197826                                           Lega
    4    Alex AGIUS SALIBA     Malta  Group of the Progressive Alliance of Socialist...  197403                               Partit Laburista
    ...
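
To see why the original attempt returned a single meps column, it may help to reproduce it offline. The sketch below uses a small made-up XML string whose element names are assumed from the output columns above (the real feed was not inspected), and it passes parser="etree" only so that it runs without lxml installed. The likely cause is that BeautifulSoup's 'lxml' parser is an HTML parser, so it wraps the document in html and body elements. read_xml's default xpath of './*' then selects body as the only row, with meps as its only child. The wrapping is imitated by hand here rather than by calling BeautifulSoup.

```python
from io import StringIO

import pandas as pd

# Made-up sample data; the element names are an assumption based on the
# output columns shown above, not copied from the real feed.
sample_xml = (
    "<meps>"
    "<mep><fullName>Magdalena ADAMOWICZ</fullName>"
    "<country>Poland</country><id>197490</id></mep>"
    "<mep><fullName>Asim ADEMOV</fullName>"
    "<country>Bulgaria</country><id>189525</id></mep>"
    "</meps>"
)

# The default xpath './*' selects the children of the root element,
# so every <mep> becomes one row and its child elements become columns.
df_ok = pd.read_xml(StringIO(sample_xml), parser="etree")
print(df_ok)

# Imitate by hand what an HTML parser does: wrap the XML in <html><body>.
wrapped_xml = f"<html><body>{sample_xml}</body></html>"

# The root is now <html>, so './*' selects <body> as the only row and
# <meps> as its only child, giving a single 'meps' column as in the question.
df_wrapped = pd.read_xml(StringIO(wrapped_xml), parser="etree")
print(df_wrapped.columns.tolist())

# If wrapped markup cannot be avoided, an explicit xpath still finds the rows.
df_fixed = pd.read_xml(StringIO(wrapped_xml), xpath=".//mep", parser="etree")
print(df_fixed)
```

Passing the URL (or the raw response) straight to read_xml avoids the wrapping altogether, which is why the approach above works without BeautifulSoup.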