Tags: python, json, xml, xmltodict

xmltodict: how to ignore some characters when converting an XML file to a JSON file


I would like to remove some characters when I convert my XML to a dict:

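import json
import xmltodict

# Parse the XML file (opened in binary mode so expat honours its encoding declaration)
with open('test.xml', 'rb') as xml_file:
    data = xmltodict.parse(xml_file)

with open('test2.json', 'wt', encoding='utf-8', errors='ignore') as f:
    json.dump(data, f, indent=4, sort_keys=True)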

The problem is that I have many JSON files. Some of them look like this:

{
    "pcrs:test A": {
        "pcrs:nature": "03",
        "pcrs:producteur": "SIEML"
    }
}

And others look like this (without pcrs):

{
    "test B": {
        "nature": "03",
        "producteur": "SIEML"
    }
}

How can I make every file like the first example come out without the 'pcrs:' prefix, like the second example?


Solution

  • The pcrs: part is a namespace prefix. Because you didn't include any sample XML, I've made up one of my own.

    <?xml version="1.0" encoding="UTF-8"?>
    <root_elem xmlns:pcrs="http://the/pcrs/url">
    <pcrs:subelem/>
    </root_elem>
    

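    xmltodict lets you manage namespaces, together with process_namespaces=True, by mapping each namespace URL to a different representation. Most notably, mapping a URL to None removes the prefix completely. See the "Namespace support" section of the xmltodict README.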
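
To make the mapping concrete, here is a small sketch that parses a made-up document with three different `namespaces` settings. It assumes xmltodict is installed (`pip install xmltodict`); the element names `nature` and `producteur` are borrowed from the question's JSON, the URL `http://the/pcrs/url` is the same placeholder as above, and `child_keys` is a hypothetical helper written only for this illustration.

```python
import xmltodict  # third-party: pip install xmltodict

# Made-up sample document; the namespace URL is a placeholder, not a real one.
SAMPLE_XML = """<root xmlns:pcrs="http://the/pcrs/url">
  <pcrs:nature>03</pcrs:nature>
  <pcrs:producteur>SIEML</pcrs:producteur>
</root>"""


def child_keys(namespaces=None):
    """Parse SAMPLE_XML with namespace processing and return the element keys under <root>."""
    data = xmltodict.parse(SAMPLE_XML, process_namespaces=True, namespaces=namespaces)
    # Ignore any '@'-prefixed attribute keys; only the child element names matter here.
    return sorted(key for key in data["root"] if not key.startswith("@"))


print(child_keys())                                 # no mapping: full namespace URL kept in each key
print(child_keys({"http://the/pcrs/url": "pcrs"}))  # URL collapsed to a short alias
print(child_keys({"http://the/pcrs/url": None}))    # prefix removed entirely
```

With no mapping, every key carries the full URL; mapping the URL to a short string gives a readable prefix back, and mapping it to None drops it, which is the shape of the question's second example.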

    In your case, you can do:

    import xmltodict

    with open('test.xml', 'rb') as xml_file:
        data = xmltodict.parse(xml_file,
            process_namespaces=True,
            namespaces={"http://the/pcrs/url": None})
    

    substituting the real namespace URL (the value of the xmlns:pcrs attribute in your XML) for http://the/pcrs/url.
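
If you already have many JSON files that were produced without namespace processing, another option (not an xmltodict feature, and not what the answer above does) is to post-process the data and strip the literal `pcrs:` prefix from every key. A minimal sketch using only the standard library, assuming the prefix is always exactly `pcrs:`; `strip_prefix` and `rewrite_json_file` are hypothetical names chosen for this example.

```python
import json


def strip_prefix(obj, prefix="pcrs:"):
    """Return a copy of obj with `prefix` removed from the start of every dict key, recursively."""
    if isinstance(obj, dict):
        return {
            (key[len(prefix):] if key.startswith(prefix) else key): strip_prefix(value, prefix)
            for key, value in obj.items()
        }
    if isinstance(obj, list):
        # xmltodict turns repeated elements into lists, so walk those too
        return [strip_prefix(item, prefix) for item in obj]
    return obj  # strings, numbers and None are returned unchanged


def rewrite_json_file(src_path, dst_path, prefix="pcrs:"):
    """Load a JSON file, strip the prefix from every key and write the result to dst_path."""
    with open(src_path, encoding="utf-8") as f:
        data = json.load(f)
    with open(dst_path, "w", encoding="utf-8") as f:
        json.dump(strip_prefix(data, prefix), f, indent=4, sort_keys=True)


sample = {"pcrs:test A": {"pcrs:nature": "03", "pcrs:producteur": "SIEML"}}
print(json.dumps(strip_prefix(sample)))
# {"test A": {"nature": "03", "producteur": "SIEML"}}
```

Unlike the namespace-aware parse, this works on plain strings, so it cannot tell a real namespace prefix from a key that merely starts with `pcrs:`. If one object contains both `pcrs:x` and `x`, the later key silently overwrites the earlier one. Attribute keys such as `@xmlns:pcrs` are left as they are because they start with `@`.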