I would like to remove some characters when I convert my XML to a dict:
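import json
import xmltodict

def convert():  # enclosing function name assumed; the original snippet only showed the body
    data = xmltodict.parse(open('test.xml').read())
    with open('test2.json', "wt", encoding='utf-8', errors='ignore') as f:
        json.dump(data, f, indent=4, sort_keys=True)
    return data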
The problem is that I have many JSON files. Some of them look like this:
{
    "pcrs:test A": {
        "pcrs:nature": "03",
        "pcrs:producteur": "SIEML"
    }
}
And some look like this (without pcrs):
{
    "test B": {
        "nature": "03",
        "producteur": "SIEML"
    }
}
How can I force every file like the first example to be written without the 'pcrs:' prefix, as in the second example?
That is a namespace prefix. Since you didn't include sample XML, I've made up some of my own:
<?xml version="1.0" encoding="UTF-8"?>
<root_elem xmlns:pcrs="http://the/pcrs/url">
<pcrs:subelem/>
</root_elem>
xmltodict lets you manage namespaces by mapping the namespace URL to a different representation. Most notably, None removes it completely. See Namespace Support.
In your case, you can do
data = xmltodict.parse(open('test.xml').read(),
                       process_namespaces=True,
                       namespaces={"http://the/pcrs/url": None})
substituting the real namespace URL for http://the/pcrs/url.
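
If the namespace URL is not known in advance, or the JSON files already exist, one possible fallback is to strip the prefix from the keys after parsing. This is plain-Python post-processing rather than an xmltodict feature. The helper name strip_prefix and its default prefix "pcrs:" are assumptions made for illustration, and the sample dict is the first example from the question.

```python
import json


def strip_prefix(node, prefix="pcrs:"):
    """Return a copy of node with prefix removed from every dict key.

    strip_prefix is a hypothetical helper, not part of xmltodict.
    """
    if isinstance(node, dict):
        cleaned = {}
        for key, value in node.items():
            # xmltodict marks attributes with a leading "@" by default,
            # so handle both "pcrs:name" and "@pcrs:name".
            if key.startswith(prefix):
                key = key[len(prefix):]
            elif key.startswith("@" + prefix):
                key = "@" + key[len(prefix) + 1:]
            cleaned[key] = strip_prefix(value, prefix)
        return cleaned
    if isinstance(node, list):
        # Repeated XML elements are parsed into lists; clean each item.
        return [strip_prefix(item, prefix) for item in node]
    # Strings, numbers and None are returned unchanged.
    return node


data = {"pcrs:test A": {"pcrs:nature": "03", "pcrs:producteur": "SIEML"}}
print(json.dumps(strip_prefix(data), indent=4, sort_keys=True))
```

Note that this works purely on key text: if a dict holds both "pcrs:nature" and "nature", the two collapse into one key and one value is lost. The namespaces mapping above avoids guessing at prefixes, so it is probably the better choice when the URL is available.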