I am using jackson-dataformat-xml (2.9) to parse XML into a JsonNode and then serialize it to JSON. The XML is very dynamic (e.g. the 'elementName' and 'id' names may vary), which is why I am using JsonNode instead of binding to a POJO.
During the JSON serialization phase, one of the element keys turns out to be an empty string ("").
XML:
<elementName>
<id type="pid">abcdef123</id>
</elementName>
Parsing logic:
private final ObjectMapper jsonMapper;
private final XmlMapper xmlMapper;

public Parser() {
    // Assign to fields so that parseXmlResponse can reach the mappers
    jsonMapper = new ObjectMapper();
    xmlMapper = new XmlMapper(new XmlFactory(new WstxInputFactory()));
}

public InputStream parseXmlResponse(InputStream xmlStream) {
    InputStream stream = null;
    try {
        JsonNode node = xmlMapper.readTree(xmlStream);
        stream = new ByteArrayInputStream(jsonMapper.writer().writeValueAsBytes(node));
    } catch (IOException e) {
        e.printStackTrace();
    }
    return stream;
}
Json:
Result:
{
  "elementName": {
    "id": {
      "type": "pid",
      "": "abcdef123"
    }
  }
}
Expected:
{
  "elementName": {
    "id": {
      "type": "pid",
      "value": "abcdef123"
    }
  }
}
My idea is to find every empty key "" and replace it with "value", either during XML deserialization or during JSON serialization. I have tried a default serializer and a filter, but have not got either working in a clean and concise way.
Suggestions are much appreciated.
Thank you for the help.
Based on @shoek's suggestion, I decided to write a custom serializer to avoid creating an intermediate object (ObjectNode) during the process.
edit: refactor based on the same solution proposed by @shoek.
import com.fasterxml.jackson.core.JsonGenerator;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.SerializerProvider;
import com.fasterxml.jackson.databind.node.ArrayNode;
import com.fasterxml.jackson.databind.node.ObjectNode;
import com.fasterxml.jackson.databind.ser.std.StdSerializer;
import java.io.IOException;
import java.util.Iterator;
import java.util.Objects;

// Thin wrapper so the custom serializer can be bound to a dedicated type
public class CustomNode {
    private JsonNode jsonNode;

    public CustomNode(JsonNode jsonNode) {
        this.jsonNode = jsonNode;
    }

    public JsonNode getJsonNode() {
        return jsonNode;
    }
}

public class CustomObjectsResponseSerializer extends StdSerializer<CustomNode> {
    protected CustomObjectsResponseSerializer() {
        super(CustomNode.class);
    }

    @Override
    public void serialize(CustomNode node, JsonGenerator jgen, SerializerProvider provider) throws IOException {
        convertObjectNode(node.getJsonNode(), jgen, provider);
    }

    // Writes an object node field by field, recursing into nested containers
    private void convertObjectNode(JsonNode node, JsonGenerator jgen, SerializerProvider provider) throws IOException {
        jgen.writeStartObject();
        for (Iterator<String> it = node.fieldNames(); it.hasNext(); ) {
            String childName = it.next();
            JsonNode childNode = node.get(childName);
            // XML parser returns an empty string as value name. Replacing it with "value"
            if (Objects.equals("", childName)) {
                childName = "value";
            }
            if (childNode instanceof ArrayNode) {
                jgen.writeFieldName(childName);
                convertArrayNode(childNode, jgen, provider);
            } else if (childNode instanceof ObjectNode) {
                jgen.writeFieldName(childName);
                convertObjectNode(childNode, jgen, provider);
            } else {
                provider.defaultSerializeField(childName, childNode, jgen);
            }
        }
        jgen.writeEndObject();
    }

    // Writes an array node element by element, recursing into nested containers
    private void convertArrayNode(JsonNode node, JsonGenerator jgen, SerializerProvider provider) throws IOException {
        jgen.writeStartArray();
        for (Iterator<JsonNode> it = node.elements(); it.hasNext(); ) {
            JsonNode childNode = it.next();
            if (childNode instanceof ArrayNode) {
                convertArrayNode(childNode, jgen, provider);
            } else if (childNode instanceof ObjectNode) {
                convertObjectNode(childNode, jgen, provider);
            } else {
                provider.defaultSerializeValue(childNode, jgen);
            }
        }
        jgen.writeEndArray();
    }
}
You could also simply post-process the JSON DOM: traverse all objects and rename every key that is an empty string to "value".
Caveat (name collision): a key named "value" may already exist, and must not be overwritten (e.g. <id type="pid" value="existing">abcdef123</id>).
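To make the collision caveat concrete, here is a minimal sketch of the rename rule on its own. It uses plain java.util maps instead of Jackson nodes purely to stay dependency-free (an assumption for illustration), and the class and method names (EmptyKeyRename, renameEmptyKey) are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class EmptyKeyRename {
    /**
     * Renames the "" key to "value" unless a non-null "value" entry already exists.
     * Returns true if the rename happened, false if it was skipped or not needed.
     */
    static boolean renameEmptyKey(Map<String, Object> object) {
        if (!object.containsKey("")) {
            return false; // nothing to rename
        }
        if (object.get("value") != null) {
            return false; // collision: leave both entries untouched
        }
        object.put("value", object.remove(""));
        return true;
    }

    public static void main(String[] args) {
        // Models <id type="pid">abcdef123</id>
        Map<String, Object> plain = new LinkedHashMap<>();
        plain.put("type", "pid");
        plain.put("", "abcdef123");
        System.out.println(renameEmptyKey(plain) + " " + plain);

        // Models <id type="pid" value="existing">abcdef123</id>
        Map<String, Object> clash = new LinkedHashMap<>();
        clash.put("type", "pid");
        clash.put("value", "existing");
        clash.put("", "abcdef123");
        System.out.println(renameEmptyKey(clash) + " " + clash);
    }
}
```

In the second case the map is left unchanged, so the caller can decide what to do with the remaining "" entry (log it, pick another name, or fail).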
Usage:
(note: you should not silently suppress the exception and return null, but allow it to propagate so the caller can decide to catch and apply failover logic if required)
public InputStream parseXmlResponse(InputStream xmlStream) throws IOException {
    JsonNode node = xmlMapper.readTree(xmlStream);
    postprocess(node);
    return new ByteArrayInputStream(jsonMapper.writer().writeValueAsBytes(node));
}
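As a rough illustration of the note about letting the exception propagate, a caller could apply its own fallback along these lines. This is only a sketch: the XmlToJson interface is a hypothetical stand-in for a method such as parseXmlResponse, and returning an empty JSON object is just one possible fallback policy.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class FallbackCaller {
    /** Hypothetical stand-in for a method like parseXmlResponse that declares IOException. */
    interface XmlToJson {
        InputStream convert(InputStream xml) throws IOException;
    }

    /** Runs the converter; on failure, logs and returns an empty JSON object instead of null. */
    static InputStream convertOrEmpty(XmlToJson converter, InputStream xml) {
        try {
            return converter.convert(xml);
        } catch (IOException e) {
            // The caller, not the parser, decides how to react to the failure
            System.err.println("Conversion failed, using fallback: " + e.getMessage());
            return new ByteArrayInputStream("{}".getBytes(StandardCharsets.UTF_8));
        }
    }
}
```

A caller that has no sensible fallback can simply declare throws IOException itself and pass the problem further up.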
Post-processing:
private void postprocess(JsonNode jsonNode) {
    if (jsonNode.isArray()) {
        ArrayNode array = (ArrayNode) jsonNode;
        Iterable<JsonNode> elements = () -> array.elements();
        // recursive post-processing
        for (JsonNode element : elements) {
            postprocess(element);
        }
    }
    if (jsonNode.isObject()) {
        ObjectNode object = (ObjectNode) jsonNode;
        Iterable<String> fieldNames = () -> object.fieldNames();
        // recursive post-processing
        for (String fieldName : fieldNames) {
            postprocess(object.get(fieldName));
        }
        // check if an attribute with empty string key exists, and rename it to 'value',
        // unless there already exists another non-null attribute named 'value' which
        // would be overwritten.
        JsonNode emptyKeyValue = object.get("");
        JsonNode existing = object.get("value");
        if (emptyKeyValue != null) {
            if (existing == null || existing.isNull()) {
                object.set("value", emptyKeyValue);
                object.remove("");
            } else {
                System.err.println("Skipping empty key value as a key named 'value' already exists.");
            }
        }
    }
}
Output: just as expected.
{
  "elementName": {
    "id": {
      "type": "pid",
      "value": "abcdef123"
    }
  }
}
EDIT: considerations on performance:
I ran a test with a large XML file (enwikiquote-20200520-pages-articles-multistream.xml, the en.wikiquote XML dump, 498.4 MB) over 100 rounds, timing each of the following steps using deltas of System.nanoTime():
JsonNode node = xmlMapper.readTree(xmlStream);
postprocess(node);
new ByteArrayInputStream(jsonMapper.writer().writeValueAsBytes(node));
That's a fraction of a millisecond for an object tree built from a ~500 MB file, so performance is excellent and not a concern.
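The measurement approach described above (repeated rounds, System.nanoTime() deltas) can be sketched roughly as follows. The StepTimer class and averageMillis helper are hypothetical names, and the workload in main is only a placeholder for a step such as postprocess(node). This is a simple wall-clock average without JIT warm-up handling; a dedicated harness such as JMH would give more reliable numbers.

```java
class StepTimer {
    /** Runs the step `rounds` times and returns the mean elapsed time in milliseconds. */
    static double averageMillis(Runnable step, int rounds) {
        long totalNanos = 0;
        for (int i = 0; i < rounds; i++) {
            long start = System.nanoTime();
            step.run();
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / (double) rounds / 1_000_000.0;
    }

    public static void main(String[] args) {
        // Placeholder workload; in the scenario above this would be one of the timed steps
        double ms = averageMillis(() -> Math.sqrt(42.0), 100);
        System.out.printf("average: %.6f ms%n", ms);
    }
}
```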