Tags: json, xml, apache-camel, jbossfuse

Apache Camel: wrong encoding after marshalling XML to JSON from an HTTP response


I'm making an HTTP call to fetch the RSS (XML) feed of a Latin American newspaper and then transforming the response body to JSON. Spanish-language news text is full of accented characters such as á, é, í, ó and ú, so the feed has to be decoded with the right character encoding.

The problem is that these characters are not decoded properly, so I get descriptions like this one: Las lluvias llegar��an a la ciudad de C��rdoba jueves y viernes seg��n prev�� el Servicio Meteorol��gico Nacional (SMN)

I've tried setting encoding parameters on both the http component and the xmljson data format, and neither works. I also tried forcing the Content-Type headers to application/rss+xml; charset=utf-8 and application/json; charset=utf-8, with no effect either.
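For reference, this is roughly what the data-format side of that attempt would look like, assuming it was the encoding attribute of the xmljson data format (camel-xmljson, Camel 2.x) that was set. If I read the camel-xmljson documentation correctly, this option is only used when unmarshalling (JSON to XML), which would be consistent with it having no effect on a marshal step like the one in this route.

```xml
<dataFormats>
  <!-- Assumption: this is the encoding option that was tried. Per the
       camel-xmljson docs it applies to unmarshalling (JSON to XML) only,
       so it does not control how the incoming XML is decoded on marshal. -->
  <xmljson id="xmljson" encoding="UTF-8"/>
</dataFormats>
```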

I'm using the following DataFormat:

<dataFormats>
  <xmljson id="xmljson"/>
</dataFormats>

And my route is as follows:

<route id="rss">
    <from uri="direct:rss"/>
    <setHeader headerName="CamelHttpUri">
        <simple>"http://srvc.lavoz.com.ar/rss.xml"</simple>
    </setHeader>
    <setHeader headerName="CamelHttpMethod">
        <constant>GET</constant>
    </setHeader>
    <to uri="http://rss"/>
    <marshal ref="xmljson"/>
</route>

An example response would be:

{
  "channel": {
    "title": "LaVoz",
    "link": "http://srvc.lavoz.com.ar/rss.xml",
    "description": [],
    "language": "en",
    "item": [
      {
        "title": "��Se vienen las lluvias a C��rdoba?",
        "link": "http://srvc.lavoz.com.ar/ciudadanos/se-vienen-las-lluvias-cordoba",
        "description": "Las lluvias llegar��an a la ciudad de C��rdoba jueves y viernes seg��n prev�� el Servicio Meteorol��gico Nacional (SMN) aunque se mantendr�� el promedio de las temperaturas.�� Este martes estuvo cielo algo nublado con una temperatura m��nima de 14�� registrada a las 6.10 y una m��xima de 29,5�� a las 15.30, seg��n indic�� el Observatorio Meteorol��gico C��rdoba.�� Pron��stico extendido Hay probabilidad de tormentas para jueves y viernes. Mir�� el pron��stico.�� Ciudadanos",
        "pubDate": "Tue, 14 Feb 2017 21:19:21 +0000",
        "dc:creator": {
          "@xmlns:dc": "http://purl.org/dc/elements/1.1/",
          "#text": "redaccionlavoz"
        },
        "guid": {
          "@isPermaLink": "false",
          "#text": "1099119 at http://srvc.lavoz.com.ar"
        }
      },...

Update:
- If the route returns the XML response as-is (without marshalling it to JSON), the accented characters come through correctly.
- If, instead of marshalling, the route writes the XML response body to a logger, the problem appears as well.
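Since logging the raw body already shows the problem, one way to narrow it down is a diagnostic route that logs the response's Content-Type header and the JVM's default file.encoding right after the HTTP call. This is a hypothetical sketch (the route id rss-debug and the endpoint direct:rss-debug are made up for illustration). If the JVM default is not UTF-8 (for example ANSI_X3.4-1968 on a server with a POSIX locale), any step that turns the response stream into text without an explicit charset may fall back to it; the pair of replacement characters per accented letter in the output above would be consistent with UTF-8 bytes being decoded as US-ASCII.

```xml
<!-- Hypothetical diagnostic route; performs the same HTTP GET as the "rss" route -->
<route id="rss-debug">
    <from uri="direct:rss-debug"/>
    <setHeader headerName="CamelHttpUri">
        <constant>http://srvc.lavoz.com.ar/rss.xml</constant>
    </setHeader>
    <setHeader headerName="CamelHttpMethod">
        <constant>GET</constant>
    </setHeader>
    <to uri="http://rss"/>
    <!-- Content-Type (and charset parameter, if any) returned by the server -->
    <log message="Content-Type: ${header.Content-Type}"/>
    <!-- JVM default charset, used when text is decoded without an explicit charset -->
    <log message="file.encoding: ${sys.file.encoding}"/>
</route>
```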


Solution

  • A friend solved it by converting the body to a String with convertBodyTo, using the UTF-8 charset, before marshalling.

    The final route looks like this:

        <route id="rss">
            <from uri="direct:rss"/>
            <setHeader headerName="CamelHttpUri">
                <simple>"http://srvc.lavoz.com.ar/rss.xml"</simple>
            </setHeader>
            <setHeader headerName="CamelHttpMethod">
                <constant>GET</constant>
            </setHeader>
            <to uri="http://rss"/>
    
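            <!-- Read the HTTP response stream into a String, decoding it explicitly as UTF-8 -->
            <convertBodyTo type="String" charset="UTF-8" />
            <!-- Also record UTF-8 as the exchange charset (CamelCharsetName property) -->
            <setProperty propertyName="CamelCharsetName">
                <constant>utf-8</constant>
            </setProperty>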
    
            <marshal ref="xmljson"/>
        </route>