I'm trying to upload data documents to CloudSearch. The data is in a file called test.json with the following content:
[
    {
        "type": "add",
        "id": "1-1",
        "fields": {
            "id": 1,
            "type": 1,
            "address": "Moeboda 4",
            "city": "Alvesta",
            "country": "Sweden",
            "rooms": 3,
            "size": 45,
            "price": 275000
        }
    }
]
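(For what it's worth, a batch like this can also be generated from Python with the standard `json` module, which always writes plain UTF-8 without a byte-order mark. A minimal sketch using the field values from the file above:)

```python
import json

# The same document batch as test.json, built in Python.
batch = [
    {
        "type": "add",
        "id": "1-1",
        "fields": {
            "id": 1,
            "type": 1,
            "address": "Moeboda 4",
            "city": "Alvesta",
            "country": "Sweden",
            "rooms": 3,
            "size": 45,
            "price": 275000,
        },
    }
]

# json.dumps never emits a byte-order mark, so the payload
# starts with '[' rather than the BOM bytes EF BB BF.
payload = json.dumps(batch).encode("utf-8")
print(payload[:1])  # b'['
```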
I run into the following problems:
CloudSearch tells me that the only fields uploaded are:
content, content_encoding, content_type, resourcename
When I download the "Batch" that was generated I get the following data in it:
[ {
"type" : "add",
"id" : "test.json",
"fields" : {
"content" : "[\r\n\t{\r\n\t\t\"type\": \"add\", \r\n\t\t\"id\": \"1-1\", \r\n\t\t\"fields\": {\r\n\t\t\t\"id\": 1,\r\n\t\t\t\"type\": 1,\r\n\t\t\t\"address\": \"Moeboda 4\",\r\n\t\t\t\"city\": \"Alvesta\",\r\n\t\t\t\"country\": \"Sweden\",\r\n\t\t\t\"rooms\": 3,\r\n\t\t\t\"size\": 45,\r\n\t\t\t\"price\": 275000\r\n\t\t}\r\n\t}\r\n]",
"resourcename" : "test.json",
"content_encoding" : "UTF-8",
"content_type" : "application/json"
}
} ]
So my guess is that AWS CloudSearch treats my JSON as a plain string: it creates a new document with its own fields (content, resourcename, content_encoding, content_type), puts my entire file into content as a "string", and escapes it, since strings have to be escaped.
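That guess is easy to confirm by decoding the generated batch yourself: the content value parses right back into the original document batch. A quick sketch, with the content string copied verbatim from the batch above:

```python
import json

# The "content" value from the batch CloudSearch generated.
content = "[\r\n\t{\r\n\t\t\"type\": \"add\", \r\n\t\t\"id\": \"1-1\", \r\n\t\t\"fields\": {\r\n\t\t\t\"id\": 1,\r\n\t\t\t\"type\": 1,\r\n\t\t\t\"address\": \"Moeboda 4\",\r\n\t\t\t\"city\": \"Alvesta\",\r\n\t\t\t\"country\": \"Sweden\",\r\n\t\t\t\"rooms\": 3,\r\n\t\t\t\"size\": 45,\r\n\t\t\t\"price\": 275000\r\n\t\t}\r\n\t}\r\n]"

# Parsing the string recovers the original test.json batch,
# so the whole file really was indexed as one text field.
inner = json.loads(content)
print(inner[0]["id"])              # 1-1
print(inner[0]["fields"]["city"])  # Alvesta
```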
I have no idea why this is happening, and I've been working on it for hours. I've tried .txt files, .json files, changing charsets, removing brackets, and so on, but nothing works.
And yes, I have configured Index Options with all the fields I'm trying to upload. See screenshot:
This issue was related to the character encoding of the .py file. When I forced it to be saved as plain UTF-8, it worked. I think my editor had saved it as "UTF-8 with BOM".
So if you run into this issue, triple-check your encodings and charsets.
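If you want to check for this from Python, you can look at the file's first bytes for the UTF-8 BOM (EF BB BF) and re-save without it; the `utf-8-sig` codec decodes plain UTF-8 too, but drops a leading BOM when one is present. A sketch, using a throwaway local file to simulate a BOM'd save:

```python
import codecs
from pathlib import Path

path = Path("test.json")  # hypothetical local batch file
# Simulate an editor saving as "UTF-8 with BOM".
path.write_bytes(codecs.BOM_UTF8 + b'[{"type": "add"}]')

raw = path.read_bytes()
has_bom = raw.startswith(codecs.BOM_UTF8)
print("BOM present:", has_bom)

# 'utf-8-sig' strips a leading BOM if present; re-save as plain UTF-8.
text = path.read_text(encoding="utf-8-sig")
path.write_text(text, encoding="utf-8")

print("BOM after rewrite:", path.read_bytes().startswith(codecs.BOM_UTF8))
```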