I'm trying to upload data documents to CloudSearch. The data is in a file called test.json with the following content:
[
    {
        "type": "add",
        "id": "1-1",
        "fields": {
            "id": 1,
            "type": 1,
            "address": "Moeboda 4",
            "city": "Alvesta",
            "country": "Sweden",
            "rooms": 3,
            "size": 45,
            "price": 275000
        }
    }
]
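(For what it's worth, a batch like this can also be generated from Python with the standard `json` module, which always writes plain UTF-8 without a byte-order mark. A minimal sketch using the field values from the file above:)

```python
import json

# The same document batch as test.json, built in Python.
batch = [
    {
        "type": "add",
        "id": "1-1",
        "fields": {
            "id": 1,
            "type": 1,
            "address": "Moeboda 4",
            "city": "Alvesta",
            "country": "Sweden",
            "rooms": 3,
            "size": 45,
            "price": 275000,
        },
    }
]

# json.dumps never emits a byte-order mark, so the payload
# starts with '[' rather than the BOM bytes EF BB BF.
payload = json.dumps(batch).encode("utf-8")
print(payload[:1])  # b'['
```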
I run into the following problems:
CloudSearch tells me that the only fields uploaded are:
content, content_encoding, content_type, resourcename
When I download the "Batch" that was generated I get the following data in it:
[ {
"type" : "add",
"id" : "test.json",
"fields" : {
"content" : "[\r\n\t{\r\n\t\t\"type\": \"add\", \r\n\t\t\"id\": \"1-1\", \r\n\t\t\"fields\": {\r\n\t\t\t\"id\": 1,\r\n\t\t\t\"type\": 1,\r\n\t\t\t\"address\": \"Moeboda 4\",\r\n\t\t\t\"city\": \"Alvesta\",\r\n\t\t\t\"country\": \"Sweden\",\r\n\t\t\t\"rooms\": 3,\r\n\t\t\t\"size\": 45,\r\n\t\t\t\"price\": 275000\r\n\t\t}\r\n\t}\r\n]",
"resourcename" : "test.json",
"content_encoding" : "UTF-8",
"content_type" : "application/json"
}
} ]
So my guess is that AWS CloudSearch treats my JSON as a plain string: it creates a new document with its own fields (content, resourcename, content_encoding, content_type), puts my entire file into content as a "string", and escapes it, since strings have to be escaped.
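That guess is easy to confirm by decoding the generated batch yourself: the content value parses right back into the original document batch. A quick sketch, with the content string copied verbatim from the batch above:

```python
import json

# The "content" value from the batch CloudSearch generated.
content = "[\r\n\t{\r\n\t\t\"type\": \"add\", \r\n\t\t\"id\": \"1-1\", \r\n\t\t\"fields\": {\r\n\t\t\t\"id\": 1,\r\n\t\t\t\"type\": 1,\r\n\t\t\t\"address\": \"Moeboda 4\",\r\n\t\t\t\"city\": \"Alvesta\",\r\n\t\t\t\"country\": \"Sweden\",\r\n\t\t\t\"rooms\": 3,\r\n\t\t\t\"size\": 45,\r\n\t\t\t\"price\": 275000\r\n\t\t}\r\n\t}\r\n]"

# Parsing the string recovers the original test.json batch,
# so the whole file really was indexed as one text field.
inner = json.loads(content)
print(inner[0]["id"])              # 1-1
print(inner[0]["fields"]["city"])  # Alvesta
```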
I have no idea why this is happening, and I've been working on it for hours. I've tried .txt files, .json files, changing charsets, removing brackets, and so on, but nothing works.
And yes, I have configured Index Options with all the fields I'm trying to upload. See screenshot:
This issue was related to the character encoding of the .py file. When I forced it to be saved as plain UTF-8, it worked. I think my editor had saved it as "UTF-8 with BOM".
So if you run into this issue, triple-check your encodings and charsets.
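If you want to check for this from Python, you can look at the file's first bytes for the UTF-8 BOM (EF BB BF) and re-save without it; the `utf-8-sig` codec decodes plain UTF-8 too, but drops a leading BOM when one is present. A sketch, using a throwaway local file to simulate a BOM'd save:

```python
import codecs
from pathlib import Path

path = Path("test.json")  # hypothetical local batch file
# Simulate an editor saving as "UTF-8 with BOM".
path.write_bytes(codecs.BOM_UTF8 + b'[{"type": "add"}]')

raw = path.read_bytes()
has_bom = raw.startswith(codecs.BOM_UTF8)
print("BOM present:", has_bom)

# 'utf-8-sig' strips a leading BOM if present; re-save as plain UTF-8.
text = path.read_text(encoding="utf-8-sig")
path.write_text(text, encoding="utf-8")

print("BOM after rewrite:", path.read_bytes().startswith(codecs.BOM_UTF8))
```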