Context:
Amazon SQS restricts the range of characters it accepts in a message body passed to sqsClient.sendMessage(...) (mentioned here).
Excerpt from the above link:
A message can include only XML, JSON, and unformatted text. The following Unicode characters are allowed:
#x9 | #xA | #xD | #x20 to #xD7FF | #xE000 to #xFFFD | #x10000 to #x10FFFF
Any characters not included in this list will be rejected.
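To make the quoted rule concrete, here is a minimal sketch, assuming Java 8+ (for String.codePoints()), that checks each code point against exactly the ranges listed above and drops the rest. The names SqsCharFilter, isAllowed and stripDisallowed are hypothetical helpers for illustration, not part of the AWS SDK.

```java
public final class SqsCharFilter {

    // True if the code point is in the allowed list from the SQS docs:
    // #x9 | #xA | #xD | #x20 to #xD7FF | #xE000 to #xFFFD | #x10000 to #x10FFFF
    static boolean isAllowed(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Keeps only allowed code points. codePoints() joins valid surrogate pairs
    // into one code point and reports unpaired surrogates as-is (0xD800-0xDFFF),
    // so unpaired surrogates are dropped as well.
    static String stripDisallowed(String s) {
        return s.codePoints()
                .filter(SqsCharFilter::isAllowed)
                .collect(StringBuilder::new, StringBuilder::appendCodePoint, StringBuilder::append)
                .toString();
    }

    public static void main(String[] args) {
        // U+FFFF and U+0001 are outside the allowed ranges.
        String messageJson = "{\"k\":\"a\uFFFFb\u0001c\"}";
        System.out.println(stripDisallowed(messageJson)); // prints {"k":"abc"}
    }
}
```

Working on code points rather than chars means characters above U+FFFF (stored as two chars in a Java String) are tested as a single value against the #x10000 to #x10FFFF range.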
Question:
For now, we know offending characters are present in the JSON we send as the message body, so we filter them out with
message_json = message_json.replaceAll("\uffff", "");
and this works fine (where '\uffff' is the Java escape for the U+FFFF character). Note that Java strings are immutable, so the result of replaceAll has to be assigned back.
However, instead of handling only the U+FFFF character, I want to remove everything outside the allowed ranges listed above (#x9 | #xA | #xD | #x20 to #xD7FF | #xE000 to #xFFFD | #x10000 to #x10FFFF). How do I construct a single clause that covers a whole range of characters, without running a separate replace for each one?
Actually, the answer was right in front of me. For some reason, I had assumed that a regex character class would not accept escaped characters such as [\ufffd-\uffff] inside replaceAll:
message_json = message_json.replaceAll("[\ufffd-\uffff]", " ");
This works for my case.
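The same trick extends to the full list from the SQS documentation: rather than enumerating bad characters, negate a character class built from the allowed ranges, so one replaceAll removes everything outside them. This is a sketch under stated assumptions; SqsRegexFilter and stripDisallowed are hypothetical names. As in the [\ufffd-\uffff] example, the escapes are resolved by javac, so the pattern contains real characters. The last range (U+10000 to U+10FFFF) is written as the surrogate pairs \uD800\uDC00 and \uDBFF\uDFFF, which java.util.regex.Pattern reads as single code points.

```java
import java.util.regex.Pattern;

public final class SqsRegexFilter {

    // Negated class: matches any code point NOT in the allowed SQS list
    // (tab, LF, CR, U+0020..U+D7FF, U+E000..U+FFFD, U+10000..U+10FFFF).
    // The escapes are resolved by javac, so the pattern holds real characters;
    // the last range uses surrogate pairs, which Pattern treats as code points.
    private static final Pattern DISALLOWED = Pattern.compile(
            "[^\t\n\r\u0020-\uD7FF\uE000-\uFFFD\uD800\uDC00-\uDBFF\uDFFF]");

    static String stripDisallowed(String s) {
        return DISALLOWED.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        String messageJson = "{\"k\":\"a\uFFFFb\u0001c\"}";
        messageJson = stripDisallowed(messageJson); // Strings are immutable: assign the result
        System.out.println(messageJson); // prints {"k":"abc"}
    }
}
```

Precompiling the Pattern avoids recompiling the regex on every call, which String.replaceAll would do. Also note that U+FFFD is itself on the allowed list, so [\ufffd-\uffff] replaces it too; [\ufffe\uffff] would target only the two disallowed code points at the top of that range.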