I'm using Twilio's SMS service, and I want to be able to send ordinary non-English Roman-alphabet characters (for European personal names) along with ASCII characters. The characters I need are a subset of Unicode's "Latin-1 Supplement" block, and they're all in the GSM-7 character set. But they show up on handsets as replacement characters. For example, when I send "J'aime l'été... éÉÑñ" the phone shows "J'aime l'?t?... ????".
I'm testing with a USA iPhone with iOS 13 running on Sprint. Verizon iPhones show the same problem.
Here's C# code reproducing the problem. Changing the value of smartEncoded from true to false, or vice versa, makes no difference.
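    using Twilio;
    using Twilio.Rest.Api.V2010.Account;

    const string sid = "REDACTED";
    const string token = "REDACTED";
    const string from = "REDACTED";
    const string to = "REDACTED";
    const string message = "J'aime l'été... éÉÑñ";

    TwilioClient.Init(sid, token);

    // Every character in the body is in GSM-7, yet the accented ones arrive as "?".
    // Setting smartEncoded to false gives the same result.
    var msg = MessageResource.Create(
        body: message,
        from: new Twilio.Types.PhoneNumber(from),
        to: new Twilio.Types.PhoneNumber(to),
        smartEncoded: true
    );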
Twilio claims they use GSM-7 to send messages whenever they can use that character set, and fall back to using UCS-2 when they can't.
If I send a message that forces Twilio to use UCS-2 encoding, everything works fine. For example, appending ® does the trick. Of course, each SMS message sent in UCS-2 has a shorter maximum length.

    const string message = "J'aime l'été... éÉÑñ ®";
I must be missing something; Twilio is proud of their message-size optimization feature. How can I fix this?
tl;dr: This is a known issue with some short message service centers (SMSCs). You can work around it by forcing the message into Unicode (at no additional cost, as long as it still fits in one segment) or by forcing at least two segments (at additional cost, because you're charged per segment and you'll have more segments).
I asked Twilio Support about the same/a similar issue I am experiencing on Verizon when my GSM-encoded message contains "extended GSM characters" and received the following response:
There is a known encoding issue with messages that are routed to certain Verizon short message service centers. Verizon has many SMSCs (short message service centers) and they are dynamically assigned (i.e. one particular Verizon user may get messages through many different SMSCs at varying times).
Is this issue you are seeing occurring specifically for single-segment messages? If so, it matches an encoding issue we have seen on certain Verizon SMSCs that has been present since 2018. For single-segment SMS, certain Verizon SMSCs may convert "ñ" or other extended GSM characters (à ò è ì ù ¿ Ñ ñ ¡) to "?" upon delivery.
Unfortunately, we do not have additional device-related specifics, aside from the fact that this impacts some Verizon SMSCs.
To avoid this issue, we recommend the following:
Option 1: Force the message to be sent as Unicode by including a non-GSM character. To send the message in Unicode without adding extra cost by sending >160 characters you can include the Punctuation Space with the message. It looks just like a regular space, but because it's outside of the regular GSM-7 range it will cause the entire message to be converted to Unicode and the accents will come through correctly. Would you be able to test this as a workaround? Please note that UCS-2 messages will be limited to 70 chars per segment.
Option 2: Ensure the message is more than 1 segment long in GSM encoding.
Per https://www.twilio.com/docs/sms/services/smart-encoding-char-list the "Punctuation Space" is U+2008.
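Here's a minimal sketch of Option 1 in C#. The class and method names (SmsWorkaround, ForceUcs2, FitsInOneUcs2Segment) are my own, not part of the Twilio library. Note that the page linked above is the list of characters Smart Encoding replaces, so I'm assuming you'd want to leave smartEncoded off when using this trick; otherwise the punctuation space may get turned back into a regular space. I haven't verified that behavior, so test it.

```csharp
using System;

public static class SmsWorkaround
{
    // U+2008 PUNCTUATION SPACE: looks like a space but is outside GSM-7,
    // so its presence should force the whole message into UCS-2.
    private const char PunctuationSpace = '\u2008';

    // A single-segment UCS-2 message holds at most 70 UTF-16 code units.
    private const int SingleSegmentUcs2Limit = 70;

    // Appends U+2008 unless the body already contains one.
    public static string ForceUcs2(string body)
    {
        if (body.IndexOf(PunctuationSpace) >= 0)
        {
            return body;
        }
        return body + PunctuationSpace;
    }

    // True if the (already forced) body still fits in one UCS-2 segment,
    // i.e. the workaround costs nothing extra.
    public static bool FitsInOneUcs2Segment(string forcedBody)
    {
        return forcedBody.Length <= SingleSegmentUcs2Limit;
    }
}
```

You'd then pass SmsWorkaround.ForceUcs2(message) as the body argument to MessageResource.Create. If FitsInOneUcs2Segment returns false, the message will be split into more segments than the GSM-7 version would have needed.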
Note that Twilio charges per segment, so if you choose option 2 you'll be paying more because you'll be sending more segments.
Twilio offers this tool for you to understand what encoding will be used for your message and how many segments it will require.
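If you'd rather estimate this in your own code, here's a rough sketch. It is not Twilio's tool or algorithm, just the standard SMS arithmetic: 160 septets for a single GSM-7 segment (153 per segment once the message is split), and 70 UTF-16 code units for a single UCS-2 segment (67 per segment once split). The class name SmsSegments is made up, and the sketch ignores edge cases such as an escape pair or a surrogate pair landing on a segment boundary.

```csharp
using System;
using System.Linq;

public static class SmsSegments
{
    // GSM 03.38 basic character set: one septet per character.
    private const string Basic =
        "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>?" +
        "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§¿abcdefghijklmnopqrstuvwxyzäöñüà";

    // GSM 03.38 extension table: escape + character, so two septets each.
    private const string Extended = "^{}\\[~]|€\f";

    // True if every character can be encoded in GSM-7.
    public static bool IsGsm7(string text)
    {
        return text.All(c => Basic.IndexOf(c) >= 0 || Extended.IndexOf(c) >= 0);
    }

    // Estimated number of segments the message needs.
    public static int CountSegments(string text)
    {
        if (IsGsm7(text))
        {
            int septets = text.Sum(c => Extended.IndexOf(c) >= 0 ? 2 : 1);
            return septets <= 160 ? 1 : (septets + 152) / 153;
        }

        int units = text.Length; // UTF-16 code units
        return units <= 70 ? 1 : (units + 66) / 67;
    }
}
```

For the strings in the question, IsGsm7 is true for "J'aime l'été... éÉÑñ" and false once " ®" is appended, and both fit in a single segment.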