What is SMS messaging encoding and why is it important?
Encoding is the process of converting data from one form to another so that it can be processed, transmitted, and interpreted consistently. When you enter text, by keyboard or any other means, the character encoding maps each character to specific bytes in computer memory. To display the text, the computer reads those bytes back into characters using the same encoding.
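As a minimal sketch of that round trip, the Python snippet below encodes a short string to bytes and decodes it back. UTF-8 is used purely for illustration; it is not one of the SMS encodings discussed later in this article.

```python
# Round trip: characters -> bytes -> characters.
text = "h\u00e9llo"  # "héllo", written with an escape for portability

encoded = text.encode("utf-8")     # map characters to bytes
decoded = encoded.decode("utf-8")  # read the bytes back into characters

print(list(encoded))  # [104, 195, 169, 108, 108, 111]
print(decoded == text)  # True
```

Note that the single character "é" occupies two bytes in UTF-8, so the five-character string becomes six bytes.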
If you use anything other than the most basic English text, people may not be able to read the content you create unless you say what character encoding you used. For example, you may intend the text to look like this:
but it may actually display like this:
In the context of messaging, encoding is important so that the message is accurately transferred and rendered as intended.
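The following sketch shows one common way text ends up garbled: bytes written with one encoding are read back with another. The sample word and the choice of UTF-8 and Latin-1 are assumptions made for illustration only.

```python
# Bytes written as UTF-8 but read back as Latin-1 come out garbled.
original = "caf\u00e9"  # "café"

raw = original.encode("utf-8")

correct = raw.decode("utf-8")    # same encoding on both sides
garbled = raw.decode("latin-1")  # mismatched encoding

print(correct)  # café
print(garbled)  # cafÃ©
```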
What encoding standards does Bandwidth support and how does it work?
Bandwidth supports both GSM-7 and UCS-2.
- GSM-7 is a character encoding standard used for commonly used letters and symbols in many languages.
  - It uses 7 bits to send a single character/symbol on GSM networks. As SMS messages are transmitted as 140 8-bit octets at a time, GSM-7 encoded SMS messages can carry up to 160 characters (140 * 8 / 7 = 160).
  - You can view the basic character set for GSM-7 here.
- UCS-2 is a character encoding standard used if a message cannot be encoded using GSM-7 or when a language requires more than 128 characters to be rendered.
  - It uses a fixed length of 16 bits (2 bytes) to send a single character. As SMS messages are transmitted as 140 8-bit octets at a time, UCS-2 encoded messages can carry up to 70 characters ((140 * 8) / (2 * 8) = 70).
  - UCS-2 can encode anything in the Basic Multilingual Plane.
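The capacity arithmetic above can be checked with a few lines of Python. The function name `max_chars` is a hypothetical helper introduced for this example.

```python
SMS_PAYLOAD_OCTETS = 140  # one SMS carries 140 8-bit octets


def max_chars(bits_per_char: int, payload_octets: int = SMS_PAYLOAD_OCTETS) -> int:
    """Return how many whole characters fit in the payload."""
    return (payload_octets * 8) // bits_per_char


print(max_chars(7))   # GSM-7: 160
print(max_chars(16))  # UCS-2: 70
```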
Do Bandwidth messaging APIs have Unicode support for all languages?
Yes, Bandwidth uses UCS-2, which can encode anything in the Basic Multilingual Plane. UCS-2 can represent at most 65,536 characters, 0000h through FFFFh in hexadecimal (2 bytes each).
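If you want to check whether a given message body stays within the 0000h to FFFFh range, a sketch like the one below may help. The helper `fits_in_bmp` is hypothetical, and this article does not describe how characters above FFFFh are handled.

```python
def fits_in_bmp(text: str) -> bool:
    """Return True if every code point is in the range 0000h-FFFFh."""
    return all(ord(ch) <= 0xFFFF for ch in text)


print(fits_in_bmp("\u4f60\u597d"))  # Chinese "你好": True
print(fits_in_bmp("\U0001F600"))    # code point 1F600h: False
```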
How does Bandwidth encode its SMS messages?
When sending SMS messages, Bandwidth automatically uses the most compact encoding possible. If your message body contains any character outside the GSM-7 set, Bandwidth switches the whole message to UCS-2 encoding, which limits a single message to 70 characters, or 67 characters per segment when the message is split into multiple segments.
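To predict which encoding a message body is likely to get, you can test its characters against the GSM-7 character set. The sketch below is an approximation and not Bandwidth's implementation: the names `GSM7_BASIC`, `GSM7_EXTENSION`, and `choose_encoding` are hypothetical, and treating the GSM-7 extension characters (such as the euro sign) as GSM-7 is an assumption this article does not confirm.

```python
# GSM-7 basic character set (127 printable/control characters; ESC omitted).
GSM7_BASIC = frozenset(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)

# GSM-7 extension table characters (assumption: counted as GSM-7 here).
GSM7_EXTENSION = frozenset("^{}\\[~]|€\f")


def choose_encoding(body: str) -> str:
    """Return "GSM-7" if every character is in the GSM-7 set, else "UCS-2"."""
    if all(ch in GSM7_BASIC or ch in GSM7_EXTENSION for ch in body):
        return "GSM-7"
    return "UCS-2"


print(choose_encoding("Hello, world!"))     # GSM-7
print(choose_encoding("It\u2019s ready"))   # curly apostrophe: UCS-2
```

A single curly quote or similar typographic character is enough to move the whole message to UCS-2, so it can be worth normalizing such characters before sending.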
For concatenated messages, Bandwidth prepends a 6-byte User Data Header (UDH) to each segment; the UDH tells the receiving device how to reassemble the message. This leaves 134 of the 140 octets for your text, which is 153 GSM-7 characters or 67 UCS-2 characters per segment.
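The per-segment limits and the resulting segment count can be worked out as follows. This is a sketch under stated assumptions: `segment_count` is a hypothetical helper, `length` is the number of characters as counted by the chosen encoding, and every character is assumed to take exactly one 7-bit or 16-bit unit.

```python
import math

PAYLOAD_OCTETS = 140
UDH_OCTETS = 6  # User Data Header on each segment of a concatenated message

SINGLE_LIMIT = {
    "GSM-7": PAYLOAD_OCTETS * 8 // 7,   # 160
    "UCS-2": PAYLOAD_OCTETS * 8 // 16,  # 70
}
CONCAT_LIMIT = {
    "GSM-7": (PAYLOAD_OCTETS - UDH_OCTETS) * 8 // 7,   # 153
    "UCS-2": (PAYLOAD_OCTETS - UDH_OCTETS) * 8 // 16,  # 67
}


def segment_count(length: int, encoding: str) -> int:
    """Return the number of SMS segments needed for `length` characters."""
    if length <= SINGLE_LIMIT[encoding]:
        return 1  # fits in one message, no UDH needed
    return math.ceil(length / CONCAT_LIMIT[encoding])


print(segment_count(160, "GSM-7"))  # 1
print(segment_count(161, "GSM-7"))  # 2
print(segment_count(71, "UCS-2"))   # 2
```

Note the jump at the boundary: a 161-character GSM-7 message needs two segments, because each segment now carries only 153 characters.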
Encoding for SMS depends on the platform you use:
| Platform | Outbound (MT) | Inbound (MO) |
| --- | --- | --- |
| SMPP | Bandwidth passes through the encoding type present in the message | Bandwidth passes through the encoding type present in the message |
| HTTP | Bandwidth uses GSM 8-bit encoding, or UCS-2 if the message text contains non-ASCII characters | Message text will always be in JSON format with UTF-8 encoding |
How does Bandwidth encode its MMS messages?
Encoding for MMS depends on the platform you use:
| Platform | Outbound (MT) | Inbound (MO) |
| --- | --- | --- |
| MM4 | Bandwidth passes through the encoding type present in the message | Bandwidth passes through the encoding type present in the message |
| HTTP | All messages are sent and received using UTF-8 encoding | All messages are sent and received using UTF-8 encoding |
Questions? Please open a ticket with your Bandwidth Support Team or hit us up at (855) 864-7776!