What is SMS Messaging Encoding and Why is it Important?

Chase Greiser

Encoding is the process of converting data from one form to another so that it can be processed, transmitted, and translated uniformly. When you input text using a keyboard or in some other way, the character encoding maps the characters you choose to specific bytes in computer memory. To display the text, the computer reads those bytes back into characters.

If you use anything other than the most basic English text, people may not be able to read the content you create unless you say what character encoding you used. For example, you may intend the text to look like this:

image4.png

but it may actually display like this:

image3.png

In the context of messaging, encoding is important so that the message is accurately transferred and rendered as intended.
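The garbling described above can be reproduced in a few lines. The following is a minimal sketch, not tied to any Bandwidth API: the sample word and the choice of UTF-8 and Latin-1 are assumptions picked only to illustrate what happens when bytes are decoded with a different encoding than the one used to write them.

```python
# Encode text to bytes with one character encoding, then decode the same
# bytes with a different one to reproduce garbled output.
text = "café"

utf8_bytes = text.encode("utf-8")        # 'é' becomes two bytes: 0xC3 0xA9
correct = utf8_bytes.decode("utf-8")     # same encoding on both sides
garbled = utf8_bytes.decode("latin-1")   # wrong decoder: each byte read as one character

print(correct)   # café
print(garbled)   # cafÃ©
```

The bytes are identical in both cases; only the reader's assumption about the encoding differs.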

 

What types of encoding does Bandwidth support and how does it work?

Bandwidth supports both GSM-7 and UCS-2.

  • GSM-7 is a character encoding standard used for commonly used letters and symbols in many languages.
    • It uses 7 bits to send a single character/symbol on GSM networks.  As SMS messages are transmitted as 140 8-bit octets at a time, GSM-7 encoded SMS messages can carry up to 160 characters (140*8/7=160).
    • Refer here for the basic character set for GSM-7.
  • UCS-2 is a character encoding standard used if a message cannot be encoded using GSM-7 or when a language requires more than 128 characters to be rendered.
    • It uses a fixed-length 16 bits (2 bytes) to send a single character.  As SMS messages are transmitted as 140 8-bit octets at a time, UCS-2 encoded messages can carry up to 70 characters ((140*8) / (2*8) = 70).
    • UCS-2 can encode anything in the Basic Multilingual Plane.
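
The capacity arithmetic in the list above can be checked with a short calculation. This is a sketch only; the function and constant names are hypothetical and chosen for illustration.

```python
SMS_PAYLOAD_OCTETS = 140  # payload of one SMS, as stated in the list above


def max_chars(bits_per_char: int, payload_octets: int = SMS_PAYLOAD_OCTETS) -> int:
    """Return how many whole characters fit in the payload."""
    return (payload_octets * 8) // bits_per_char


print(max_chars(7))    # GSM-7: 160
print(max_chars(16))   # UCS-2: 70
```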

Do Bandwidth messaging APIs have Unicode support for all languages?

Yes. Bandwidth uses UCS-2, which can encode any character in the Basic Multilingual Plane (BMP).

UCS-2 can represent at most 65,536 code points, from 0000h to FFFFh in hexadecimal (2 bytes each). Characters with code points above FFFFh lie outside the BMP and do not fit in a single 2-byte UCS-2 unit.
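
As an illustration of the 0000h to FFFFh range, the sketch below tests whether every character of a string has a code point inside the Basic Multilingual Plane. The function names are hypothetical, and this is not how Bandwidth necessarily performs the check.

```python
def in_bmp(ch: str) -> bool:
    """True if the single character's code point is in U+0000..U+FFFF."""
    return ord(ch) <= 0xFFFF


def fits_ucs2(text: str) -> bool:
    """True if every character can be written as one 2-byte UCS-2 unit."""
    return all(in_bmp(ch) for ch in text)


print(fits_ucs2("Привет"))   # True: Cyrillic is in the BMP
print(fits_ucs2("你好"))      # True: these CJK characters are in the BMP
print(fits_ucs2("😀"))        # False: U+1F600 is above FFFFh
```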

 

How does Bandwidth encode its SMS messages?

When sending SMS messages, Bandwidth automatically uses the most compact encoding possible. If the message body contains any non-GSM-7 characters, Bandwidth automatically switches to UCS-2 encoding, which limits each message body to 70 characters.
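
A rough way to predict which encoding a message body will get is to test each character against the GSM-7 basic character set. The sketch below is an assumption-laden illustration, not Bandwidth's implementation: the character set is transcribed from the GSM 03.38 default alphabet and should be verified against the reference table, and the GSM-7 extension table (characters such as "€" or "[") is deliberately left out, so this sketch classifies those as UCS-2.

```python
# GSM-7 basic character set, transcribed from the GSM 03.38 default alphabet.
# Assumption: the ESC code and the extension table are not included here.
GSM7_BASIC = frozenset(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)


def pick_encoding(text: str) -> str:
    """Return "GSM-7" if every character is in the basic set, else "UCS-2"."""
    return "GSM-7" if all(ch in GSM7_BASIC for ch in text) else "UCS-2"


print(pick_encoding("Hello, world!"))   # GSM-7
print(pick_encoding("Привет"))          # UCS-2
```

A single character outside the set is enough to move the whole message to UCS-2, which is why one stray symbol can more than halve the per-message limit.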

For concatenated messages, Bandwidth prepends a 6-byte User Data Header (UDH), which tells the receiving device how to reassemble the segments. This leaves 153 GSM-7 characters or 67 UCS-2 characters per segment for your message.
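
The per-segment limits follow from subtracting the 6-byte header from the 140-octet payload. The sketch below derives them and estimates how many segments a message needs. The names are hypothetical, and the character count is assumed to be already known for the chosen encoding.

```python
import math

PAYLOAD_OCTETS = 140   # one SMS payload
UDH_OCTETS = 6         # header added to each segment of a concatenated message

# encoding -> (single-message limit, per-segment limit with UDH)
LIMITS = {
    "GSM-7": (PAYLOAD_OCTETS * 8 // 7, (PAYLOAD_OCTETS - UDH_OCTETS) * 8 // 7),
    "UCS-2": (PAYLOAD_OCTETS // 2, (PAYLOAD_OCTETS - UDH_OCTETS) // 2),
}


def segment_count(num_chars: int, encoding: str) -> int:
    """Return the number of SMS segments needed for num_chars characters."""
    single, per_segment = LIMITS[encoding]
    if num_chars <= single:
        return 1                      # fits in one message, no UDH needed
    return math.ceil(num_chars / per_segment)


print(LIMITS["GSM-7"])               # (160, 153)
print(LIMITS["UCS-2"])               # (70, 67)
print(segment_count(161, "GSM-7"))   # 2
print(segment_count(71, "UCS-2"))    # 2
```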

Encoding for SMS depends on the platform used:

| Platform | Outbound (MT) | Inbound (MO) |
| --- | --- | --- |
| SMPP | Bandwidth passes through the encoding type present in the message | Bandwidth passes through the encoding type present in the message |
| HTTP (using V1 API) | Bandwidth will select the most compact encoding that can represent the message text. The set of possible encodings, from smallest to largest, is: GSM → Latin1 → Cyrillic → Hebrew → UCS2 | Message text will always be in JSON format with UTF-8 encoding |
| HTTP (using V2 API) | Bandwidth uses GSM 8-bit encoding, or UCS-2 if the message text contains non-ASCII characters | Message text will always be in JSON format with UTF-8 encoding |
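
The "most compact encoding" selection for V1 outbound messages can be pictured as a first-fit search over an ordered list. The sketch below is only an illustration of that idea. The mapping of Latin1, Cyrillic, and Hebrew to the Python codecs `latin-1`, `iso8859_5`, and `iso8859_8` is an assumption, the GSM step is omitted to keep the example short, and the function name is hypothetical.

```python
# Ordered from most to least compact, following the V1 outbound list.
# Assumption: these codec names stand in for the encodings Bandwidth uses.
CANDIDATES = [
    ("Latin1", "latin-1"),
    ("Cyrillic", "iso8859_5"),
    ("Hebrew", "iso8859_8"),
]


def first_fit(text: str) -> str:
    """Return the label of the first encoding that can represent the text."""
    for label, codec in CANDIDATES:
        try:
            text.encode(codec)
        except UnicodeEncodeError:
            continue          # this encoding cannot represent the text
        return label
    return "UCS2"             # fallback when no single-byte encoding fits


print(first_fit("café"))     # Latin1
print(first_fit("Привет"))   # Cyrillic
print(first_fit("שלום"))     # Hebrew
print(first_fit("你好"))      # UCS2
```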

 

How does Bandwidth encode its MMS messages?

Encoding for MMS depends on the platform used:

| Platform | Outbound (MT) | Inbound (MO) |
| --- | --- | --- |
| MM4 | Bandwidth passes through the encoding type present in the message | Bandwidth passes through the encoding type present in the message |
| HTTP (using V1 API) | All messages are sent and received using UTF-8 encoding | All messages are sent and received using UTF-8 encoding |
| HTTP (using V2 API) | All messages are sent and received using UTF-8 encoding | All messages are sent and received using UTF-8 encoding |

 
