What are Bandwidth's SMS character limits and concatenation practices?
What is the character limit for SMS text messages?
The maximum number of characters in a single SMS segment sent to carriers depends on the encoding used, which in turn depends on the content of the message. Separately, the HTTP API limits SMS messages to a maximum length of 2048 characters.
See this support article for types of encoding. For example:
- A message containing only standard text characters (those in the GSM character set) will be encoded using GSM-7.
- A message containing emojis will be encoded using UCS-2.
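The encoding rule above can be sketched in a few lines of Python. This is only an illustration: the character tables come from the GSM 03.38 default alphabet, the function name `detect_encoding` is hypothetical, and Bandwidth's actual detection logic may differ.

```python
# GSM 03.38 basic character set (assumption: this is the alphabet meant by "GSM-7").
GSM7_BASIC = (
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# GSM 03.38 extension table; these characters are still GSM-7,
# although each one is sent as an escape plus a second septet.
GSM7_EXTENSION = "^{}\\[~]|€\f"

GSM7_ALLOWED = set(GSM7_BASIC + GSM7_EXTENSION)


def detect_encoding(text: str) -> str:
    """Return "GSM-7" if every character is in the GSM alphabet, else "UCS-2"."""
    if all(ch in GSM7_ALLOWED for ch in text):
        return "GSM-7"
    return "UCS-2"


print(detect_encoding("Hello - good morning"))
print(detect_encoding("Hello 😀"))
```

Note that one character outside the GSM alphabet is enough to switch the whole message to UCS-2 in this sketch.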
Here's the maximum number of characters that can be sent to carriers in a single SMS segment:
|Message||Type||Characters used in the message||Encoding||Max characters per message (without UDH)|
|Hello - good morning||Text||GSM Standard||GSM-7||160|
How does Bandwidth send concatenated (long) messages?
Depending on the message content (plain text, emojis, special characters, etc.), Bandwidth will use either the GSM-7 or UCS-2 encoding to send it. Each encoding limits the number of characters that fit in a single segment:
- 160 characters for GSM-7 (e.g. Latin-1/9 and GSM8).
- 70 characters for UCS-2 (e.g. a message with emojis).
When you send a message whose text is longer than the maximum number of characters per segment, Bandwidth will automatically split the message for you, add a special header (User Data Header) to each part, and send it to carriers as multiple SMS segments.
What is a User Data Header?
The User Data Header (UDH) takes up 6 bytes and instructs the receiving device how to reassemble the segments so that your whole message will be shown as one SMS on the receiving handset. The maximum number of characters per concatenated (long) message is slightly reduced due to the inclusion of concatenation headers (UDH).
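For illustration, here is a sketch of a 6-byte concatenation header. The article states only that the UDH occupies 6 bytes; the byte layout below is the common 8-bit-reference concatenation element described in 3GPP TS 23.040, and it is an assumption that Bandwidth emits exactly this form. The function name `build_concat_udh` is hypothetical.

```python
def build_concat_udh(ref: int, total: int, seq: int) -> bytes:
    """Build a 6-byte concatenation UDH with an 8-bit reference number."""
    if not (0 <= ref <= 255 and 1 <= total <= 255 and 1 <= seq <= total):
        raise ValueError("need 0 <= ref <= 255 and 1 <= seq <= total <= 255")
    return bytes([
        0x05,   # UDHL: five header bytes follow
        0x00,   # IEI: concatenated short message, 8-bit reference
        0x03,   # IEDL: three bytes of element data follow
        ref,    # reference number shared by every segment of one message
        total,  # total number of segments in the message
        seq,    # position of this segment, starting at 1
    ])


# Header for the first of three segments, using an arbitrary reference of 0x2A.
print(build_concat_udh(0x2A, 3, 1).hex())
```

The receiving handset groups segments by the shared reference number and orders them by the sequence byte.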
Can you explain the math around characters and segments?
- Each SMS segment carries up to 140 bytes
- 1 byte = 8 bits
- In GSM encoding, 1 character = 7 bits
- In Unicode (UCS-2) encoding, 1 character = 16 bits
- UDH = 6 bytes
|Message Type||Calculation||Max Characters per Segment|
|GSM Single Segment||(140 bytes x 8 bits) / 7 bits||160 characters|
|GSM Multi-Segment||((140 bytes - 6 bytes) x 8 bits) / 7 bits, rounded down||153 characters|
|Unicode Single Segment||(140 bytes x 8 bits) / 16 bits||70 characters|
|Unicode Multi-Segment||((140 bytes - 6 bytes) x 8 bits) / 16 bits||67 characters|
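The arithmetic in the table can be reproduced with a short Python sketch. The constant and function names are illustrative only; the numbers come straight from the list above.

```python
SEGMENT_BYTES = 140                       # bytes available in one SMS segment
UDH_BYTES = 6                             # bytes taken by the UDH when concatenating
BITS_PER_CHAR = {"GSM-7": 7, "UCS-2": 16}


def max_chars(encoding: str, concatenated: bool) -> int:
    """Characters that fit in one segment, with or without a UDH."""
    payload_bytes = SEGMENT_BYTES - (UDH_BYTES if concatenated else 0)
    # Integer division drops any leftover bits that cannot hold a full character.
    return (payload_bytes * 8) // BITS_PER_CHAR[encoding]


for encoding in ("GSM-7", "UCS-2"):
    print(encoding, max_chars(encoding, False), max_chars(encoding, True))
```

The integer division is what turns 153.14 into 153 for multi-segment GSM-7 messages.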
Bandwidth (and the industry as a whole) counts messages by segments; therefore, customers will be charged for each individual message segment sent to downstream carriers. For example:
|Characters used in a message||Total number of characters||Encoding||Message Segments||Calculation|
|Text Only||160||GSM-7||1||No UDH is required, all 160 characters are available|
|Text Only||240||GSM-7||2||153+87=240 characters|
|Text and emojis||150||UCS-2||3||67+67+16=150 characters|
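The segment counts in this table can be estimated with a small helper like the one below. It is a simplified sketch with a hypothetical function name: like the table, it treats every character as one unit, so it does not account for characters that occupy more than one unit (GSM extension characters such as `€`, or emojis that need two 16-bit code units).

```python
# (single-segment limit, per-segment limit when concatenated)
LIMITS = {"GSM-7": (160, 153), "UCS-2": (70, 67)}


def segment_lengths(total_chars: int, encoding: str) -> list:
    """Return the number of characters carried by each segment."""
    single, multi = LIMITS[encoding]
    if total_chars <= single:
        return [total_chars]          # fits in one segment, no UDH needed
    full, rest = divmod(total_chars, multi)
    return [multi] * full + ([rest] if rest else [])


print(segment_lengths(160, "GSM-7"))
print(segment_lengths(240, "GSM-7"))
print(segment_lengths(150, "UCS-2"))
```

The length of the returned list is the number of segments, which is the unit used for billing.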
Why does my message look like GSM characters but is being split as a UCS message?
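A single character outside the GSM character set forces the entire message to be encoded as UCS-2. The most common sneaky characters we see are the "smart quote" (U+2019, which should be replaced with the plain apostrophe U+0027), introduced by text editors trying to be helpful, and "white space" characters (such as U+2002, which should be replaced with the regular space U+0020) that typically surface when copying and pasting.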
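One way to guard against these characters is to normalize text before sending it. The sketch below is an assumption about how a sender might do this, not a Bandwidth feature: only U+2019 and U+2002 are named in this article, and the other entries in the hypothetical `REPLACEMENTS` map are similar characters added for illustration.

```python
# Typographic characters mapped to GSM-friendly equivalents.
REPLACEMENTS = {
    "\u2019": "'",   # right single quotation mark ("smart quote"), named in the article
    "\u2002": " ",   # en space, named in the article
    "\u2018": "'",   # left single quotation mark (added for illustration)
    "\u201c": '"',   # left double quotation mark (added for illustration)
    "\u201d": '"',   # right double quotation mark (added for illustration)
    "\u00a0": " ",   # no-break space (added for illustration)
}
_TABLE = str.maketrans(REPLACEMENTS)


def find_sneaky(text: str) -> list:
    """List (index, code point) for each character that would be replaced."""
    return [(i, f"U+{ord(ch):04X}") for i, ch in enumerate(text) if ch in REPLACEMENTS]


def normalize_for_gsm(text: str) -> str:
    """Swap known typographic characters for their plain equivalents."""
    return text.translate(_TABLE)


message = "It\u2019s\u2002here"
print(find_sneaky(message))
print(normalize_for_gsm(message))
```

Running `find_sneaky` first makes it easy to see why a message that looks like plain text is being split at 67 characters instead of 153.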