Foreign Language Support
Text written in a foreign language like Arabic, Thai or Chinese contains characters which are normally not part of the fonts shipped with emWin.
This chapter explains the basics: the Unicode standard, which defines the characters available worldwide, and the UTF-8 encoding scheme, which emWin uses to decode text containing Unicode characters.
It also explains how to enable Arabic language support and how to render text with Shift JIS (Japanese Industrial Standard) encoding.
Unicode
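emWin treats Unicode as a 16-bit character encoding scheme: all supported characters belong to a single 16-bit character set which is valid worldwide. Later versions of the Unicode standard also define characters beyond the 16-bit range; these are outside the scope of this chapter. The Unicode standard is defined by the Unicode Consortium.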
emWin can display individual characters or strings in Unicode, although it is most common to simply use mixed strings, which can have any number of Unicode sequences within one ASCII string.
UTF-8 encoding
ISO/IEC 10646-1 defines a multi-octet character set called the Universal Character Set (UCS) which encompasses most of the world's writing systems. Multi-octet characters, however, are not compatible with many current applications and protocols, and this has led to the development of a few UCS transformation formats (UTF), each with different characteristics.
UTF-8 has the characteristic of preserving the full ASCII range, providing compatibility with file systems, parsers and other software that rely on ASCII values but are transparent to other values.
In emWin, UTF-8 characters are encoded using sequences of 1 to 3 octets. In a sequence of one octet, the high-order bit is set to 0 and the remaining 7 bits encode the character value. In a sequence of n octets, n>1, the initial octet has the n higher-order bits set to 1, followed by a bit set to 0. The remaining bit(s) of that octet contain bits from the value of the character to be encoded. The following octet(s) all have the higher-order bit set to 1 and the following bit set to 0, leaving 6 bits in each to contain bits from the character to be encoded. The following table shows the encoding ranges:
| Character range | UTF-8 Octet sequence |
|---|---|
| 0000 - 007F | 0xxxxxxx |
| 0080 - 07FF | 110xxxxx 10xxxxxx |
| 0800 - FFFF | 1110xxxx 10xxxxxx 10xxxxxx |
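The table translates into code fairly directly. The following is a minimal sketch in C, not emWin source code; the function name EncodeUTF8 is a hypothetical helper introduced here for illustration. It assumes a 16-bit character code, as used throughout this chapter, and therefore never produces more than 3 octets.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of the emWin API: encodes one 16-bit
 * character code as UTF-8 and returns the number of octets written (1 to 3).
 * pBuffer must provide room for at least 3 octets.
 */
static size_t EncodeUTF8(uint16_t Code, uint8_t * pBuffer) {
  if (Code <= 0x007F) {
    pBuffer[0] = (uint8_t)Code;                          /* 0xxxxxxx */
    return 1;
  }
  if (Code <= 0x07FF) {
    pBuffer[0] = (uint8_t)(0xC0 | (Code >> 6));          /* 110xxxxx */
    pBuffer[1] = (uint8_t)(0x80 | (Code & 0x3F));        /* 10xxxxxx */
    return 2;
  }
  pBuffer[0] = (uint8_t)(0xE0 | (Code >> 12));           /* 1110xxxx */
  pBuffer[1] = (uint8_t)(0x80 | ((Code >> 6) & 0x3F));   /* 10xxxxxx */
  pBuffer[2] = (uint8_t)(0x80 | (Code & 0x3F));          /* 10xxxxxx */
  return 3;
}
```

For example, the character code 0x00F6 falls into the second row of the table and is encoded as the two octets C3 B6.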
Encoding example
The text "Halöle" contains ASCII characters and European extensions. The following hexdump shows this text as UTF-8 encoded text:
48 61 6C C3 B6 6C 65
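To check the hexdump, the octets can be decoded back into character codes. The sketch below is illustrative only: DecodeUTF8 and DecodeStringUTF8 are hypothetical helpers, not emWin routines, and they assume well-formed input with 1 to 3 octets per character.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of the emWin API: decodes the UTF-8 sequence
 * starting at s into a 16-bit character code and returns the number of
 * octets consumed (1 to 3). The input is not validated.
 */
static size_t DecodeUTF8(const uint8_t * s, uint16_t * pCode) {
  if ((s[0] & 0x80) == 0x00) {                 /* 0xxxxxxx */
    *pCode = s[0];
    return 1;
  }
  if ((s[0] & 0xE0) == 0xC0) {                 /* 110xxxxx 10xxxxxx */
    *pCode = (uint16_t)(((s[0] & 0x1F) << 6) | (s[1] & 0x3F));
    return 2;
  }
  /* 1110xxxx 10xxxxxx 10xxxxxx */
  *pCode = (uint16_t)(((s[0] & 0x0F) << 12) | ((s[1] & 0x3F) << 6) | (s[2] & 0x3F));
  return 3;
}

/* Decodes a zero-terminated UTF-8 string; returns the number of characters. */
static size_t DecodeStringUTF8(const uint8_t * s, uint16_t * pDest, size_t MaxChars) {
  size_t n = 0;

  while (*s != 0 && n < MaxChars) {
    s += DecodeUTF8(s, &pDest[n]);
    n++;
  }
  return n;
}
```

Decoding the seven octets of the hexdump yields six character codes; the fourth one is 0x00F6, the code of 'ö'.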
Programming examples
To display text containing non-ASCII characters, the UTF-8 codes of the non-ASCII characters can be computed manually and written into the string as escape sequences. However, if your compiler supports UTF-8 encoding (sometimes called multi-byte encoding), non-ASCII characters can be used directly in strings.
//
// Example using an ASCII-only source file (UTF-8 octets written as escape sequences):
//
GUI_UC_SetEncodeUTF8(); /* required only once to activate UTF-8 */
GUI_DispString("Hal\xc3\xb6le");
//
// Example using a UTF-8 encoded source file:
//
GUI_UC_SetEncodeUTF8(); /* required only once to activate UTF-8 */
GUI_DispString("Halöle");
Unicode characters
The character output routine used by emWin (GUI_DispChar()) always takes an unsigned 16-bit value (U16) and is basically able to display any character defined by Unicode. It simply requires a font which contains the character you want to display.
UTF-8 strings
This is the recommended way to display Unicode. No special functions are required. If UTF-8 encoding is enabled, every emWin function which handles strings decodes the given text as UTF-8 text.
Using U2C.exe to convert UTF-8 text into C-code
The Tool subdirectory of emWin contains the tool U2C.exe, which converts UTF-8 text to C-code. It reads a UTF-8 text file and creates a C-file with C-strings. The following steps show how to convert a text file into C-strings and how to display them with emWin:
Step 1: Creating a UTF-8 text file
Save the text to be converted in UTF-8 format. You can use Notepad.exe to do this. Load the text in Notepad.exe.
Choose "File/Save As...". The file dialog should contain a combo box to set the encoding format. Choose "UTF-8" and save the text file.
Step 2: Converting the text file into a C-code file
Start U2C.exe. After starting the program, select the text file to be converted, then select the name of the C-file to be created. Output of U2C.exe:
"Japanese:"
"1 - \xe3\x82\xa8\xe3\x83\xb3\xe3\x82\xb3\xe3\x83\xbc"
"\xe3\x83\x87\xe3\x82\xa3\xe3\x83\xb3\xe3\x82\xb0"
"2 - \xe3\x83\x86\xe3\x82\xad\xe3\x82\xb9\xe3\x83\x88"
"3 - \xe3\x82\xb5\xe3\x83\x9d\xe3\x83\xbc\xe3\x83\x88"
"English:"
"1 - encoding"
"2 - text"
"3 - support"
Step 3: Using the output in the application code
The following example shows how to display the UTF-8 text with emWin:
#include "GUI.h"

//
// Strings as created by U2C.exe. Adjacent string literals are concatenated by the compiler.
//
static const char * _apStrings[] = {
  "Japanese:",
  "1 - \xe3\x82\xa8\xe3\x83\xb3\xe3\x82\xb3\xe3\x83\xbc"
  "\xe3\x83\x87\xe3\x82\xa3\xe3\x83\xb3\xe3\x82\xb0",
  "2 - \xe3\x83\x86\xe3\x82\xad\xe3\x82\xb9\xe3\x83\x88",
  "3 - \xe3\x82\xb5\xe3\x83\x9d\xe3\x83\xbc\xe3\x83\x88",
  "English:",
  "1 - encoding",
  "2 - text",
  "3 - support"
};

void MainTask(void) {
  int i;

  GUI_Init();
  GUI_SetFont(&GUI_Font16_1HK);
  GUI_UC_SetEncodeUTF8();  // Decode all following strings as UTF-8
  for (i = 0; i < (int)GUI_COUNTOF(_apStrings); i++) {
    GUI_DispString(_apStrings[i]);
    GUI_DispNextLine();
  }
  while (1) {
    GUI_Delay(500);
  }
}
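The conversion from UTF-8 text to a C-string in Step 2 can also be approximated in code, for example when strings need to be generated in a build script. The following sketch is not the implementation of U2C.exe; TextToCString is a hypothetical helper introduced here. It writes every octet outside the printable ASCII range as a \x escape. Because a hexadecimal escape sequence in C extends over all following hexadecimal digits, the sketch closes and reopens the literal when an escape is directly followed by such a digit.

```c
#include <stddef.h>
#include <stdio.h>

/* Returns nonzero if c would be consumed by a preceding \x escape sequence. */
static int IsHexDigit(unsigned char c) {
  return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
}

/*
 * Hypothetical helper (not the U2C.exe implementation): writes the UTF-8
 * text pText as a C string literal into pDest. Returns the length of the
 * literal, or 0 if pDest may be too small (the size check is conservative).
 */
static size_t TextToCString(const char * pText, char * pDest, size_t DestSize) {
  size_t n           = 0;
  int    AfterEscape = 0;

  if (DestSize < 3) {
    return 0;
  }
  pDest[n++] = '"';
  for (; *pText != '\0'; pText++) {
    unsigned char c = (unsigned char)*pText;

    if (n + 6 > DestSize) {  /* up to 4 characters here plus closing quote and terminator */
      return 0;
    }
    if (c < 0x20 || c > 0x7E || c == '"' || c == '\\') {
      n += (size_t)sprintf(&pDest[n], "\\x%02x", (unsigned)c);
      AfterEscape = 1;
    } else {
      if (AfterEscape && IsHexDigit(c)) {
        pDest[n++] = '"';    /* split the literal so the escape ends here */
        pDest[n++] = '"';
      }
      pDest[n++] = (char)c;
      AfterEscape = 0;
    }
  }
  pDest[n++] = '"';
  pDest[n]   = '\0';
  return n;
}
```

Applied to the UTF-8 text "Halöle", this sketch produces the literal "Hal\xc3\xb6le".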
Unicode API
The table below lists the available routines in alphabetical order within their respective categories. Detailed descriptions of the routines can be found in the sections that follow.
| Routine | Description |
|---|---|
| UTF-8 functions | |
| GUI_UC_ConvertUC2UTF8 | Converts a Unicode string into UTF-8 format. |
| GUI_UC_ConvertUTF82UC | Converts a UTF-8 string into Unicode format. |
| GUI_UC_EnableBIDI | Enables/Disables the support for bidirectional fonts. |
| GUI_UC_Encode | Encodes the given character with the current encoding. |
| GUI_UC_GetCharCode | Returns the decoded character. |
| GUI_UC_GetCharSize | Returns the number of bytes used to encode the given character. |
| GUI_UC_SetEncodeNone | Disables encoding. |
| GUI_UC_SetEncodeUTF8 | Enables UTF-8 encoding. |
| Double byte functions | |
| GUI_UC_DispString | Displays a double byte string. |
Arabic language support

The basic difference between western languages and Arabic is that Arabic is written from right to left and does not distinguish between uppercase and lowercase characters. Furthermore, the character codes of the text are not identical to the character indices in the font file used to render the characters, because the notation form of a character depends on its position in the text.
Notation forms
The Arabic base character set is defined in the Unicode standard within the range from 0x0600 to 0x06FF. Unfortunately, these character codes cannot be used directly to select the character of the font to be drawn, because the notation form depends on the character position in the text. One character can have up to 4 different notation forms:
- One, if it is at the beginning of a word (initial)
- One, if it is at the end of a word (final)
- One, if it is in the middle of a word (medial)
- One, if the character stands alone (isolated)
However, not every character can be joined both to the left and to the right (double-joined). The character 'Hamza', for example, always needs to be separated, and 'Alef' is only allowed in its final or isolated form. Character combinations of the letters 'Lam' and 'Alef' should be transformed into a 'Ligature'. This means that a single character is substituted for the combination of 'Lam' and 'Alef'.
The above explanation shows that the notation form is normally not identical to the character code of the text. The following table shows how emWin transforms the characters into the notation form depending on the text position:
| Base | Isolated | Final | Initial | Medial | Character |
|---|---|---|---|---|---|
| 0x0621 | 0xFE80 | - | - | - | Hamza |
| 0x0622 | 0xFE81 | 0xFE82 | - | - | Alef with Madda above |
| 0x0623 | 0xFE83 | 0xFE84 | - | - | Alef with Hamza above |
| 0x0624 | 0xFE85 | 0xFE86 | - | - | Waw with Hamza above |
| 0x0625 | 0xFE87 | 0xFE88 | - | - | Alef with Hamza below |
| 0x0626 | 0xFE89 | 0xFE8A | 0xFE8B | 0xFE8C | Yeh with Hamza above |
| 0x0627 | 0xFE8D | 0xFE8E | - | - | Alef |
| 0x0628 | 0xFE8F | 0xFE90 | 0xFE91 | 0xFE92 | Beh |
| 0x0629 | 0xFE93 | 0xFE94 | - | - | Teh Marbuta |
| 0x062A | 0xFE95 | 0xFE96 | 0xFE97 | 0xFE98 | Teh |
| 0x062B | 0xFE99 | 0xFE9A | 0xFE9B | 0xFE9C | Theh |
| 0x062C | 0xFE9D | 0xFE9E | 0xFE9F | 0xFEA0 | Jeem |
| 0x062D | 0xFEA1 | 0xFEA2 | 0xFEA3 | 0xFEA4 | Hah |
| 0x062E | 0xFEA5 | 0xFEA6 | 0xFEA7 | 0xFEA8 | Khah |
| 0x062F | 0xFEA9 | 0xFEAA | - | - | Dal |
| 0x0630 | 0xFEAB | 0xFEAC | - | - | Thal |
| 0x0631 | 0xFEAD | 0xFEAE | - | - | Reh |
| 0x0632 | 0xFEAF | 0xFEB0 | - | - | Zain |
| 0x0633 | 0xFEB1 | 0xFEB2 | 0xFEB3 | 0xFEB4 | Seen |
| 0x0634 | 0xFEB5 | 0xFEB6 | 0xFEB7 | 0xFEB8 | Sheen |
| 0x0635 | 0xFEB9 | 0xFEBA | 0xFEBB | 0xFEBC | Sad |
| 0x0636 | 0xFEBD | 0xFEBE | 0xFEBF | 0xFEC0 | Dad |
| 0x0637 | 0xFEC1 | 0xFEC2 | 0xFEC3 | 0xFEC4 | Tah |
| 0x0638 | 0xFEC5 | 0xFEC6 | 0xFEC7 | 0xFEC8 | Zah |
| 0x0639 | 0xFEC9 | 0xFECA | 0xFECB | 0xFECC | Ain |
| 0x063A | 0xFECD | 0xFECE | 0xFECF | 0xFED0 | Ghain |
| 0x0641 | 0xFED1 | 0xFED2 | 0xFED3 | 0xFED4 | Feh |
| 0x0642 | 0xFED5 | 0xFED6 | 0xFED7 | 0xFED8 | Qaf |
| 0x0643 | 0xFED9 | 0xFEDA | 0xFEDB | 0xFEDC | Kaf |
| 0x0644 | 0xFEDD | 0xFEDE | 0xFEDF | 0xFEE0 | Lam |
| 0x0645 | 0xFEE1 | 0xFEE2 | 0xFEE3 | 0xFEE4 | Meem |
| 0x0646 | 0xFEE5 | 0xFEE6 | 0xFEE7 | 0xFEE8 | Noon |
| 0x0647 | 0xFEE9 | 0xFEEA | 0xFEEB | 0xFEEC | Heh |
| 0x0648 | 0xFEED | 0xFEEE | - | - | Waw |
| 0x0649 | 0xFEEF | 0xFEF0 | - | - | Alef Maksura |
| 0x064A | 0xFEF1 | 0xFEF2 | 0xFEF3 | 0xFEF4 | Yeh |
| 0x067E | 0xFB56 | 0xFB57 | 0xFB58 | 0xFB59 | Peh |
| 0x0686 | 0xFB7A | 0xFB7B | 0xFB7C | 0xFB7D | Tcheh |
| 0x0698 | 0xFB8A | 0xFB8B | - | - | Jeh |
| 0x06A9 | 0xFB8E | 0xFB8F | 0xFB90 | 0xFB91 | Keheh |
| 0x06AF | 0xFB92 | 0xFB93 | 0xFB94 | 0xFB95 | Gaf |
| 0x06CC | 0xFBFC | 0xFBFD | 0xFBFE | 0xFBFF | Farsi Yeh |
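How such a table could be applied is sketched below. This is a simplified illustration, not emWin's implementation: the names NOTATION_FORMS, FindForms and ShapeWord are hypothetical, only a few rows of the table are included, and ligatures are not handled. The sketch assumes that a letter joins to its successor only if it has an initial form, and to its predecessor only if it has a final form.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
  uint16_t Base, Isolated, Final, Initial, Medial;  /* 0: form does not exist */
} NOTATION_FORMS;

/* Excerpt of the table above; a complete table would contain every row. */
static const NOTATION_FORMS _aForms[] = {
  { 0x0621, 0xFE80, 0,      0,      0      },  /* Hamza */
  { 0x0627, 0xFE8D, 0xFE8E, 0,      0      },  /* Alef  */
  { 0x0628, 0xFE8F, 0xFE90, 0xFE91, 0xFE92 },  /* Beh   */
  { 0x062F, 0xFEA9, 0xFEAA, 0,      0      },  /* Dal   */
  { 0x0644, 0xFEDD, 0xFEDE, 0xFEDF, 0xFEE0 },  /* Lam   */
  { 0x0645, 0xFEE1, 0xFEE2, 0xFEE3, 0xFEE4 }   /* Meem  */
};

/* Returns the table row of the given base code, or NULL if it is not listed. */
static const NOTATION_FORMS * FindForms(uint16_t Code) {
  size_t i;

  for (i = 0; i < sizeof(_aForms) / sizeof(_aForms[0]); i++) {
    if (_aForms[i].Base == Code) {
      return &_aForms[i];
    }
  }
  return NULL;
}

/* Replaces base codes (logical order) by their position-dependent forms. */
static void ShapeWord(const uint16_t * pSrc, uint16_t * pDest, size_t NumChars) {
  size_t i;

  for (i = 0; i < NumChars; i++) {
    const NOTATION_FORMS * pCur  = FindForms(pSrc[i]);
    const NOTATION_FORMS * pPrev = (i > 0)            ? FindForms(pSrc[i - 1]) : NULL;
    const NOTATION_FORMS * pNext = (i + 1 < NumChars) ? FindForms(pSrc[i + 1]) : NULL;
    int JoinPrev, JoinNext;

    if (pCur == NULL) {
      pDest[i] = pSrc[i];  /* Not an Arabic letter of the table: keep the code */
      continue;
    }
    JoinPrev = (pPrev != NULL) && (pPrev->Initial != 0) && (pCur->Final   != 0);
    JoinNext = (pNext != NULL) && (pNext->Final   != 0) && (pCur->Initial != 0);
    if (JoinPrev && JoinNext) {
      pDest[i] = pCur->Medial;
    } else if (JoinPrev) {
      pDest[i] = pCur->Final;
    } else if (JoinNext) {
      pDest[i] = pCur->Initial;
    } else {
      pDest[i] = pCur->Isolated;
    }
  }
}
```

For the sequence Beh, Alef, Beh (0x0628, 0x0627, 0x0628) the sketch selects the initial form of Beh, the final form of Alef and, because Alef does not join to its successor, the isolated form of the second Beh.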
Ligatures
Character combinations of 'Lam' and 'Alef' need to be transformed into ligatures. The following table shows how emWin transforms these combinations into ligatures, if the first letter is a 'Lam' (code 0x0644):
| Second letter | Ligature (final) | Ligature (elsewhere) |
|---|---|---|
| 0x0622, Alef with Madda above | 0xFEF6 | 0xFEF5 |
| 0x0623, Alef with Hamza above | 0xFEF8 | 0xFEF7 |
| 0x0625, Alef with Hamza below | 0xFEFA | 0xFEF9 |
| 0x0627, Alef | 0xFEFC | 0xFEFB |
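The ligature table can be expressed as a small lookup function. The following sketch is an illustration under assumptions, not emWin code: GetLamAlefLigature is a hypothetical name, and the parameter IsFinal is assumed to mean that the 'Lam' is joined to a preceding letter, so that the final form of the ligature is required.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of the emWin API: returns the ligature code
 * for the combination First/Second, or 0 if the two codes do not form a
 * 'Lam'/'Alef' ligature.
 */
static uint16_t GetLamAlefLigature(uint16_t First, uint16_t Second, int IsFinal) {
  uint16_t Elsewhere;

  if (First != 0x0644) {                    /* First letter has to be 'Lam' */
    return 0;
  }
  switch (Second) {
  case 0x0622: Elsewhere = 0xFEF5; break;   /* Alef with Madda above */
  case 0x0623: Elsewhere = 0xFEF7; break;   /* Alef with Hamza above */
  case 0x0625: Elsewhere = 0xFEF9; break;   /* Alef with Hamza below */
  case 0x0627: Elsewhere = 0xFEFB; break;   /* Alef                  */
  default:
    return 0;
  }
  /* In the table, the final form always directly follows the other form. */
  return IsFinal ? (uint16_t)(Elsewhere + 1) : Elsewhere;
}
```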
Bidirectional text alignment
As mentioned above, Arabic is written from right to left (RTL). However, if the Arabic text contains, for example, numbers built of more than one digit, these numbers should be written from left to right. If Arabic text is mixed with European text, a number of further rules need to be followed to get the right visual alignment of the text.
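The simplest of these cases, purely right-to-left text with embedded numbers, can be illustrated with a few lines of code. The sketch below is a toy example and covers only this one case; it is neither the bidirectional algorithm of the Unicode standard nor emWin's implementation, and the function names are hypothetical. It reverses the whole text to obtain the visual order and then reverses each run of digits again, so that numbers remain readable from left to right.

```c
#include <stddef.h>
#include <stdint.h>

/* Reverses n character codes in place. */
static void Reverse(uint16_t * p, size_t n) {
  size_t i;

  for (i = 0; i < n / 2; i++) {
    uint16_t Temp = p[i];
    p[i]          = p[n - 1 - i];
    p[n - 1 - i]  = Temp;
  }
}

static int IsDigit(uint16_t c) {
  return (c >= '0') && (c <= '9');
}

/*
 * Hypothetical helper: converts RTL text with embedded numbers from logical
 * order into visual order (leftmost character first). Embedded European
 * text is not handled.
 */
static void LogicalToVisualRTL(uint16_t * p, size_t n) {
  size_t i = 0;

  Reverse(p, n);                       /* RTL: the last character is leftmost */
  while (i < n) {
    if (IsDigit(p[i])) {
      size_t Start = i;
      while (i < n && IsDigit(p[i])) {
        i++;
      }
      Reverse(&p[Start], i - Start);   /* Restore left-to-right digit order */
    } else {
      i++;
    }
  }
}
```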
The Unicode consortium has defined these rules in the Unicode standard. If bidirectional text support is enabled, emWin follows most of these rules to obtain the right visual order before drawing the text. emWin also supports mirroring of neutral characters in RTL-aligned text. This is important if, for example, Arabic text contains parentheses. Mirroring is done by replacing the code of the character to be mirrored with the code of a mirror partner whose image matches the mirrored image. This is done quickly by using a table containing all characters with existing mirror partners. Note that mirroring of characters without a mirror partner is not supported.
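The table-based mirroring described above can be sketched as follows. The table contents and the names MIRROR_PAIR and GetMirrorCode are assumptions made for illustration; the table used by emWin is not reproduced here.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
  uint16_t Code;    /* Character to be mirrored  */
  uint16_t Mirror;  /* Code of its mirror partner */
} MIRROR_PAIR;

/* Illustrative excerpt only; a real table would contain more pairs. */
static const MIRROR_PAIR _aMirror[] = {
  { '(', ')' }, { ')', '(' },
  { '<', '>' }, { '>', '<' },
  { '[', ']' }, { ']', '[' },
  { '{', '}' }, { '}', '{' }
};

/*
 * Hypothetical helper: returns the code of the mirror partner, or the
 * unchanged code if the character has no mirror partner in the table.
 */
static uint16_t GetMirrorCode(uint16_t Code) {
  size_t i;

  for (i = 0; i < sizeof(_aMirror) / sizeof(_aMirror[0]); i++) {
    if (_aMirror[i].Code == Code) {
      return _aMirror[i].Mirror;
    }
  }
  return Code;
}
```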
The following example shows how bidirectional text is rendered by emWin:
| UTF-8 text | Rendering |
|---|---|
| \xd8\xb9\xd9\x84\xd8\xa7 1, 2, 345 \xd8\xba\xd9\x86\xd9\x8a XYZ \xd8\xa3\xd9\x86\xd8\xa7 |
Requirements
Arabic language support is part of the emWin basic package. emWin standard fonts do not contain Arabic characters. Font files containing Arabic characters can be created using the Font Converter.
Memory
The bidirectional text alignment and Arabic character transformation use approximately 60 KB of ROM and approximately 800 bytes of additional stack.
How to enable Arabic support
By default, emWin always writes text from left to right and performs no Arabic character transformation as described above. To enable support for bidirectional text and Arabic character transformation, add the following line to your application:
GUI_UC_EnableBIDI(1);
If enabled, emWin follows the rules of the bidirectional algorithm, described by the Unicode consortium, to get the right visual order before drawing text.
Example
The Sample folder contains the example FONT_Arabic, which shows how to draw Arabic text. It contains an emWin font with Arabic characters and some small Arabic text examples.
Font files used with Arabic text
Font files used to render Arabic languages need to include at least all characters defined in the 'Arabic' range 0x600-0x6FF and the notation forms and ligatures listed in the tables of this chapter.
Thai language support

The Thai alphabet uses 44 consonants and 15 basic vowel characters. These are horizontally placed, left to right, with no intervening space, to form syllables, words, and sentences. Vowels are written above, below, before, or after the consonant they modify, although the consonant always sounds first when the syllable is spoken. The vowel characters (and a few consonants) can be combined in various ways to produce numerous compound vowels (diphthongs and triphthongs).
Requirements
As explained above, the Thai language makes extensive use of compound characters. To be able to draw compound characters in emWin, a new font type is needed, which contains all required character information like the image size, image position and cursor increment value. From version 4.00, emWin supports a new font type with this information. This also means that older font types cannot be used to draw Thai text. Note that the standard fonts of emWin do not contain font files with Thai characters. To create a Thai font file, the font converter of version 3.04 or newer is required.
Memory
The Thai language support needs no additional ROM or RAM.
How to enable Thai support
Thai support does not need to be enabled by a configuration switch. The only thing required to draw Thai text is a font file of type 'Extended' created with the font converter from version 3.04 or newer.
Example
The Sample folder contains the example FONT_ThaiText.c, which shows how to draw Thai text. It contains an emWin font with Thai characters and some small Thai text examples.
Font files used with Thai text
Font files used to render Thai text need to include at least all characters defined in the 'Thai' range 0xE00-0xE7F.
Shift JIS support
Shift JIS (Japanese Industrial Standard) is a character encoding method for the Japanese language. It is the most common Japanese encoding method. Shift JIS encoding makes generous use of 8-bit characters, and the value of the first byte is used to distinguish single- and multiple-byte characters. The Shift JIS support of emWin is only needed if text with Shift JIS encoding needs to be rendered. No special function calls are needed to draw a Shift JIS string. The main requirement is a font file which contains the Shift JIS characters.
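How the first byte distinguishes single-byte from double-byte characters can be sketched as follows. The lead-byte ranges 0x81-0x9F and 0xE0-0xFC used below are the ranges commonly associated with Shift JIS and its vendor extensions; they are an assumption of this sketch and are not taken from the emWin documentation. The function names are hypothetical.

```c
#include <stddef.h>

/* Returns nonzero if c is the first byte of a double-byte character. */
static int IsSJISLeadByte(unsigned char c) {
  return (c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC);
}

/*
 * Hypothetical helper: counts the characters of a zero-terminated
 * Shift JIS string. A lead byte at the very end of the string is counted
 * as a single character.
 */
static size_t CountCharsSJIS(const char * s) {
  size_t n = 0;

  while (*s != '\0') {
    if (IsSJISLeadByte((unsigned char)*s) && s[1] != '\0') {
      s += 2;   /* Double-byte character */
    } else {
      s += 1;   /* Single-byte character */
    }
    n++;
  }
  return n;
}
```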
Creating Shift JIS fonts
The Font Converter can generate a Shift JIS font for emWin from any Windows font. When using a Shift JIS font, the functions used to display Shift JIS characters are linked automatically with the library. For detailed information on how to create Shift-JIS fonts, contact SEGGER Microcontroller GmbH & Co. KG (info@segger.com). A separate Font Converter documentation describes all you need for an efficient way of implementing Shift JIS in your emWin projects.
