This chapter describes how to work with text in the user interface—whether it's text the user has entered or text that your application has created to display on the screen. When you work with text, you must take special care to do so in a way that makes your application easily localizable. This chapter describes how to write code that manipulates characters and strings in such a way that it works properly for any language that is supported by Palm OS®. It covers:
Character Encodings
Characters
Strings
Character Encodings
Computers represent the characters in an alphabet with a numeric code. The set of numeric codes for a given alphabet is called a character encoding. Of course, a character encoding contains more than codes for the letters of an alphabet. It also encodes punctuation, numbers, control characters, and any other characters deemed necessary. The set of characters that a character encoding represents is called a character set.
Different languages use different alphabets. Most European languages use the Latin alphabet. The Latin alphabet is relatively small, so its characters can be represented using a single-byte encoding ranging from 32 to 255. On the other hand, Asian languages such as Chinese, Korean, and Japanese require their own alphabets, which are much larger. These larger character sets are represented by a combination of single-byte and double-byte numeric codes ranging from 32 to 65,535.
Although Palm OS supports multiple character encodings, only one of these encodings is active at a time. For example, a French device uses the Palm OS Latin encoding, which is identical to the Microsoft Windows code page 1252 character encoding (an extension of ISO Latin 1) but includes Palm-specific characters in the control range. A Japanese device, on the other hand, would use the Palm OS Shift JIS character encoding, which is identical to Microsoft Windows code page 932 (an extension of Shift JIS) but includes Palm-specific characters in the control range. These two devices use different character encodings even though they both use the same version of Palm OS.
No matter what the encoding is on a device, PalmSource guarantees that the low ASCII characters (0 to 0x7F) are the same. The exception to this rule is 0x5C, which is a yen symbol on Japanese devices and a backslash on most others.
The Palm OS Text Manager allows you to work with text, strings, and characters independent of the character encoding. If you use Text Manager functions and don't work directly with string data, your code should work on any system, regardless of which language and character encoding the device supports.
Characters
Depending on the device's supported languages, Palm OS may encode characters using either a single-byte encoding or a multi-byte encoding. Because you do not know which character encoding is used until runtime, you should never make an assumption about the number of bytes a character occupies in a string.
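To see why the byte count cannot be assumed, consider how a Shift JIS string has to be read. The following is a minimal sketch in standard C, not Palm OS code: the helper names are hypothetical, and the lead-byte ranges are those of plain Shift JIS (0x81 to 0x9F and 0xE0 to 0xFC), ignoring any Palm-specific additions.

```c
#include <stddef.h>

/* Hypothetical helper, not a Palm OS API: returns nonzero if b is a
   Shift JIS lead byte (the first byte of a double-byte character). */
int sjis_is_lead_byte(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Hypothetical helper: number of bytes occupied by the character that
   starts at s. A lead byte followed by the terminator is treated as a
   single (broken) byte. */
size_t sjis_char_size(const char *s)
{
    unsigned char b = (unsigned char)s[0];

    if (sjis_is_lead_byte(b) && s[1] != '\0')
        return 2;
    return 1;
}
```

The size of each character is only known once its first byte has been inspected, which is why code that steps through a string one byte at a time breaks on such a device.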
For the most part, your application does not need to know which character encoding is used, and in fact, it should make no assumptions about the encoding or about the size of characters. Instead, your code should use Text Manager functions to manipulate characters. This section describes how to work with characters correctly. It covers:
Declaring Character Variables
Using Character Constants
Missing and Invalid Characters
Retrieving a Character's Attributes
Virtual Characters
Retrieving the Character Encoding
Declaring Character Variables
Declare all character variables to be of type wchar32_t. wchar32_t is a 32-bit unsigned type that can accommodate characters of any encoding. Don't use char. char is an 8-bit variable that cannot accommodate larger character encodings.
    wchar32_t ch;   // Right. 32-bit character.
    char ch;        // Wrong. 8-bit character.
When you receive input characters through the keyDownEvent, you'll receive a wchar32_t value. (That is, the data.keyDown.chr field is a wchar32_t.)
While character variables are declared as wchar32_t, string variables are still declared as char *, even though they may contain multi-byte characters. See the section "Strings" for more information on strings.
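The effect of storing a character in a variable that is too small can be shown in standard C. This is only an illustration: uint32_t stands in for wchar32_t, and the type and function names are made up for the sketch.

```c
#include <stdint.h>

/* uint32_t stands in for the Palm OS wchar32_t type in this sketch. */
typedef uint32_t demo_wchar_t;

/* Returns 1 if the character code survives a round trip through an
   8-bit variable, 0 if its high bits are lost. */
int demo_fits_in_8_bits(demo_wchar_t ch)
{
    unsigned char narrow = (unsigned char)ch;   /* keeps the low 8 bits only */

    return (demo_wchar_t)narrow == ch;
}
```

A double-byte code such as 0x93FA comes back as 0xFA after the round trip, which is a different character (or none at all), while 0x41 is unchanged.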
Using Character Constants
Character constants are defined in several header files. The header file Chars.h contains characters that are guaranteed to be supported on all systems regardless of the encoding. Other header files exist for each supported character encoding and contain characters specific to that encoding. The character encoding-specific header files are not included in the PalmOS.h header by default because they define characters that are not available on every system.
To make it easier for the compiler to find character encoding problems with your project, make a practice of using the character constants defined in these header files rather than directly assigning a character variable to a value. For example, suppose your code contained this statement:
wchar32_t ch = 'å'; // WRONG! Don't use.
This statement may work on a Latin system, but it would cause problems on an Asian-language system because the å character does not exist. If you instead assign the value this way:
wchar32_t ch = chrSmall_A_RingAbove;
you'll find the problem at compile time because the chrSmall_A_RingAbove constant is defined in CharLatin.h, which is not included by default.
Missing and Invalid Characters
If during application testing, you see an open rectangle displayed on the screen, you have a missing character.
A missing character is one that is valid within the character encoding but the current font is not able to display it. In this case, nothing is wrong with your code other than you have chosen the wrong font. The system displays an open rectangle in place of a missing single-byte character (see Figure 1.1).
In multi-byte character encodings, a character may be missing as described above, or it may be invalid. In single-byte character encodings, there's a one-to-one correspondence between numeric values and characters to represent. This is not the case with multi-byte character encodings. In multi-byte character encodings, there are more possible values than there are characters to represent. Thus, a character variable could end up containing an invalid character—a value that doesn't actually represent a character.
If the system is asked to display an invalid character, it prints an open rectangle for the first invalid byte. Then it starts over at the next byte. Thus, the next character displayed and possibly even the remaining text displayed is probably not what you want. Check your code for the following:
- Truncating strings. You might have truncated a string in the middle of a multi-byte character.
- Appending characters from one encoding set to a string in a different encoding.
- Arithmetic on character variables that could result in an invalid character value.
- Arithmetic on a string pointer that could result in pointing to an intra-character boundary. See "Performing String Pointer Manipulation" for more information.
- Use of standard C string functions. Many of these functions are not multi-byte aware and can return invalid results for strings that contain multi-byte characters.
- Assumptions that a character always occupies only one byte in a string.
Use the Text Manager function TxtCharIsValid() to determine whether a character is valid or not.
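To make "more possible values than characters" concrete, here is a rough structural check for a Shift JIS character value, written in standard C. It is a hypothetical stand-in, not the Text Manager's implementation: it assumes a double-byte character is stored as (lead byte << 8) | trail byte, it uses the byte ranges of plain Shift JIS, and it does not check whether a structurally valid code is actually assigned to a character.

```c
#include <stdint.h>

/* Hypothetical structural validity check for a Shift JIS character value. */
int demo_sjis_char_is_valid(uint32_t ch)
{
    uint32_t lead, trail;

    if (ch <= 0x7F)
        return 1;                   /* low ASCII */
    if (ch >= 0xA1 && ch <= 0xDF)
        return 1;                   /* single-byte half-width katakana */
    if (ch <= 0xFF || ch > 0xFFFF)
        return 0;                   /* neither a single byte nor two bytes */

    lead = ch >> 8;
    trail = ch & 0xFF;
    if (!((lead >= 0x81 && lead <= 0x9F) || (lead >= 0xE0 && lead <= 0xFC)))
        return 0;                   /* first byte is not a lead byte */

    /* Trail bytes run from 0x40 to 0xFC, excluding 0x7F. */
    return trail >= 0x40 && trail <= 0xFC && trail != 0x7F;
}
```

Arithmetic on a character variable can easily move a value out of these ranges: 0x93FA passes the check, but 0x9320 does not, because 0x20 cannot be the second byte of a character.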
Retrieving a Character's Attributes
The Text Manager defines certain functions that retrieve a character's attributes, such as whether the character is alphanumeric, and so on. You can use these functions on any character, regardless of its size and encoding.
A character also has attributes unique to its encoding. Functions to retrieve those attributes are defined in the header files specific to the encoding.
Virtual Characters
Virtual characters are nondisplayable characters that trigger special events in the operating system, such as displaying low battery warnings or displaying the keyboard dialog. Virtual characters should never occur in any data and should never appear on the screen.
Palm OS uses character codes 256 decimal and greater for virtual characters. The range for these characters may actually overlap the range for "real" characters (characters that should appear on the screen). The keyDownEvent distinguishes a virtual character from a displayable character by setting the commandKeyMask bit in the structure's modifiers field.
The best way to check for virtual characters, including virtual characters that represent the hard keys, is to use the TxtCharIsVirtual() function. See Listing 1.1.
Listing 1.1 Checking for virtual characters
    if (TxtCharIsVirtual(eventP->data.keyDown.modifiers,
                         eventP->data.keyDown.chr)) {
        if (TxtCharIsHardKey(eventP->data.keyDown.modifiers,
                             eventP->data.keyDown.chr)) {
            // Handle hard key virtual character.
        } else {
            // Handle standard virtual character.
        }
    } else {
        // Handle regular character.
    }
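Both the modifiers and the character code are passed because the code alone is ambiguous when the virtual and displayable ranges overlap. The idea can be sketched in standard C as follows. The struct, the function, and the mask value are stand-ins invented for this illustration, not the Palm OS definitions, and the real TxtCharIsVirtual() may apply additional rules.

```c
#include <stdint.h>

/* Assumed mask value, for illustration only. */
#define DEMO_COMMAND_KEY_MASK 0x0008u

/* Simplified stand-in for the key-down portion of an event. */
typedef struct {
    uint32_t chr;        /* character code */
    uint16_t modifiers;  /* modifier bits */
} DemoKeyEvent;

/* Returns 1 if the event carries a virtual character, judged only by
   the command-key bit in the modifiers field. */
int demo_is_virtual(const DemoKeyEvent *ev)
{
    return (ev->modifiers & DEMO_COMMAND_KEY_MASK) != 0;
}
```

Two events with the same chr value are classified differently depending on the modifier bit, which is why a test on the character code alone is not enough.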
Retrieving the Character Encoding
Occasionally, you may need to determine which character encoding is being used. For example, your application might use specifically optimized code when it's being run on a device that uses the Palm OS Latin character encoding. You can retrieve the character encoding using the LmGetSystemLocale() function as shown in Listing 1.2.
Listing 1.2 Retrieving the character encoding
    CharEncodingType encoding;
    char* encodingName;

    encoding = LmGetSystemLocale(NULL);
    if (encoding == charEncodingPalmSJIS) {
        // encoding for Palm Shift JIS
    } else if (encoding == charEncodingPalmLatin) {
        // extension of ISO Latin 1
    } else {
        // Note: Palm OS licensees may add support for other
        // character encodings.
    }

    // The following Text Manager function returns the
    // official name of the encoding as required by
    // Internet applications.
    encodingName = TxtEncodingName(encoding);
Strings
Strings are made up of characters that occupy from one to four bytes each. As stated previously, the standard character variable, wchar32_t, is four bytes long. However, when you add a character to a string, the operating system may shrink it down to a single byte if it's a low ASCII character. Thus, any string that you work with may contain a mix of single-byte and multi-byte characters.
When working with text as strings, you can use any of the following:
- Standard C Library string functions
Palm OS Cobalt supports the standard C library including the standard C string functions. These functions only manipulate strings containing single-byte characters. Do not use these functions on strings that may contain multi-byte characters.
For example, if your application displays a numeric text field in which the user may enter some sort of application setting, it's acceptable to manipulate the string that you receive from the numeric text field using the standard C library calls. If a string may contain letters, you should use String Manager or Text Manager calls to make your application easily localizable.
- The String Manager
The String Manager is closely modeled after the standard C library functions like strcpy(), strcat(), and so on. In some cases, the String Manager functions call through to their standard C library counterparts. In other cases, the String Manager function has been modified to become multi-byte aware.
- The Text Manager
The Text Manager specifically provides support for multi-byte strings. Use the Text Manager functions when:
- A String Manager equivalent is not available.
- The length of the matching text is important. For example, to compare two strings, you can use either StrCompare() or TxtCompare(). The difference between the two is that StrCompare() does not return the length of the characters that matched. TxtCompare() does.
This section discusses the following topics:
Manipulating Strings
Performing String Pointer Manipulation
Truncating Displayed Text
Comparing Strings
Dynamically Creating String Content
TIP: All Palm OS functions that return the length of a string, such as FldGetTextLength() and StrLen(), always return the size of the string in bytes, not the number of characters in the string. Similarly, functions that work with string offsets always use the offset in bytes, not characters.
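The difference between a byte length and a character count is easy to demonstrate in standard C. The sketch below assumes plain Shift JIS text; the counting function is hypothetical and exists only for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts characters (not bytes) in a Shift JIS
   string by skipping the trail byte after every lead byte. */
size_t demo_sjis_count_chars(const char *s)
{
    size_t len = strlen(s);
    size_t i = 0;
    size_t count = 0;

    while (i < len) {
        unsigned char b = (unsigned char)s[i];
        int lead = (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);

        i += (lead && i + 1 < len) ? 2 : 1;
        count++;
    }
    return count;
}
```

For the two-character string with bytes 0x93 0xFA 0x96 0x7B, strlen() reports 4 while the function above reports 2. Buffer sizes and offsets must be computed from the first number, never the second.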
Manipulating Strings
Any time that you want to work with character pointers, you need to be careful not to point to an intra-character boundary (a middle or end byte of a multi-byte character). For example, any time that you want to set the insertion point position in a text field or set the text field's selection, you must make sure that you use byte offsets that point to inter-character boundaries. (The inter-character boundary is both the start of one character and the end of the previous character, except when the offset points to the very beginning or very end of a string.)
Suppose you want to iterate through a string character by character. Traditionally, C code uses a character pointer or byte counter to iterate through a string a character at a time. Such code will not work properly on systems with multi-byte characters. Instead, if you want to iterate through a string a character at a time, use Text Manager functions:
- TxtGetNextChar() retrieves the next character in a string.
- TxtGetPreviousChar() retrieves the previous character in a string.
- TxtSetNextChar() changes the next character in a string and can be used to fill a string buffer.
Each of these three functions returns the size of the character in question, so you can use it to determine the offset to use for the next character. For example, Listing 1.3 shows how to iterate through a string character by character until a particular character is found.
Listing 1.3 Iterating through a string or text
    char* buffer;                       // assume this exists
    size_t bufLen = StrLen(buffer);     // Length of the input text.
    wchar32_t ch = 0;
    size_t i = 0;

    while ((i < bufLen) && (ch != chrAsterisk))
        i += TxtGetNextChar(buffer, i, &ch);
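The reason a plain byte scan fails can be reproduced in standard C. In Shift JIS, the byte 0x5C is both a complete single-byte character and a legal second byte of some double-byte characters, so a byte-wise search can report a match in the middle of a character. The sketch below uses hypothetical stand-ins (demo_get_next_char and demo_find_char) and assumes plain Shift JIS; it mirrors the shape of the listing above rather than reproducing any Palm OS implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a "get next character" call under Shift JIS.
   Stores the character that starts at text[offset] in *out and returns
   its size in bytes. */
size_t demo_get_next_char(const char *text, size_t offset, uint32_t *out)
{
    unsigned char b0 = (unsigned char)text[offset];
    int lead = (b0 >= 0x81 && b0 <= 0x9F) || (b0 >= 0xE0 && b0 <= 0xFC);

    if (lead && text[offset + 1] != '\0') {
        unsigned char b1 = (unsigned char)text[offset + 1];

        *out = ((uint32_t)b0 << 8) | b1;
        return 2;
    }
    *out = b0;
    return 1;
}

/* Returns the byte offset of the first character equal to target,
   or the length of the string if there is no such character. */
size_t demo_find_char(const char *text, uint32_t target)
{
    size_t len = strlen(text);
    size_t i = 0;
    uint32_t ch;

    while (i < len) {
        size_t size = demo_get_next_char(text, i, &ch);

        if (ch == target)
            return i;
        i += size;
    }
    return len;
}
```

Given the bytes 0x83 0x5C 0x61 0x5C, strchr() finds 0x5C at offset 1, inside the first character, whereas demo_find_char() skips the double-byte character as a unit and finds the real match at offset 3.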
The Text Manager also contains functions that let you determine the size of a character in bytes without iterating through the string:
- TxtCharSize() returns how much space a given character will take up inside of a string.
- TxtCharBounds() determines the boundaries of a given character within a given string.
Listing 1.4 Working with arbitrary limits
    size_t charStart, charEnd;
    char *fldTextP = FldGetTextPtr(fld);

    TxtCharBounds(fldTextP,
        min(kMaxBytesToProcess, FldGetTextLength(fld)),
        &charStart, &charEnd);
    // process only the first charStart bytes of text.
Performing String Pointer Manipulation
Never perform any pointer manipulation on strings you pass to the Text Manager unless you use Text Manager calls to do the manipulation. For Text Manager functions to work properly, the string pointer must point to the first byte of a character. If you use Text Manager functions when manipulating a string pointer, you can be certain that your pointer always points to the beginning of a character. Otherwise, you run the risk of pointing to an intra-character boundary.
Listing 1.5 String pointer manipulation
    // WRONG! buffer + kMaxStrLength is not
    // guaranteed to point to start of character.
    buffer[kMaxStrLength] = '\0';

    // Right. Truncate at a character boundary.
    size_t offset = TxtGetTruncationOffset(buffer, kMaxStrLength);
    buffer[offset] = chrNull;
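What a boundary-aware truncation has to do can be sketched in standard C. The function below is a hypothetical stand-in that assumes plain Shift JIS. It walks the string from the beginning, because a byte in the middle of a Shift JIS string cannot be classified as a first or second byte without knowing what precedes it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in: returns the largest offset that is not greater
   than max_bytes and that falls on a character boundary of the Shift JIS
   string text. */
size_t demo_truncation_offset(const char *text, size_t max_bytes)
{
    size_t len = strlen(text);
    size_t i = 0;

    if (max_bytes >= len)
        return len;                 /* nothing to truncate */

    while (i < len) {
        unsigned char b = (unsigned char)text[i];
        int lead = (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
        size_t size = (lead && i + 1 < len) ? 2 : 1;

        if (i + size > max_bytes)
            break;                  /* the next character would not fit */
        i += size;
    }
    return i;
}
```

With the four-byte string 0x93 0xFA 0x96 0x7B and a limit of 3 bytes, the function returns 2, so the second character is dropped whole instead of being cut in half.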
Truncating Displayed Text
If you're performing drawing operations, you often have to determine where to truncate a string if it's too long to fit in the available space. Several functions help you perform this task on strings with multi-byte characters:
- WinDrawTruncChars(): This function draws a string within a specified width, determining automatically where to truncate the string. If it can, it draws the entire string. If the string doesn't fit in the space, it draws one less than the number of characters that fit and then ends the string with an ellipsis (...). Note, however, that the Window Manager drawing functions are deprecated for many uses and should not be mixed with the Palm OS Cobalt graphics context functions.
- FntTruncateString(): This function performs the same task as WinDrawTruncChars() except that it does not draw the text to the screen. You might use this if you are using a bitmapped font to display the text drawn using the Palm OS Cobalt graphics context functions. See Listing 1.6.
Listing 1.6 Drawing multiple lines of text in a bitmapped font
    fcoord_t y;
    char *msg, *dstMsg;
    size_t pixelWidth = 160;
    GcHandle gc = GcGetCurrentContext();
    Boolean truncated = false;

    FntSetFont(stdFont);
    GcSetFont(gc, GcCreateFontFromID(stdFont));
    dstMsg = (char *)malloc(StrLen(msg)+1);
    truncated = FntTruncateString(dstMsg, msg, FntGetFont(),
        pixelWidth, true);
    GcDrawTextAt(gc, 0.0, y, dstMsg, StrLen(dstMsg));
    GcReleaseContext(gc);
- GcFontStringBytesInWidth(): This function works with scalable fonts. It returns the size in bytes of the substring that can be displayed in a specified width.
Listing 1.7 shows how to use GcFontStringBytesInWidth() to determine how many lines are necessary to write a string to the screen (without considering word wrapping). This example passes the width of the screen as the pixel position so that upon return, widthToOffset contains the size in bytes of the portion of the string that can be displayed on a single line. That many bytes are drawn, then the msg pointer is advanced in the string by widthToOffset bytes, and GcFontStringBytesInWidth() is again called to find out how much text fits on the next line. The process is repeated until all of the characters in the string have been drawn.
Listing 1.7 Drawing multiple lines of text in scalable font
    fcoord_t y;
    char *msg;
    size_t widthToOffset = 0;
    size_t pixelWidth;
    size_t msgLength = StrLen(msg);
    GcHandle gc = GcGetCurrentContext();
    GcFontHandle standardFont = GcCreateFont("palmos-plain");
    FontHeightType fontHeight;
    RectangleType winBounds;

    // Set the pixel offset to the width of the screen.
    // The scalable font functions expect native coordinates.
    WinSetCoordinateSystem(kCoordinatesNative);
    GcSetCoordinateSystem(gc, kCoordinatesNative);
    WinGetWindowBounds(&winBounds);
    pixelWidth = winBounds.extent.x;
    GcGetFontHeight(standardFont, &fontHeight);
    GcSetFont(gc, standardFont);

    // Begin drawing the string to the screen.
    while (msg && *msg) {
        widthToOffset = GcFontStringBytesInWidth(standardFont, msg,
            pixelWidth);
        GcDrawTextAt(gc, 0.0, y, msg, widthToOffset);
        y = y + fontHeight.ascent + fontHeight.descent +
            fontHeight.leading;
        msg += widthToOffset;
        msgLength = StrLen(msg);
    }
    GcReleaseContext(gc);
Comparing Strings
Use the Text Manager functions TxtCompare() and TxtCaselessCompare() to perform comparisons of localizable strings.
In character encodings that use multi-byte characters, some characters have both single-byte and double-byte representations. One string might use the single-byte representation and another might use the multi-byte representation. Users expect the characters to match regardless of how many bytes a string uses to store that character. TxtCompare() and TxtCaselessCompare() can accurately match single-byte characters with their multi-byte equivalents.
Because a single-byte character might be matched with a multi-byte character, two strings might be considered equal even though they have different lengths. For this reason, TxtCompare() and TxtCaselessCompare() take two parameters in which they pass back the length of matching text in each of the two strings. See their function descriptions for more information.
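The following standard C sketch shows why two match lengths are needed. It is a deliberately small, hypothetical model and not how the Text Manager compares text: it assumes plain Shift JIS, folds only the full-width capital letters A to Z (0x8260 to 0x8279) onto their ASCII equivalents, and orders everything else by raw code value, whereas the real routines use text tables.

```c
#include <stddef.h>
#include <stdint.h>

/* Reads the character that starts at s, folding full-width A-Z to ASCII.
   Stores the number of bytes consumed in *size. s must not point at the
   terminator. */
static uint32_t demo_next_folded(const char *s, size_t *size)
{
    unsigned char b0 = (unsigned char)s[0];
    unsigned char b1 = (unsigned char)s[1];

    if (b0 == 0x82 && b1 >= 0x60 && b1 <= 0x79) {
        *size = 2;
        return (uint32_t)('A' + (b1 - 0x60));
    }
    if (((b0 >= 0x81 && b0 <= 0x9F) || (b0 >= 0xE0 && b0 <= 0xFC)) && b1 != 0) {
        *size = 2;
        return ((uint32_t)b0 << 8) | b1;
    }
    *size = 1;
    return b0;
}

/* Returns 0 if the strings match, otherwise a negative or positive value.
   Passes back the number of bytes that matched in each string. */
int demo_compare(const char *a, const char *b,
                 size_t *a_match, size_t *b_match)
{
    size_t i = 0, j = 0;

    while (a[i] != '\0' && b[j] != '\0') {
        size_t sa, sb;
        uint32_t ca = demo_next_folded(a + i, &sa);
        uint32_t cb = demo_next_folded(b + j, &sb);

        if (ca != cb) {
            *a_match = i;
            *b_match = j;
            return ca < cb ? -1 : 1;
        }
        i += sa;
        j += sb;
    }
    *a_match = i;
    *b_match = j;
    if (a[i] == '\0' && b[j] == '\0')
        return 0;
    return a[i] == '\0' ? -1 : 1;
}
```

Comparing the ASCII string "AB" with its full-width form (bytes 0x82 0x60 0x82 0x61) reports a match, with 2 bytes matched in the first string and 4 in the second. A single length could not describe both.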
The String Manager functions StrCompare() and StrCaselessCompare() are equivalent to TxtCompare() and TxtCaselessCompare(), but they do not pass back the length of the matching text.
These Text Manager and String Manager comparison routines use text tables for comparisons, and they are potentially slow. If you want to compare strings that you know contain only 7-bit ASCII characters (for example, the strings are completely internal to the program and never appear in the user interface), use the standard C library functions such as strcmp() instead.
A special case of performing string comparison is implementing the Global Find facility. For more information on implementing this feature in your application, see Chapter 2, "Implementing Global Find."
Dynamically Creating String Content
When working with strings in a localized application, never hard-code them. Instead, store strings in a resource and use the resource to display the text. If you need to create the contents of the string at runtime, store a template for the string as a resource and then substitute values as needed.
For example, consider the Edit view of the Memo application. Its title bar contains a string such as "Memo 3 of 10." The number of the memo being displayed and the total number of memos cannot be determined until runtime.
To create such a string, use a template resource and the Text Manager function TxtParamString(). TxtParamString() allows you to search for the sequences ^0, ^1, ^2, and ^3 and replace each of these with a different string. If you need more parameters, you can use TxtReplaceStr(), which allows you to replace up to ^9; however, TxtReplaceStr() only allows you to replace one of these sequences at a time.
In the Memo title bar example, you'd create a string resource that looks like this:
Memo ^0 of ^1
And your code might look like this:
Listing 1.8 Using string templates
    static void EditViewSetTitle (void)
    {
        char* titleTemplateP;
        FormPtr frm;
        char posStr[maxStrIToALen+1];
        char totalStr[maxStrIToALen+1];
        uint16_t pos;
        uint16_t length;

        // Format as strings the memo's position within
        // its category and the total number of memos
        // in the category.
        pos = DmGetPositionInCategory(MemoPadDB, CurrentRecord,
            RecordCategory);
        StrIToA (posStr, pos+1);
        if (MemosInCategory == memosInCategoryUnknown)
            MemosInCategory = DmNumRecordsInCategory (MemoPadDB,
                RecordCategory);
        StrIToA (totalStr, MemosInCategory);

        // Get the title template string. It contains ^0 and ^1
        // chars which we replace with the position of
        // CurrentRecord within CurrentCategory and with the total
        // count of records in CurrentCategory.
        titleTemplateP = MemHandleLock (DmGetResource (gAppDbP,
            strRsc, EditViewTitleTemplateStringString));
        EditViewTitlePtr = TxtParamString(titleTemplateP, posStr,
            totalStr, NULL, NULL);

        // Now set the title to use the new title string.
        frm = FrmGetFormPtr(MemoPadEditForm);
        FrmSetTitle (frm, EditViewTitlePtr);
        MemPtrUnlock(titleTemplateP);
    }
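The substitution step itself can be modeled in standard C. The function below is a hypothetical stand-in, not the Text Manager implementation: it replaces ^0 through ^3 with up to four parameter strings and returns a newly allocated result that the caller frees. It scans the template byte by byte, so it assumes a single-byte template. In Shift JIS, 0x5E can also be the second byte of a character, which is one more reason to rely on the Text Manager call in real code.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in: replaces ^0..^3 in tmpl with params[0..3].
   A NULL parameter is replaced by nothing. Returns a malloc'd string,
   or NULL if allocation fails. The caller frees the result. */
char *demo_param_string(const char *tmpl, const char *params[4])
{
    size_t out_len = 0;
    const char *p;
    char *out, *w;

    /* First pass: compute the length of the result. */
    for (p = tmpl; *p != '\0'; p++) {
        if (p[0] == '^' && p[1] >= '0' && p[1] <= '3') {
            const char *sub = params[p[1] - '0'];

            out_len += sub ? strlen(sub) : 0;
            p++;                    /* skip the digit */
        } else {
            out_len++;
        }
    }

    out = malloc(out_len + 1);
    if (out == NULL)
        return NULL;

    /* Second pass: copy literal bytes and substitute parameters. */
    w = out;
    for (p = tmpl; *p != '\0'; p++) {
        if (p[0] == '^' && p[1] >= '0' && p[1] <= '3') {
            const char *sub = params[p[1] - '0'];

            if (sub) {
                size_t n = strlen(sub);

                memcpy(w, sub, n);
                w += n;
            }
            p++;
        } else {
            *w++ = *p;
        }
    }
    *w = '\0';
    return out;
}
```

Because the placeholders are numbered rather than positional, a translator can reorder them in the template. With the parameters "3" and "10", the template "Memo ^0 of ^1" yields "Memo 3 of 10", and a template such as "^1 total, showing ^0" yields "10 total, showing 3" without any change to the code.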