Glk uses the Latin-1 Unicode encoding, and keeps it holy.
Latin-1 is an 8-bit character encoding; it maps numeric codes in the range 0 to 255 into printed characters. The values from 32 to 126 are the standard printable ASCII characters (' ' to '~'). Values 0 to 31 and 127 to 159 are reserved for control characters, and have no printed equivalent.
Glk uses different parts of the Latin-1 encoding for different purposes.
When you are sending text to a window, or to a file open in text mode, you can print any of the printable Latin-1 characters: 32 to 126, 160 to 255. You can also print the newline character (linefeed, control-J, decimal 10, hex 0x0A.)
It is not legal to print any other control characters (0 to 9, 11 to 31, 127 to 159). You may not print even common formatting characters such as tab (control-I), carriage return (control-M), or page break (control-L). [As usual, the behavior of the library when you print an illegal character is undefined. It is preferable that the library display a numeric code, such as "\177" or "0x7F", to warn the user that something illegal has occurred. The library may skip illegal characters entirely; but you should not rely on this.]
Note that when you are sending data to a file open in binary mode, you can print any byte value, without restriction. See section 5.6.3, "File Streams".
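The text-mode output rule above can be written as a small predicate. This is only a sketch: the function names latin1_is_legal_text_output() and latin1_sanitize() are hypothetical helpers, not part of Glk, and replacing illegal bytes with '?' is just one possible policy for a program to adopt before printing.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if ch may be printed to a window or a text-mode file:
   the newline (10) or a printable Latin-1 code (32..126, 160..255).
   Hypothetical helper; Glk itself provides no such call. */
bool latin1_is_legal_text_output(uint32_t ch)
{
    if (ch == 0x0A)
        return true;
    return (ch >= 32 && ch <= 126) || (ch >= 160 && ch <= 255);
}

/* Replaces every illegal byte in buf with '?' and returns the number
   of bytes that were replaced. */
size_t latin1_sanitize(unsigned char *buf, size_t len)
{
    size_t replaced = 0;
    for (size_t i = 0; i < len; i++) {
        if (!latin1_is_legal_text_output(buf[i])) {
            buf[i] = '?';
            replaced++;
        }
    }
    return replaced;
}
```

A program might run latin1_sanitize() over text from an untrusted source (a data file, say) before sending it to a text stream. Data bound for a binary-mode file needs no such check.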
A particular implementation of Glk may not be able to display all the printable characters. It is guaranteed to be able to display the ASCII characters (32 to 126, and the newline 10.) Other characters may be printed correctly, printed as multi-character combinations (such as "ae" for the one-character "ae" ligature (æ)), or printed as some placeholder character (such as a bullet or question mark, or even an octal code.)
You can test for this by using the gestalt_CharOutput selector. If you set ch to a character code (from 0 to 255), and call
glui32 res, len;
res = glk_gestalt_ext(gestalt_CharOutput, ch, &len, 1);
then res will be one of three values: gestalt_CharOutput_CannotPrint, gestalt_CharOutput_ExactPrint, or gestalt_CharOutput_ApproxPrint.
In all cases, len (the glui32 value pointed at by the third argument) will be the number of actual glyphs which will be used to represent the character. In the case of gestalt_CharOutput_ExactPrint, this will always be 1; for gestalt_CharOutput_CannotPrint, it may be 0 (nothing printed) or higher; for gestalt_CharOutput_ApproxPrint, it may be 1 or higher. This information may be useful when printing text in a fixed-width font.
[As described in section 1.9, "Other API Conventions", you may skip this information by passing NULL as the third argument in glk_gestalt_ext(), or by calling glk_gestalt() instead.]
If ch is outside the range 0 to 255, this selector will always return gestalt_CharOutput_CannotPrint. It is also guaranteed to do this if ch is an unprintable character (0 to 9, 11 to 31, 127 to 159.)
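As an illustration of how the glyph count might be used with a fixed-width font, the sketch below adds up the counts for a whole string. So that it compiles without glk.h, the library query is passed in as a callback. The callback type, the glui32_demo typedef, and the demo_glyphs() behaviour are all assumptions made for this example. In a real program the callback would presumably wrap glk_gestalt_ext(gestalt_CharOutput, ch, &len, 1) and return len.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t glui32_demo;  /* stand-in for glui32 from glk.h */

/* Hypothetical callback: returns the glyph count that
   glk_gestalt_ext(gestalt_CharOutput, ch, &len, 1) would store in len. */
typedef glui32_demo (*glyph_count_fn)(glui32_demo ch);

/* Sums the glyph counts of a Latin-1 string, giving the number of
   columns it is expected to occupy in a fixed-width font. */
size_t display_width(const unsigned char *text, size_t n, glyph_count_fn glyphs)
{
    size_t width = 0;
    for (size_t i = 0; i < n; i++)
        width += glyphs(text[i]);
    return width;
}

/* Invented behaviour for the demo: ASCII prints exactly, the "ae"
   ligature (0xE6) prints as two glyphs, and everything else prints
   as a single placeholder glyph. */
glui32_demo demo_glyphs(glui32_demo ch)
{
    if (ch >= 32 && ch <= 126)
        return 1;
    if (ch == 0xE6)
        return 2;
    return 1;
}
```

With demo_glyphs(), the three-character string 'c', 0xE6, 's' is reported as four columns wide, which is the figure a program would need when padding a table cell.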
[Make sure you do not get confused by signed byte values. If you set a "char" variable ch to 0xFE, the small-thorn character (þ), and then call
res = glk_gestalt(gestalt_CharOutput, ch);
then (on platforms where "char" is signed, as C/C++ allows) ch will be sign-extended to 0xFFFFFFFE, which is not in the range 0 to 255. You should write
res = glk_gestalt(gestalt_CharOutput, (unsigned char)ch);
instead.]
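The signed-byte pitfall can be reproduced without Glk at all. The two functions below are hypothetical and exist only to show the widening step. widen_signed() takes a signed char explicitly so that the result does not depend on whether plain "char" is signed on a given platform.

```c
#include <stdint.h>

/* Widens a signed byte directly to 32 bits. A negative value such as
   -2 (the bit pattern 0xFE) becomes 0xFFFFFFFE. */
uint32_t widen_signed(signed char ch)
{
    return (uint32_t)ch;
}

/* Widens through unsigned char first, so the result is always in the
   range 0 to 255 whatever the signedness of plain char. */
uint32_t widen_safe(char ch)
{
    return (uint32_t)(unsigned char)ch;
}
```

Passing the result of widen_safe() to a gestalt call corresponds to the (unsigned char) cast recommended above.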
You can request that the player enter a line of text. See section 4.2, "Line Input Events".
This text will be placed in a buffer of your choice. There is no length field or null terminator in the buffer. (The length of the text is returned as part of the line-input event.)
The buffer will contain only printable Latin-1 characters (32 to 126, 160 to 255).
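Because the buffer carries no terminator, a program that wants an ordinary C string has to add one itself, using the length reported by the line-input event. A minimal sketch follows. The helper name line_to_cstring() is hypothetical, and truncating over-long input is an assumption about what the caller wants.

```c
#include <stddef.h>
#include <string.h>

/* Copies len bytes of line input from buf into dst and appends a
   terminating NUL. dst_size is the capacity of dst, terminator
   included; input that does not fit is truncated. Returns the number
   of bytes copied, not counting the terminator. */
size_t line_to_cstring(char *dst, size_t dst_size, const char *buf, size_t len)
{
    if (dst_size == 0)
        return 0;
    if (len > dst_size - 1)
        len = dst_size - 1;
    memcpy(dst, buf, len);
    dst[len] = '\0';
    return len;
}
```

Here len would be the length taken from the line-input event. Bytes in buf beyond that length are whatever the buffer held before the request and should be ignored.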
A particular implementation of Glk may not be able to accept all printable characters as input. It is guaranteed to be able to accept the ASCII characters (32 to 126.)
You can test for this by using the gestalt_LineInput selector. If you set ch to a character code (from 0 to 255), and call
glui32 res;
res = glk_gestalt(gestalt_LineInput, ch);
then res will be TRUE (1) if that character can be typed by the player in line input, and FALSE (0) if not. Note that if ch is a nonprintable character (0 to 31, 127 to 159), or if ch is outside the range 0 to 255, then this is guaranteed to return FALSE.
You can request that the player hit a single key. See section 4.1, "Character Input Events".
The character code which is returned can be any value from 0 to 255. The printable character codes have already been described. The remaining codes are typically control codes, control-A to control-Z and a few others.
There are also a number of special codes, representing special keyboard keys, which can be returned from a char-input event. These are represented as 32-bit integers, starting with 4294967295 (0xFFFFFFFF) and working down. The special key codes are defined in the glk.h file. They include keycode_Return, keycode_Tab, and keycode_Home, along with codes for the arrow keys, the merged delete/backspace key, and other special keys.
Various implementations of Glk will vary widely in which characters the player can enter. The most obvious limitation is that some characters are mapped to others. For example, most keyboards return a control-I code when the tab key is pressed. The Glk library, if it can recognize this at all, will generate a keycode_Tab event (value 0xFFFFFFF7) when this occurs. Therefore, for these keyboards, no keyboard key will generate a control-I event (value 9.) The Glk library will probably map many of the control codes to the other special keycodes.
[On the other hand, the library may be very clever and discriminate between tab and control-I. This is legal. The idea is, however, that if your program asks the player to "press the tab key", you should check for a keycode_Tab event as opposed to a control-I event.]
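One way to sort a char-input value into the three groups described above is sketched below. The enum, the function names, and the DEMO_KEYCODE_TAB macro are hypothetical. The macro repeats the keycode_Tab value quoted in the text, and a real program would use the constants from glk.h instead. Treating every value above 255 as a special key code is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_KEYCODE_TAB 0xFFFFFFF7u  /* keycode_Tab, value as quoted in the text */

enum key_class { KEY_PRINTABLE, KEY_CONTROL, KEY_SPECIAL };

/* Classifies a char-input value. Codes above 255 are assumed to be
   special key codes; 32..126 and 160..255 are printable Latin-1; the
   rest (0..31, 127..159) are control codes. */
enum key_class classify_key(uint32_t code)
{
    if (code > 255)
        return KEY_SPECIAL;
    if ((code >= 32 && code <= 126) || code >= 160)
        return KEY_PRINTABLE;
    return KEY_CONTROL;
}

/* True if the player pressed the tab key as Glk reports it. A
   control-I event (value 9) is deliberately not accepted. */
bool is_tab_key(uint32_t code)
{
    return code == DEMO_KEYCODE_TAB;
}
```

The is_tab_key() check follows the advice in the note above: a prompt that says "press the tab key" should be matched against the special key code, not against control-I.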
Some characters may not be enterable simply because they do not exist. [Not all keyboards have a home or end key. A pen-based platform may not recognize any control characters at all.]
Some characters may not be enterable because they are reserved for the purposes of the interface. For example, the Mac Glk library reserves the tab key for switching between different Glk windows. Therefore, on the Mac, the library will never generate a keycode_Tab event or a control-I event.
[Note that the linefeed or control-J character, which is the only printable control character, is probably not typable. This is because, in most libraries, it will be converted to keycode_Return. Again, you should check for keycode_Return if your program asks the player to "press the return key".]
[The delete and backspace keys are merged into a single keycode because they have such an astonishing history of being confused in the first place... this spec formally waives any desire to define the difference. Of course, a library is free to distinguish delete and backspace during line input. This is when it matters most; conflating the two during character input should not be a large problem.]
You can test for this by using the gestalt_CharInput selector. If you set ch to a character code (from 0 to 255) or a special code (from 0xFFFFFFFF down), and call
glui32 res;
res = glk_gestalt(gestalt_CharInput, ch);
then res will be TRUE (1) if that character can be typed by the player in character input, and FALSE (0) if not.
[Glk porters take note: it is not a goal to be able to generate every single possible key event. If the library says that it can generate a particular keycode, then game programmers will assume that it is available, and ask players to use it. If a keycode_Home event can only be generated by typing escape-control-A, and the player does not know this, the player will be lost when the game says "Press the home key to see the next hint." It is better for the library to say that it cannot generate a keycode_Home event; that way the game can detect the situation and ask the user to type H instead.]
[Of course, it is better not to rely on obscure keys in any case. The arrow keys and return are nearly certain to be available; the others are of gradually decreasing reliability, and you (the game programmer) should not depend on them. You must be certain to check for the ones you want to use, including the arrow keys and return, and be prepared to use different keys in your interface if gestalt_CharInput says they are not available.]
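The "check first, then fall back" approach might look like the sketch below. To stay compilable without glk.h, the availability test is passed in as a callback, which in a real program would presumably wrap glk_gestalt(gestalt_CharInput, code). The names pick_key(), can_type_fn, demo_no_tab(), and DEMO_KEYCODE_TAB are all hypothetical, and demo_no_tab() models an invented library that reserves the tab key for itself.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_KEYCODE_TAB 0xFFFFFFF7u  /* keycode_Tab, value as quoted in the text */

/* Hypothetical callback: nonzero if the player can type this code
   in character input. */
typedef int (*can_type_fn)(uint32_t code);

/* Returns the first entry of preferred[] that the library says can be
   typed, or fallback if none of them can. */
uint32_t pick_key(const uint32_t *preferred, size_t n,
                  can_type_fn can_type, uint32_t fallback)
{
    for (size_t i = 0; i < n; i++) {
        if (can_type(preferred[i]))
            return preferred[i];
    }
    return fallback;
}

/* Invented behaviour for the demo: printable ASCII can be typed, but
   neither the tab key code nor control-I (9) is ever generated. */
int demo_no_tab(uint32_t code)
{
    if (code == DEMO_KEYCODE_TAB || code == 9)
        return 0;
    return code >= 32 && code <= 126;
}
```

A game could call pick_key() once at startup for each command it binds, then build its prompts ("Press N for the next hint") from the key that was actually chosen.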
You can convert characters between upper and lower case with two Glk utility functions:
unsigned char glk_char_to_lower(unsigned char ch);
unsigned char glk_char_to_upper(unsigned char ch);
These have a few advantages over the standard ANSI tolower() and toupper() macros. They work for the entire Latin-1 character set, including accented letters; they behave consistently on all platforms, since they're part of the Glk library; and they are safe for all characters. That is, if you call glk_char_to_lower() on a lower-case character, or a character which is not a letter, you'll get the argument back unchanged.
The case-sensitive characters in Latin-1 are the ranges 0x41..0x5A, 0xC0..0xD6, 0xD8..0xDE (upper case) and the ranges 0x61..0x7A, 0xE0..0xF6, 0xF8..0xFE (lower case). These are arranged in parallel; so glk_char_to_lower() will add 0x20 to values in the upper-case ranges, and glk_char_to_upper() will subtract 0x20 from values in the lower-case ranges.
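The ranges above are enough to write a stand-alone version of the two conversions. The functions below are a sketch built only from those ranges. The demo_ names are hypothetical and this is not the library's own implementation.

```c
/* Lower-cases a Latin-1 character: adds 0x20 to values in the
   upper-case ranges and returns everything else unchanged. */
unsigned char demo_char_to_lower(unsigned char ch)
{
    if ((ch >= 0x41 && ch <= 0x5A) ||
        (ch >= 0xC0 && ch <= 0xD6) ||
        (ch >= 0xD8 && ch <= 0xDE))
        return (unsigned char)(ch + 0x20);
    return ch;
}

/* Upper-cases a Latin-1 character: subtracts 0x20 from values in the
   lower-case ranges and returns everything else unchanged. */
unsigned char demo_char_to_upper(unsigned char ch)
{
    if ((ch >= 0x61 && ch <= 0x7A) ||
        (ch >= 0xE0 && ch <= 0xF6) ||
        (ch >= 0xF8 && ch <= 0xFE))
        return (unsigned char)(ch - 0x20);
    return ch;
}
```

Note the gaps in the ranges: 0xD7 and 0xF7 (the multiplication and division signs) fall between them, and 0xDF and 0xFF lie outside them, so all four come back unchanged from both functions.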