Xterm now has UTF-8 support

Post by Markus Ku » Thu, 10 Jun 1999 04:00:00




Unicode/ISO 10646-1 (Level 1) support for Linux and Unix under X11 has
taken an important step forward. The latest development revision of the
xterm version distributed by the XFree86 project can now handle 16-bit
ISO10646-1 fonts and can do screen output, keyboard input, and
cut and paste, all in UTF-8.

Here is how you can try it out very quickly yourself:

Get the xterm source code from

  http://www.clark.net/pub/dickey/xterm/xterm.tar.gz

(that is patch version #106 or higher), untar it, and compile it with

  ./configure --enable-wide-chars ; make

Also get from

  http://www.cl.cam.ac.uk/~mgk25/download/ucs-fonts.tar.gz

a set of ISO10646-1 versions of the default xterm fonts. The recommended
completed font in there is 6x13.pcf.gz, but the larger 9x15.pcf.gz and
10x20.pcf.gz fonts are also at a fairly advanced stage of development
(>2000 characters) and can be used as well. Install at least one of
these ISO10646-1 fonts as described in the README file.

Now start xterm with option -u8 and select an ISO10646-1 font, for
instance as in

  xterm -u8 -fn -misc-fixed-medium-r-semicondensed--13-120-75-75-c-60-iso10646-1

To see some example UTF-8 output, just display the demo files that
came with the fonts, e.g.

  cat utf-8-demo.txt
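
If the demo file is not at hand, a few lines of script are enough to
produce a small test file of your own. The following is only a sketch:
the file name utf8-sample.txt, the function names, and the choice of
characters are assumptions made for illustration, and whether every
glyph shows up depends on the font you installed.

```python
# Build a tiny UTF-8 test file to display with "cat" inside "xterm -u8".
# The sample characters and the file name are illustrative choices only.

SAMPLE = [
    ("Latin-1", "\u00e7 \u00e9 \u00fc"),    # c with cedilla, e acute, u diaeresis
    ("Greek", "\u03b1 \u03b2 \u03b3"),      # alpha, beta, gamma
    ("Cyrillic", "\u0430 \u0431 \u0432"),   # a, be, ve
    ("Euro sign", "\u20ac"),
]


def sample_bytes():
    """Return the sample text encoded as UTF-8, one labelled line per script."""
    lines = [f"{label}: {text}" for label, text in SAMPLE]
    return ("\n".join(lines) + "\n").encode("utf-8")


def write_sample(path="utf8-sample.txt"):
    """Write the raw UTF-8 bytes to a file without any further conversion."""
    with open(path, "wb") as handle:
        handle.write(sample_bytes())


if __name__ == "__main__":
    write_sample()
```

After running the script, "cat utf8-sample.txt" in the UTF-8 xterm
should show four short lines; in an xterm started without -u8 the same
file would typically appear as pairs or triples of Latin-1 characters.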

If you have any non-ASCII characters on your keyboard, you can create
UTF-8 files by simply typing them in. All keysym codes of X11 are
mapped onto the corresponding UTF-8 sequence by xterm.

If, say, you want the euro sign on AltGr-E, just add the line

  keysym e = e NoSymbol EuroSign   NoSymbol

to your ~/.Xmodmap file (assuming you have "xmodmap .Xmodmap" in one of
your login scripts). Greek and Cyrillic keyboards should also work
immediately.
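
To make the keysym mapping concrete, here is a rough sketch of the idea.
It is not xterm's actual code or table: the function name and the
three-entry dictionary are assumptions for illustration, with keysym
values taken from X11's keysymdef.h. Latin-1 keysyms have the same
numeric value as their code points; everything else needs a lookup.

```python
# Illustrative keysym -> UTF-8 mapping. The table is deliberately tiny and
# is NOT the table xterm uses; it only shows the principle.

KEYSYM_TO_UCS = {
    0x20AC: 0x20AC,  # XK_EuroSign      -> U+20AC EURO SIGN
    0x07E3: 0x03B3,  # XK_Greek_gamma   -> U+03B3 GREEK SMALL LETTER GAMMA
    0x06C1: 0x0430,  # XK_Cyrillic_a    -> U+0430 CYRILLIC SMALL LETTER A
}


def keysym_to_utf8(keysym):
    """Return the UTF-8 bytes for a keysym, or None if this sketch has no mapping."""
    if 0x20 <= keysym <= 0x7E or 0xA0 <= keysym <= 0xFF:
        ucs = keysym  # ASCII and Latin-1 keysyms equal their code points
    else:
        ucs = KEYSYM_TO_UCS.get(keysym)
    if ucs is None:
        return None
    return chr(ucs).encode("utf-8")


if __name__ == "__main__":
    for name, keysym in (("EuroSign", 0x20AC), ("Greek_gamma", 0x07E3),
                         ("Cyrillic_a", 0x06C1), ("ccedilla", 0x00E7)):
        encoded = keysym_to_utf8(keysym)
        print(name, " ".join(f"{b:02x}" for b in encoded))
```

Function keys and modifiers (Return, Shift, and so on) have no
character equivalent, which is why the sketch returns None for them.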

In case you are unfamiliar with UTF-8: The ASCII compatible UTF-8
encoding of Unicode is defined in

  ftp://ftp.informatik.uni-erlangen.de/pub/doc/ISO/charsets/ISO-10646-U...
  ftp://ftp.funet.fi/mirrors/nic.nordu.net/rfc/rfc2279.txt

It is the way in which Unicode will be used on Unix systems and will
hopefully replace ASCII and ISO 8859 soon.
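
For readers who prefer code to specifications, the encoding rule for
the 16-bit range discussed here fits in a few lines. This is a minimal
sketch, not a complete implementation: the function name is made up, it
stops at U+FFFF, and it does not reject surrogate values.

```python
# UTF-8 encoding of one code point in the range U+0000..U+FFFF.
#   U+0000..U+007F   0xxxxxxx
#   U+0080..U+07FF   110xxxxx 10xxxxxx
#   U+0800..U+FFFF   1110xxxx 10xxxxxx 10xxxxxx

def utf8_encode(cp):
    """Return the UTF-8 byte sequence for a code point up to U+FFFF."""
    if cp < 0 or cp > 0xFFFF:
        raise ValueError("this sketch only covers U+0000..U+FFFF")
    if cp < 0x80:
        return bytes([cp])  # ASCII stays a single, unchanged byte
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    return bytes([0xE0 | (cp >> 12),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


if __name__ == "__main__":
    for cp in (0x41, 0xE7, 0x3B3, 0x20AC):
        data = utf8_encode(cp)
        # Cross-check against the encoder built into the language.
        assert data == chr(cp).encode("utf-8")
        print(f"U+{cp:04X} ->", " ".join(f"{b:02x}" for b in data))
```

The first branch is what makes UTF-8 ASCII compatible: every ASCII
character keeps its single-byte value, and all bytes of longer
sequences have the high bit set, so they can never be mistaken for
ASCII.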

More info on using UTF-8 under Unix will shortly be on

  http://www.cl.cam.ac.uk/~mgk25/unicode.html

where I will also collect information on how to make applications UTF-8
aware.

Markus

--
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>

 
 
 

1. Solaris 8/xterm/UTF-8

Hi, I'm beginning some Unicode/UTF-8 work on a Solaris box and I don't
think I have what's needed installed. Below is what the locale command
reports on my system; comments are welcome.

LC_ALL = en_US.UTF-8  // the flagship locale per Sun

locale -m gives iso_8859_1/charmap.src

locale charmap gives UTF-8

locale -a gives
   POSIX
   common
   en_US.UTF-8
   C
   iso_8859_1

To test, I created a small app that has a French character (c with the
cedilla) and the Greek gamma in a wchar_t buffer.

I've tried without the byte order mark (EF BB BF) and with it. The
French character displays; it is a member of Latin-1, so that would be
expected. The Greek gamma doesn't.
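
As a sanity check on the bytes involved, a short sketch like the one
below decodes such a buffer and shows which characters fall inside
Latin-1. The function name and the sample bytes are assumptions made for
illustration; this is not the test app described above.

```python
# Decode a UTF-8 buffer, strip an optional byte order mark, and report
# each character's code point and whether it lies within Latin-1.

BOM = b"\xef\xbb\xbf"  # U+FEFF encoded as UTF-8


def describe(data):
    """Return (had_bom, [(code point label, fits in Latin-1), ...])."""
    had_bom = data.startswith(BOM)
    if had_bom:
        data = data[len(BOM):]
    text = data.decode("utf-8")
    return had_bom, [(f"U+{ord(ch):04X}", ord(ch) <= 0xFF) for ch in text]


if __name__ == "__main__":
    # BOM, then c with cedilla (C3 A7), then Greek small gamma (CE B3)
    sample = BOM + b"\xc3\xa7\xce\xb3"
    print(describe(sample))
```

The c with cedilla is U+00E7 and lies within Latin-1, while gamma is
U+03B3 and does not, which matches the symptom of a display path that
only handles Latin-1.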

In reading Sun documentation, I see that en_US.UTF-8 is supposed to
incorporate many character sets. However, I don't see these listed when
I do a locale -a.

Is this my issue? Did the administrators not install the rest of the
character sets that en_US.UTF-8 maps to? Should I see the charset
iso_8859-7 (Greek) as well?

The only other thing that I haven't tried (I read about it on the
internet just before posting) is to start xterm with -u8. I'm using
Hummingbird eXceed on XP.

From all that I've read, if the file is UTF-8, I should be able to 'cat'
it out.

Thanks in advance,
dave parker
