ISO 8859-1 National Character Set FAQ

Post by m.. » Mon, 26 Sep 1994 09:14:37



Archive-name: character-sets/iso-8859-1-faq
Posting-Frequency: monthly

                  ISO 8859-1  National Character Set FAQ

DISCLAIMER: THE AUTHOR MAKES NO WARRANTY OF ANY KIND WITH REGARD TO
THIS MATERIAL, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.

Note: Most of this was tested on a Sun 10, running SunOS 4.1.* - other
systems might differ slightly.

This FAQ discusses topics related to the use of ISO 8859-1 based 8 bit
character sets. It discusses how to use European (Latin American)
national character sets on UNIX-based systems and the internet.

1. Which coding should I use for accented characters?
Use the internationally standardized ISO-8859-1 character set to type
accented characters. This character set contains all characters
necessary to type (West) European languages. This encoding is also the
preferred encoding on the Internet (where accepted - see below).

This character set is also used by MS-Windows (Actually, MS-Windows
uses UNICODE (ISO 10646) truncated to 8 bit, which gives an equivalent
encoding.), VMS and (practically all) UNIX implementations. MS-DOS
uses a different character set and is not compatible with this
character set. (It can, however, be translated to this format with
various tools. See section 7.)

ISO 8859-1 supports the following languages:
Afrikaans, Catalan, Danish, Dutch, English, Faeroese, Finnish, French,
German, Galician, Irish, Icelandic, Italian, Norwegian, Portuguese,
Spanish and Swedish.

(It has been called to my attention that Albanian can be written with
ISO 8859-1 also.  However, from a standards point of view, ISO 8859-2
is the appropriate character set for Balkan countries.)

ISO 8859-1 is just one part of the ISO-8859 standard, which specifies
several character sets, e.g.:
8859-1  Europe, Latin America
8859-2  Eastern Europe
8859-3  SE Europe
8859-4  Scandinavia (mostly covered by 8859-1 also)
8859-5  Cyrillic
8859-6  Arabic
8859-7  Greek
8859-8  Hebrew

2. Getting your terminal to handle ISO characters.
Terminal drivers normally do not pass 8 bit characters. To enable
proper handling of ISO characters, add the following lines to your
.cshrc:
----------------------------------
tty -s
if ($status == 0) stty cs8 -istrip -parenb
----------------------------------
If you don't use csh, add equivalent code to your shell's start up
file.

Note that it is necessary to check whether your standard I/O streams
are connected to a terminal. Only then should you reconfigure the
terminal driver.
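
For readers who drive their start-up from a script rather than from
csh, the same "check first, then reconfigure" logic might look like
the following Python sketch. The function name configure_8bit and
its parameters are invented for this example; the stty flags are the
ones shown above.

```python
import os
import subprocess

# Same flags as in the .cshrc lines above.
STTY_COMMAND = ["stty", "cs8", "-istrip", "-parenb"]


def configure_8bit(fd=0, run=subprocess.call):
    """Invoke stty only if fd (standard input by default) is a terminal.

    Returns True when stty was invoked and False when fd is not a
    terminal, in which case nothing is changed.
    """
    if not os.isatty(fd):
        return False
    # stty acts on its standard input, so hand it the descriptor we checked.
    run(STTY_COMMAND, stdin=fd)
    return True
```

The run parameter exists only so that the command can be replaced
by a stub when trying the function out without a real terminal.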

3. Selecting the right font under X-11 for xterm (and other applications)
To actually display accented characters, you need to select a font
which contains bitmaps for the ISO 8859-1 characters in the
correct character positions. The names of these fonts normally
have the suffix "iso8859-1". Use the command
# xlsfonts
to list the fonts available on your system. You can preview a
particular font with the
# xfd -fn <fontname>
command.

Add the appropriate font selection to your ~/.Xdefaults file, e.g.:
----------------------------------------------------------------------------
XTerm*Font: -adobe-courier-medium-r-normal--18-180-75-75-m-110-iso8859-1
Mosaic*XmLabel*fontList: -*-helvetica-bold-r-normal-*-14-*-*-*-*-*-iso8859-1
----------------------------------------------------------------------------

Footnote: The X11R5 distribution has some fonts which are labeled as
ISO fonts, but which do not contain the ISO characters.
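
The suffix check described in this section can be automated. The
following Python sketch filters a list of font names (for instance
the output of xlsfonts) down to those ending in "iso8859-1". The
function names are made up for this example, and, as the footnote
says, a matching name is no guarantee that the glyphs are really
there, so preview the font with xfd before relying on it.

```python
import subprocess


def latin1_fonts(font_names):
    """Keep only the names that end with the iso8859-1 suffix."""
    cleaned = (name.strip() for name in font_names)
    return [name for name in cleaned if name.lower().endswith("-iso8859-1")]


def list_latin1_fonts():
    """Ask xlsfonts for all fonts and filter them (needs a running X display)."""
    result = subprocess.run(
        ["xlsfonts"], capture_output=True, text=True, check=True
    )
    return latin1_fonts(result.stdout.splitlines())
```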

4. Getting the locale setting right.
For the ctype macros (and by extension, applications you are running
on your system) to correctly identify accented characters, you
may have to set the ctype locale to an ISO 8859-1 conformant
configuration. On SunOS this may be done by placing
------------------------------------
setenv LANG C
setenv LC_CTYPE iso_8859_1
------------------------------------
in your .login script (if you use the csh). An equivalent statement
will adjust the ctype locale for non-csh users.

The process is the same for other operating systems, e.g. on HP/UX use
'setenv LANG german'; on IRIX 5.2 use 'setenv LANG de'; on Ultrix 4.3
use 'setenv LANG GER_DE.8859' and on OSF/1 use 'setenv LANG
de_DE.88591'.  The examples given here are for German.  Other
languages work too, depending on your operating system.  Check out
'man setlocale' on your system for more information.
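
Since the locale names differ from vendor to vendor, a program can
simply try several and keep the first one the C library accepts.
The following Python sketch does this with the standard locale
module; the function name pick_ctype_locale is invented for this
example, and which names succeed depends entirely on your system.

```python
import locale


def pick_ctype_locale(candidates):
    """Return the first name accepted for LC_CTYPE, or None if none is.

    As a side effect, LC_CTYPE is left set to the accepted locale.
    """
    for name in candidates:
        try:
            locale.setlocale(locale.LC_CTYPE, name)
        except locale.Error:
            continue  # this system does not know the name, try the next one
        return name
    return None
```

A call such as pick_ctype_locale(["iso_8859_1", "de_DE.88591", "C"])
tries the SunOS and OSF/1 names mentioned above and falls back to
the plain C locale.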

5. Printing accented characters.

5.1 PostScript printers
If you want to print accented characters on a PostScript printer, you
may need a PS filter which can handle ISO characters.

Our PostScript filter of choice is a2ps, recent versions of
which can handle ISO 8859-1 characters with the -8 option.
a2ps V4.3 is available via anonymous ftp from imag.imag.fr under the
file name /archive/postscript/a2ps.V4.3.tar.Z.

5.2 Other (non-PS) printers:
If you want to print to non-PS printers, your success rate depends on
the encoding the printer uses. Several alternatives are possible:

* Your printer accepts ISO 8859-1:
  You're lucky. No conversion is needed, just send your files to the
  printer.

* Your printer supports a PC-compatible font:
  You can use the recode tool to translate from ISO 8859-1 to this
  encoding. (If you are using a SunOS based computer, you can also use
  the unix2dos utility which is part of the standard distribution.)
  Just add the appropriate invocation as a built-in filter to your
  printer driver.  

* Your printer uses a national ISO 646 variant (7 bit ASCII
  with some special characters replaced by national characters):
  You will have to use a translation tool; this tool would
  then be installed in the printer driver and translate character
  conventions before sending a file to the printer.  The recode
  program supports many national ISO 646 norms.  (If you add support
  for another variant, please submit it to the maintainers of recode,
  so that it can benefit everybody.)

  Unfortunately, you will not be able to print all characters with
  the built-in character set. Most printers have user-definable
  bitmap characters, which you can use to print all ISO characters.
  You just have to generate a bitmap for any particular character and
  send this bitmap to the printer.  The syntax for these characters
  varies, but a few conventions have gained universal acceptance
  (e.g., many printers can process Epson-compatible escape sequences).

* Your printer supports a strange format:
  If your printer supports some other strange format (e.g. HP Roman8,
  DEC MCS, Atari, NeXTStep, EBCDIC or what have you), you have to add a
  filter which will translate ISO 8859-1 to this encoding before
  sending your data to the printer.  'recode' supports many of these
  character sets already.  If you have to write your own conversion
  tool, consider this as a good starting base. (If you add support for
  any new character sets, please submit your code changes to the
  maintainers of recode).

  If your printer supports DEC MCS, this is nearly equivalent to ISO
  8859-1 (actually, it is a former ISO 8859-1 draft standard) - the
  difference is only a few characters.  You could probably get by
  with just sending ISO 8859-1 to the printer.

* Your printer supports ASCII only:
  You have several options:
  + If your printer supports user-defined characters, you can print all
    ISO characters not supported by ASCII by sending the appropriate
    bitmaps.
  + Add a filter to the printer driver which will strip the accent
    characters and just print the unaccented characters.
  + Add a filter which will generate escape sequences (such as
    " <BACKSPACE> a for Umlaut-a (ä), etc.) to be printed.  Recode
    supports this encoding under the name ascii-bs.

Footnote: For more information on character translation and the
'recode' tool, see section 7.
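
The two filter ideas for ASCII-only printers can be sketched in a
few lines of Python. This is only an illustration of the principle
and makes no claim to reproduce recode's ascii-bs table: the function
names are made up, only five accents are handled, and characters
without an ASCII approximation (sharp s, o-slash and so on) simply
come out as "?".

```python
import unicodedata

# Combining marks (after NFD decomposition) and the ASCII character
# used to imitate each of them on paper.
ACCENTS = {
    "\u0300": "`",   # grave
    "\u0301": "'",   # acute
    "\u0302": "^",   # circumflex
    "\u0303": "~",   # tilde
    "\u0308": '"',   # diaeresis / umlaut
}


def strip_accents(data: bytes) -> bytes:
    """Drop the accents and keep only the unaccented base letters."""
    text = unicodedata.normalize("NFD", data.decode("iso-8859-1"))
    kept = "".join(ch for ch in text if not unicodedata.combining(ch))
    return kept.encode("ascii", errors="replace")


def overstrike(data: bytes) -> bytes:
    """Emit accent, backspace, letter so that the printer overprints them."""
    out = []
    for ch in data.decode("iso-8859-1"):
        decomposed = unicodedata.normalize("NFD", ch)
        base, marks = decomposed[0], decomposed[1:]
        if marks in ACCENTS and base.isascii():
            out.append(ACCENTS[marks] + "\b" + base)
        else:
            out.append(ch)
    return "".join(out).encode("ascii", errors="replace")
```

Either function can be wrapped in a small script that reads standard
input and writes standard output, and then be installed as a filter
in the printer driver.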

6. TeX and ISO 8859-1
If you want to write TeX without having to type {\"a}-style escape
sequences, you can either get a TeX version configured to read 8-bit
ISO characters, or you can translate between ISO and TeX codings.

The latter is arduous if done by hand, but can be automated if you use
emacs. If you use Emacs 19.23 or higher, simply add the following line
to your .emacs startup file. This mode will perform the necessary
translations for you automatically:
------------------
(require 'iso-cvt)
------------------

If you are using pre-19.23 versions of emacs, get the "gm-lingo.el"
lisp file via anonymous ftp from ftp.vlsivie.tuwien.ac.at in /pub/8bit.
Load gm-lingo from your .emacs startup file and this mode will perform
the necessary translations for you automatically.

If you want to configure TeX to read 8 bit characters, check out the
configuration files available via anonymous ftp from
ftp.vlsivie.tuwien.ac.at in /pub/8bit.  The new LaTeX2e reportedly
supports 8 bit characters by default.
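
To give an idea of what translating between ISO and TeX codings
involves, here is a minimal Python sketch of the ISO-to-TeX
direction. It is not the algorithm used by iso-cvt or gm-lingo; the
function name and the small tables are assumptions made for this
example, and characters outside the tables are passed through
unchanged.

```python
import unicodedata

# Combining marks (after NFD decomposition) and the TeX accent command
# character that produces the same accent.
TEX_ACCENTS = {
    "\u0300": "`",   # grave
    "\u0301": "'",   # acute
    "\u0302": "^",   # circumflex
    "\u0303": "~",   # tilde
    "\u0308": '"',   # diaeresis / umlaut
}

# Characters handled individually.
TEX_SPECIAL = {
    "\u00df": r"{\ss}",    # sharp s
    "\u00e7": r"{\c c}",   # c cedilla
    "\u00c7": r"{\c C}",   # C cedilla
}


def latin1_to_tex(data: bytes) -> str:
    """Replace accented ISO 8859-1 characters by TeX escape sequences."""
    out = []
    for ch in data.decode("iso-8859-1"):
        if ch in TEX_SPECIAL:
            out.append(TEX_SPECIAL[ch])
            continue
        decomposed = unicodedata.normalize("NFD", ch)
        if len(decomposed) == 2 and decomposed[1] in TEX_ACCENTS:
            # An accented i takes the dotless \i as its base letter.
            base = r"\i" if decomposed[0] == "i" else decomposed[0]
            out.append("{\\" + TEX_ACCENTS[decomposed[1]] + base + "}")
        else:
            out.append(ch)
    return "".join(out)
```

The reverse direction (TeX to ISO) needs a small parser for the
escape sequences and is left out here.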

7. Translating between different international character sets.
While ISO 8859-1 is an international standard, not everybody uses this
encoding. Many computers use their own, vendor-specific character sets
(most notably Microsoft for MS-DOS). If you want to edit or view files
written in different encoding, you will have to translate them to an
ISO 8859-1 based representation.

There are several PD character set translators available on the
internet, the most notable being 'recode'. recode is available via
anonymous ftp from prep.ai.mit.edu and resides in the directory
/u2/emacs. recode is covered by FSF copyright and is freely
redistributable.  Under SunOS, the dos2unix and unix2dos programs
(distributed with SunOS) will translate between MS-DOS and ISO 8859-1
formats.
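
What such a translator does can be sketched in Python with the
codecs from the standard library. The sketch below assumes that the
MS-DOS file uses code page 437 (many West European installations
use code page 850 instead, so the code page is a parameter); the
function name is made up, and this is not how recode or dos2unix are
implemented. Characters with no ISO 8859-1 equivalent, such as the
line-drawing symbols, are replaced by "?".

```python
def dos_to_latin1(data: bytes, codepage: str = "cp437") -> bytes:
    """Translate MS-DOS text to ISO 8859-1 with UNIX line endings."""
    text = data.decode(codepage)
    # MS-DOS ends lines with CR LF, UNIX with LF only.
    text = text.replace("\r\n", "\n")
    return text.encode("iso-8859-1", errors="replace")
```

For example, the MS-DOS byte 0x84 (a-umlaut in code page 437) comes
out as 0xE4, its position in ISO 8859-1.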

8. ISO 8859-1 and emacs
Emacs 19 (as opposed to Emacs 18) can automatically handle 8 bit
characters. (If you have a choice, upgrade to Emacs version 19.23,
which has the most complete ISO support.) Emacs 19 has extensive
support for ISO 8859-1. If your display supports ISO 8859-1 encoded
characters, add the following line to your .emacs startup file: ...


Post by Tom McFarla » Sun, 02 Oct 1994 16:30:19


Just a few comments/corrections in the FAQ...

In article <character-sets/iso-8859-1-faq_780452...@rtfm.mit.edu>, m...@vlsivie.tuwien.ac.at writes:

|> Archive-name: character-sets/iso-8859-1-faq
|> Posting-Frequency: monthly
|>
|>
|>             ISO 8859-1  National Character Set FAQ
|>
...
|> 4. Getting the locale setting right.
|> For the ctype macros (and by extension, applications you are running
|> on your system) to correctly identify accented characters, you
|> may have to set the ctype locale to an ISO 8859-1 conformant
|> configuration. On SunOS this may be done by placing
|> ------------------------------------
|> setenv LANG C
|> setenv LC_CTYPE iso_8859_1
|> ------------------------------------
|> in your .login script (if you use the csh). An equivalent statement
|> will adjust the ctype locale for non-csh users.
|>
|> The process is the same for other operating systems, e.g. on HP/UX use
|> 'setenv LANG german'; on IRIX 5.2 use 'setenv LANG de'; on Ultrix 4.3
|> use 'setenv LANG GER_DE.8859' and on OSF/1 use 'setenv LANG
|> de_DE.88591'.  The examples given here are for German.  Other
|> languages work too, depending on your operating system.  Check out
|> 'man setlocale' on your system for more information.

For HP systems, your example should be LANG=german.iso88591.  Setting LANG
to "german" on HP will give you HP Roman8 proprietary codeset, not
ISO8859.1.  At least this is the case for HP-UX < 10.0.  As of 10.0, you
can use either german.iso88591 or de_DE.iso88591 (a name more in line with
other vendors and developing standards for locale names).  For a complete
listing of locale names, see the text file /usr/lib/nls/config.  Or, on
HP-UX 10.0, execute 'locale -a'.  This command will list all locales currently
installed on your system.

...
|> 9.1 US-keyboards under X11
|> Under X Windows, the COMPOSE multi-language support key can be
|> used to enter accented characters.
|> Thus, when running X11 on a SunOS-based computer (or any other X11R5
|> server supporting COMPOSE characters), you can type three character
|> sequences such as
|> COMPOSE " a -> ä
|> COMPOSE s s -> ß
|> COMPOSE ` e -> è
|> to type accented characters.
|>
|> Note that this COMPOSE capability has been removed as of X11R6,
|> because it does not adequately support all the languages in the world.
|> Instead, compose processing is supposed to be performed in the client
|> using an 'input method'. (In the short term, this is a step backward,
|> as few clients support this type of processing at the moment.)

I guess "few" is a matter of perspective.  Any application written with
Motif 1.2 or greater automatically uses the R5/R6 input method APIs.
And most vendors included similar functionality in their Motif 1.0 and 1.1
implementations. (I know that HP, IBM, and DEC did this.)  And most vendors'
terminal emulators for X are written to use these input methods.  Maybe I
come from an atypical background, but my experience is that there are few
significant Xlib-only applications... most applications are written with
higher level toolkits, such as Motif.  And while the Athena toolkit doesn't
use input method technology, I believe most others do.

|> Input methods are controlled by the locale environment variables (LANG
|> and LC_xxx). The values for these variables are (or at least, should be
|> made equivalent by any sane vendor) equivalent to those expected by
|> the ANSI/POSIX locale library.  For a list of possible settings see
|> section 4.

This is partially correct.  XOpenIM actually uses the value of the LC_CTYPE
category (obtained by setlocale(LC_CTYPE,NULL)) to determine for which
language (actually, which codeset) an IM should be opened.  Many supported
languages support multiple IMs.  The XMODIFIERS environment variable allows
the user to control which IM for the language is selected.  Toolkits such
as Motif provide additional controls through use of resources, such as
XnlLanguage (to control language, instead of LANG or LC_CTYPE category
value).

|> 9.2 US-keyboards and emacs
|> There are several modes to enter Umlaut characters under emacs when
|> using a US-style keyboard.  One such mode is iso-transl, which is
|> distributed with the standard emacs distribution.  This mode uses the
|> Alt-key for entering diacritical marks (accents et al.).  An extended
|> iso-transl mode (iso-transl+) which allows the definition of language
|> specific short cuts is available via anonymous ftp from
|> ftp.vlsivie.tuwien.ac.at in /pub/8bit/iso-transl+.shar.  This file
|> also includes sample configurations for the German and Spanish
|> languages.
|>
|> An alternative to using Alt-sequences for entering diacritical marks
|> is the use of 'electric accents', such as used on old type writers or
|> under many MS Windows programs.  With this method, typing an accent
|> character will place this accent on the next character entered.  One
|> mode which supports this entry method is the iso-acc minor mode which
|> comes with the standard emacs distribution. Just add
|> ------------------
|> (require 'iso-acc)
|> ------------------
|> to your emacs startup script, and the '`~/^" keys will be electric
|> accents.

I think you are talking about the conflict between using Meta for both
keymap column 3 and 4 access as well as Emacs's need for Meta for commands.
BTW, it isn't just the US keyboards that have these problems/challenges.
Any national language keyboard may have keysyms bound to column 3 and 4 of
the keymap.  One way to get around this conflict can be found in the HP-UX
release notes (this should work for non-HP systems as well):

   Mapping keyboard for both Extend-char and Meta
   -----------------------------------------------

   A common problem reported by people using HP's X Window System is the
   conflict between the use of the "extend-char" key to access the extended
   characters of "Roman8" or "Latin1" with HP's keyboards and the use of the
   "extend-char" key as a Meta key.

   The default mapping is that both keys serve both purposes.  However, with
   HP-UX 9.* it is possible to configure the keyboard so that one key is used
   as the "extend-char" key and the other as the Meta key.

   The "xmodmap" command can be used to inquire and set the mapping for keys
   on the keyboard.  Run the following command.

       xmodmap -pm

   For a US or West European keyboard in the default state, this prints:

      xmodmap:  up to 3 keys per modifier, (keycodes in parentheses):

      shift       Shift_R (0xc),  Shift_L (0xd)
      lock        Caps_Lock (0x37)
      control     Control_L (0xe)
      mod1        Meta_R (0xa),  Meta_L (0xb),  Mode_switch (0x36)
      mod2
      mod3
      mod4
      mod5

   The "mod1" modifier has entries for both Meta "keysyms" and for
   "Mode_switch" as well; and this creates a problem.  The solution is to use
   "mod2" for Mode_switch and change the "Meta_L" key into the "Mode_switch"
   key.  To do this, use "xmodmap" and execute the following command:

       xmodmap mods

   where "mods" contains the following four lines:

      remove Mod1 = Meta_L Mode_switch
      keysym Mode_switch = NoSymbol
      keysym Meta_L = Mode_switch
      add Mod2 = Mode_switch

   The entries in the file need to be in this order.  Again, type:

      xmodmap -pm

   The results should be:

      xmodmap:  up to 3 keys per modifier, (keycodes in parentheses):

      shift       Shift_R (0xc),  Shift_L (0xd)
      lock        Caps_Lock (0x37)
      control     Control_L (0xe)
      mod1        Meta_R (0xa)
      mod2        Mode_switch (0xb)
      mod3
      mod4
      mod5

   The keyboard then uses the left "extend-char" key for extended
   characters and the right "extend-char" key for Meta.  The client must be
   linked against R4 or R5 "Xlib" for this to work.

Hope this information helps.

Best regards,

Tom McFarland
Hewlett-Packard, Co.
<to...@cv.hp.com>

 
 
 


Post by Yoseff Franc » Sat, 08 Oct 1994 05:19:51


A long shot, but does anyone know where one can get Lakka in the
United States?

I would appreciate it if people could respond also via email. thanks

yf

--
In Xanadu did Kubla Khan
A stately pleasure dome decree
But only if the NFL
To a franchise would agree

 
 
 


Post by Hergen Eile » Sat, 08 Oct 1994 09:29:25



>A long shot, but does anyone know where one can get Lakka in the
>United States?

Sorry, that I can't help but I'm just curious:  What is Lakka?

Hergen

P.S. Sorry for writing this sentence but it seems as if my system doesn't
allow to post an article when the new text is shorter than the included text.
So, I hope this helps.

 
 
 


Post by Patric Lundbe » Fri, 07 Oct 1994 23:57:18





> >A long shot, but does anyone know where one can get Lakka in the
> >United States?

> Sorry, that I can't help but I'm just curious:  What is Lakka?

Cloudberry liquor - the most heavenly * beverage on earth.  No, I
don't know where to get it on this side of the Atlantic - I always pick up a
bottle or two when I leave from a visit at home.

Sincerely, Patric.

< -------------------------------------------------- >

  UW-Madison, Mad-town USA

  Ma'let a"r ingenting, va"gen a"r allt - R. Broberg
< -------------------------------------------------- >

 
 
 


Post by Bernd Wittge » Sat, 08 Oct 1994 17:44:53



|> >
|> >
|> >
|> >A long shot, but does anyone know where one can get Lakka in the
|> >United States?
|> >
|> Sorry, that I can't help but I'm just curious:  What is Lakka?
|>
|> Hergen
|>
|> P.S. Sorry for writing this sentence but it seems as if my system doesn't
|> allow to post an article when the new text is shorter than the included text.
|> So, I hope this helps.

What is Lakka ?
To Scandinavians (at least), it is a liqueur made of Molte-berries.

How to get it in US? No idea.

Cheers,

Bernd

 
 
 


Post by Norbert Stra » Sun, 09 Oct 1994 04:34:04




>>A long shot, but does anyone know where one can get Lakka in the
>>United States?

>Sorry, that I can't help but I'm just curious:  What is Lakka?
>Hergen

"Lakka" is the Finnish word for cloudberry, in German "Multbeere" (Rubus
Chamaemorus).
It's a member of the Rose family. A little plant that is found especially in
arctic swamps. The berry looks a little bit like a raspberry, but its
colour is a shining orange.
In Germany there are two places where cloudberries are found, one in the
Oberharz and another one in a bog southwest of Stade. I suppose that there
must be a lot of cloudberries, or their American relatives, in Canada and
Alaska.

In northern Scandinavia the cloudberry is very common. It is eaten, e.g.,
fresh with cream or as jam.
In Finland you can get cloudberry liquor or a cloudberry aperitif of port
wine strength.

Fresh cloudberries are sold everywhere in Swedish and Finnish markets, but
are extremely expensive.

There's nothing better!

Norbert

 
 
 


Post by Ville I Miettin » Sun, 09 Oct 1994 10:53:09



: >
: >
: >
: >A long shot, but does anyone know where one can get Lakka in the
: >United States?
: >
: Sorry, that I can't help but I'm just curious:  What is Lakka?

Lakka is 'cloudberry' in English.. A goldish yellow, tasty and very
expensive berry..

-ville

 
 
 


Post by raymond thomas pierrehumbe » Sun, 09 Oct 1994 06:13:57


Do you mean Lakka the berry or Lakka the liqueur made from the berry?
If the former, I've never seen it in stores here.  I have found
the berry out on barrens in Canada, though (most recently
in Newfoundland; also grows in Labrador).  In Canada it's
called a bake-apple.  They don't seem to make the liqueur
commercially so far as I know.

I've never seen the liqueur in US stores, but I've never
tried very hard to get my local store to order it.  If
you tell your wine shop the distributor is Chymos, they
might be able to help.

By the way if you like berry liqueurs, you should try
mesimarja.  In my opinion, the finest of all berries,
and the very finest of all liqueurs.


 
 
 


Post by Katinka Zbi » Sun, 09 Oct 1994 06:07:00



: >
: Sorry, that I can't help but I'm just curious:  What is Lakka?

lakka is finnish liqueur made out of salmonberries or also called
moltberries.

heippa
rakkaus

 
 
 


Post by Bjorn Stenbe » Sat, 08 Oct 1994 17:18:56



: >
: >A long shot, but does anyone know where one can get Lakka in the
: >United States?
: >
: Sorry, that I can't help but I'm just curious:  What is Lakka?

: Hergen

: P.S. Sorry for writing this sentence but it seems as if my system doesn't
: allow to post an article when the new text is shorter than the included text.
: So, I hope this helps.

Hergen,

Lakka is, as far as I know, a Finnish liqueur flavoured with wild Scandinavian berries.
It is quite good; but, what this posting is doing in this newsgroup I don't know.

Perhaps somebody else is more enlightened.

Bjorn

 
 
 


Post by Katinka Zbi » Mon, 10 Oct 1994 19:28:57



: By the way if you like berry liqueurs, you should try
: mesimarja.  In my opinion, the finest of all berries,
: and the very finest of all liqueurs.

what's about polar? :-)

still dreaming of finland
rakkaus

 
 
 


Post by Timo Sal » Tue, 11 Oct 1994 01:24:00



:: By the way if you like berry liqueurs, you should try
:: mesimarja.  In my opinion, the finest of all berries,
:: and the very finest of all liqueurs.
:
:what's about polar? :-)
:still dreaming of finland

That's very nice and commendable even with the small f :-), but
hardly very Unixish even if Linux originates from Finland.

Followups to soc.culture.nordic if we all pretty please!

   All the best, Timo

..................................................................
Prof. Timo Salmi      Co-moderator of comp.archives.msdos.announce
Moderating at garbo.uwasa.fi anonymous FTP  archives  128.214.87.1
Faculty of Accounting & Industrial Management; University of Vaasa

 
 
 


Post by Heikki Raudaskos » Tue, 11 Oct 1994 04:38:56


: Sorry, that I can't help but I'm just curious:  What is Lakka?

You've heard already that the "Lakka" liqueur tastes of cloudberries,
berries that in Finland grow especially in Lapland, and in Oulu province,
too.

As a northern Finn I have, however, to say that we native northerners
don't call that berry "lakka"; for us it is "hilla."

"Lakka" is a more "sophisticated", official, southerner synonym for "hilla".
For us it is a tasteless, scentless, lifeless word.

So: everytime you see the word "lakka" used in any context, cultural
imperialism and regional suppression is happening before your own eyes.

(BTW, when I began to hang around the bars at the age of 16, my favorite
drink was 2 cl "Lakka" mixed with 2 cl whiskey. I haven't tried it for
ages, I wonder what it would taste like now. But you could try it even if
you've passed your *s.)      

: Hergen

Heikki