After doing some research on this, I found...
"The Unicode character encoding standard is a fixed-length, character-
encoding scheme that includes characters from almost all the living
languages of the world. Unicode characters are usually shown as U+xxxx,
where xxxx is the hexadecimal code of the character. Each character is
16 bits (2 bytes) wide regardless of the language. While the resulting
65,536 code elements are sufficient for encoding most of the characters
of the major languages of the world, the Unicode standard also provides
an extension mechanism that allows for encoding as many as a million
more characters. This extension reserves a range of code values (U+D800
to U+DFFF, known as surrogates) for encoding some 32-bit characters as
two successive code elements.
DB2 supports ISO/IEC 10646 standard UCS-2, that is, Unicode without
surrogates. UCS-2 is implemented with UTF-8 (UCS Transformation Format
8) algorithmic transformation. DB2 supported codepage/CCSIDs are shown
in Table 5-3.
Table 5-3 Supported Code Pages/CCSIDs
CP/CCSID   Single-Byte (SBCS) Space   Double-Byte (DBCS) Space
1200       N/A                        U+0020
13488      N/A                        U+3000
These are handled the same way except for the value of their DBCS
space. Regarding the conversion table, since code page 1200 is a superset
of CCSID 13488, the exact same tables are used for both.
UTF-8 has been registered as CCSID 1208, which is used as the multibyte
(MBCS) code page number for the UCS-2/UTF-8 support of DB2. This is the
database code page number and the code page of character string data
within the database. The double-byte code page number (for UCS-2) is
1200, which is the code page of graphic string data within the
database.
When a database is created in UCS-2/UTF-8, CHAR, VARCHAR, LONG VARCHAR,
and CLOB data are stored in UTF-8, and GRAPHIC, VARGRAPHIC, LONG
VARGRAPHIC, and DBCLOB data are stored in UCS-2. We will simply refer
to this as a UCS-2 database.
If you are working with character string data in UTF-8, you should be
aware that ASCII characters are encoded into 1-byte lengths; however,
non-ASCII characters are encoded into 2- or 3-byte lengths in a
multiple-byte character code set (MBCS). Therefore, if you define an n-
byte length character column, you can store strings anywhere from n/3
to n characters depending on the ratio of ASCII to non-ASCII
characters."
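To make the byte arithmetic in the quoted passage concrete, here is a minimal Python sketch. The function names are hypothetical helpers, not part of any DB2 API. It assumes, as the passage describes, that CHAR/VARCHAR lengths count UTF-8 bytes, that each GRAPHIC character is one 2-byte UCS-2 unit, and that anything above U+FFFF is rejected because DB2's UCS-2 support has no surrogates.

```python
# Hypothetical helpers, not a DB2 API: they only model the byte arithmetic
# described in the quoted passage about UCS-2/UTF-8 databases.


def check_ucs2(text: str) -> None:
    """Reject characters that UCS-2 without surrogates cannot represent."""
    for ch in text:
        if ord(ch) > 0xFFFF:
            raise ValueError(f"U+{ord(ch):X} needs a surrogate pair; not valid UCS-2")


def char_bytes(text: str) -> int:
    """Bytes used by CHAR/VARCHAR data, which is stored as UTF-8 (CCSID 1208)."""
    check_ucs2(text)
    # ASCII -> 1 byte, other characters up to U+FFFF -> 2 or 3 bytes.
    return len(text.encode("utf-8"))


def graphic_bytes(text: str) -> int:
    """Bytes used by GRAPHIC/VARGRAPHIC data, which is stored as UCS-2 (code page 1200)."""
    check_ucs2(text)
    # For characters up to U+FFFF, UTF-16 emits exactly one 2-byte unit,
    # which is the same size as UCS-2.
    return len(text.encode("utf-16-be"))


def fits_in_char_column(text: str, n: int) -> bool:
    """True if `text` fits a CHAR/VARCHAR column declared with a length of n bytes."""
    return char_bytes(text) <= n


def char_capacity(n: int) -> tuple[int, int]:
    """(fewest, most) characters an n-byte UTF-8 column can hold."""
    # Worst case: every character takes 3 bytes. Best case: all ASCII.
    return n // 3, n


if __name__ == "__main__":
    for sample in ["DB2", "café", "日本語"]:
        print(sample, char_bytes(sample), graphic_bytes(sample))
    print(char_capacity(30))                   # (10, 30)
    print(fits_in_char_column("中" * 10, 30))  # True: 30 bytes
    print(fits_in_char_column("中" * 11, 30))  # False: 33 bytes
```

Under these assumptions, "日本語" takes 9 bytes as CHAR data but only 6 as GRAPHIC data, while "DB2" takes 3 and 6 respectively, so which column family is more compact depends on the mix of ASCII and non-ASCII text.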
I think I stand corrected... my humble apologies.
Regards
Richard Mitchell
Architect
IBM e-business Practice
IBM Global Services
Certified Solutions Expert - DB2 UDB v6.1 Database Administration for
UNIX, Windows and OS/2
> Ed, are you absolutely sure about the GRAPHIC/VARGRAPHIC bit? I've done
> this on a UTF-8 database and not had to change from CHAR/VARCHAR!
> --
> Richard Mitchell
> Architect
> National e-business Practice
> IBM Global Services
> Certified Solutions Expert - DB2 UDB v6.1 Database Administration for
> UNIX, Windows and OS/2
> > You need to use GRAPHIC strings when processing a double-byte character
> > set such as Unicode.
> > For your application, use VARGRAPHIC(2000), not VARCHAR(2000).
> > Also, consider carefully whether you need varying-length fields. If the
> > values you are going to store will all be, for example, 25 to 30
> > characters long, it would be best to use GRAPHIC(30), not VARGRAPHIC(30)
> > or VARGRAPHIC(2000). On the other hand, if the data could be anything up
> > to 2000 characters long, then VARGRAPHIC(2000) remains the best option.
> > --
> > All information given is a personal opinion.
> > You are responsible for any changes to your environment.
> > Sent via Deja.com http://www.deja.com/
> > Before you buy.