Help needed, Unicode and database to support multiple language

Post by Xiao » Wed, 06 Dec 2000 04:00:00



One of my projects needs to use DB2 UDB to support multiple languages. I have
tried both UDB 6.1 and 7.1. I created the database with:

         db2 CREATE DATABASE franklan USING CODESET UTF-8 TERRITORY US

and then created a table with a single column of type VARCHAR(2000):

         db2 "create table main (keyword varchar(2000))"

I assumed that this database could handle Unicode. I wrote a Java program that
connects to the database through JDBC and tries to insert some Unicode strings.
Characters with code points no higher than U+00FF are inserted correctly. For
any other character, the content stored in the database is not what I tried to
insert. Furthermore, when I select the content back from the database, the
result does not even match what is stored in the database.
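
When comparing what was inserted with what comes back, it can help to look at code points instead of rendered glyphs, since a console or terminal may itself mangle the output. The following is only a minimal sketch, assuming Java 8 or later; the class name CodePointDump and the method dump are made up for illustration and do not come from the original program.

```java
// Hypothetical diagnostic helper; names are illustrative, not from the thread.
final class CodePointDump {
    // Render each Unicode code point of s as U+XXXX, separated by spaces.
    static String dump(String s) {
        StringBuilder out = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(String.format("U+%04X", cp));
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // 'a', e-acute (U+00E9) and one CJK character (U+4E2D)
        String keyword = "a\u00e9\u4e2d";
        System.out.println(dump(keyword)); // prints U+0061 U+00E9 U+4E2D
    }
}
```

Calling dump on the string passed to PreparedStatement.setString and again on the value returned by ResultSet.getString should show at which characters the two start to differ.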

 
 
 

Post by Ed Vassie, BMC Software Lt » Thu, 07 Dec 2000 04:00:00


You need to use GRAPHIC strings when processing a double-byte character
set such as Unicode.

For your application, use VARGRAPHIC(2000), not VARCHAR(2000)

Also, consider carefully whether you need varying-length fields. If the
actual values you are going to store will all be, for example, 25 to 30
characters long, it would be best to use GRAPHIC(30), not VARGRAPHIC(30)
or VARGRAPHIC(2000). On the other hand, if the data could be anything up
to 2000 characters long, then VARGRAPHIC(2000) remains the best option.

--
All information given is a personal opinion.
You are responsible for any changes to your environment.

Sent via Deja.com http://www.deja.com/
Before you buy.

 
 
 

Post by Mitc » Thu, 07 Dec 2000 04:00:00


Ed, are you absolutely sure about the GRAPHIC/VARGRAPHIC bit? I've done
this on a UTF-8 database and not had to change from CHAR/VARCHAR !

--
Richard Mitchell
Architect
National e-business Practice
IBM Global Services
Certified Solutions Expert - DB2 UDB v6.1 Database Administration for
UNIX, Windows and OS/2



Quote:
> You need to use GRAPHIC strings when processing a double-byte character
> set such as Unicode.

> For your application, use VARGRAPHIC(2000), not VARCHAR(2000)

> Also, consider carefully if you need to use varying-length fields.  If
> the actual values you are going to store will all be, for example 25 to
> 30 characters long, it would be best to use GRAPHIC(30), not
> VARGRAPHIC(30) or VARGRAPHIC(2000).  On the other hand, if the length of
> data could be anything up to 2000 chars long, then VARGRAPHIC(2000)
> remains the best option.

> --
> All information given is a personal opinion.
> You are responsible for any changes to your environment.

 
 
 

Post by Xiao » Thu, 07 Dec 2000 04:00:00


I am the person who posted the original question. What I have found is that
VARCHAR works properly when I use JDBC to send a string that contains only
characters in the range U+0000 to U+00FF. For any character beyond that range,
either the database or JDBC fails to convert it properly into UTF-8.

Nianjun Zhou
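
The range U+0000 to U+00FF is exactly what ISO-8859-1 (Latin-1) can represent, so one possible explanation is that the data passes through a single-byte Latin-1 conversion somewhere between the Java program and the database. That is an assumption, not something confirmed in this thread. The sketch below only reproduces the symptom in plain Java, with no database involved; the class name RoundTripCheck and the method roundTrip are invented for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demonstration class; names are illustrative only.
final class RoundTripCheck {
    // Encode s to bytes in the given charset, then decode those bytes again.
    static String roundTrip(String s, Charset cs) {
        return new String(s.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        String latin = "caf\u00e9";   // every character is at or below U+00FF
        String cjk = "\u4e2d\u6587";  // both characters are above U+00FF

        // Latin-1 can represent the first string, so it survives: prints true
        System.out.println(roundTrip(latin, StandardCharsets.ISO_8859_1).equals(latin));
        // Latin-1 cannot represent CJK; each character becomes '?': prints false
        System.out.println(roundTrip(cjk, StandardCharsets.ISO_8859_1).equals(cjk));
        // UTF-8 can represent both, so the CJK string survives: prints true
        System.out.println(roundTrip(cjk, StandardCharsets.UTF_8).equals(cjk));
    }
}
```

If the values read back from the table show the same pattern, the client code page and driver settings may be worth checking before changing the column type.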


 
 
 

Post by Mitc » Fri, 08 Dec 2000 12:01:06


After doing some research on this, I found...

"The Unicode character encoding standard is a fixed-length, character-
encoding scheme that includes characters from almost all the living
languages of the world. Unicode characters are usually shown as U+xxxx,
where xxxx is the hexadecimal code of the character. Each character is
16 bits (2 bytes) wide regardless of the language. While the resulting
65,536 code elements are sufficient for encoding most of the characters
of the major languages of the world, the Unicode standard also provides
an extension mechanism that allows for encoding as many as a million
more characters. This extension reserves a range of code values (U+D800
to U+DFFF, known as surrogates) for encoding some 32-bit characters as
two successive code elements.
DB2 supports ISO/IEC 10646 standard UCS-2, that is, Unicode without
surrogates. UCS-2 is implemented with UTF-8 (UCS Transformation Format
8) algorithmic transformation. DB2 supported codepage/CCSIDs are shown
in Table 5-3.

Table 5-3 Supported Code Pages/CCSIDs

CP/CCSID  Single-Byte (SBCS) Space    Double-Byte (DBCS) Space
1200      N/A                         U+0020
13488     N/A                         U+3000

These are handled the same way except for the value of their DBCS
space. Regarding the conversion table, since code page 1200 is a
superset of CCSID 13488, the exact same tables are used for both.

UTF-8 has been registered as CCSID 1208, which is used as the multibyte
(MBCS) code page number for the UCS-2/UTF-8 support of DB2. This is the
database code page number and the code page of character string data
within the database. The double-byte code page number (for UCS-2) is
1200, which is the code page of graphic string data within the
database.

When a database is created in UCS-2/UTF-8, CHAR, VARCHAR, LONG VARCHAR,
and CLOB data are stored in UTF-8, and GRAPHIC, VARGRAPHIC, LONG
VARGRAPHIC, and DBCLOB data are stored in UCS-2. We will simply refer
to this as a UCS-2 database.
If you are working with character string data in UTF-8, you should be
aware that ASCII characters are encoded into 1-byte lengths; however,
non-ASCII characters are encoded into 2- or 3-byte lengths in a
multiple-byte character code set (MBCS). Therefore, if you define an
n-byte length character column, you can store strings anywhere from n/3
to n characters depending on the ratio of ASCII to non-ASCII
characters."

I think I stand corrected..... my humble apologies.
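
The n/3 to n rule in the quoted text can be checked from Java by counting UTF-8 bytes before inserting. This is just a rough sketch of that arithmetic; the class Utf8Length and its methods are hypothetical names, and it assumes the column length is measured in bytes, as the quoted passage describes.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper; names are illustrative only.
final class Utf8Length {
    // Number of bytes s occupies when encoded as UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // True if s fits a character column declared with maxBytes bytes.
    static boolean fits(String s, int maxBytes) {
        return utf8Bytes(s) <= maxBytes;
    }

    public static void main(String[] args) {
        System.out.println(utf8Bytes("abc"));          // ASCII, 1 byte each: prints 3
        System.out.println(utf8Bytes("\u00e9"));       // U+00E9, 2 bytes: prints 2
        System.out.println(utf8Bytes("\u4e2d\u6587")); // two CJK chars, 3 bytes each: prints 6
    }
}
```

With the VARCHAR(2000) column from the original post, fits(keyword, 2000) would accept up to 2000 ASCII characters but only about 666 characters that each need three bytes.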

Regards
Richard Mitchell

Architect
IBM e-business Practice
IBM Global Services
Certified Solutions Expert - DB2 UDB v6.1 Database Administration for
UNIX, Windows and OS/2



 
 
 

1. Need to upgrade a non-unicode database to unicode database

I need to upgrade a non-unicode data types in the database
to unicode data types. My database has around 450 tables,
each having lots of char, varchar and text data types.
Need to convert them to nchar, nvarchar and ntext data
types. I have a script to do this, but that takes up lots
of efforts and time. Is there any tool or some other
alternative, that takes care of complete database.

2. Problem with Masked Edit Control 6.0

3. Full-text index and Multiple Languages: Do I need a Server for each language

4. ODBC and XBase

5. Asian Language/Unicode support

6. ADO SQL Search That Returns 0 Records When it Should Return 1

7. Language Support (unicode, etc)

8. REPOST: What the Query Rewrite

9. search on image column with multiple unicode language

10. Multiple language support

11. supporting multiple languages in SQLServer 7/2000

12. multiple language support

13. Multiple language support error