Behaviour of system calls when multibyte characters are passed

Post by mshet » Thu, 10 Jun 2004 02:09:35



Hi,

What would be the behaviour if a path containing multibyte characters
is passed to functions like execl that expect a const char*?

Would it be handled properly, especially if the multibyte character
string embeds a '\0'?

Is this documented somewhere?

Please help.

Thanks and Regards,
M Shetty

Post by Barry Margoli » Thu, 10 Jun 2004 02:34:08




> Hi,

> What would be the behaviour if a path containing multibyte characters
> is passed to functions like execl that expect a const char*?

> Would it be handled properly, especially if the multibyte character
> string embeds a '\0'?

> Is this documented somewhere?

Unix filesystem system calls expect a char*.  If you pass a wchar_t* to
them, they'll treat it as a pointer to a null-terminated string of
single-byte characters.  If there's an embedded 0 byte, it will
terminate the pathname.

In fact, it might not even work at all.  The C language allows pointer
formats to differ depending on the type they point to, so a char* and a
wchar_t* might have totally different formats.  There's no guarantee that
using the wchar_t* as a char* will result in the same bytes being accessed.
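
As a rough illustration of the embedded-0 point, the sketch below reports how long a wide string looks when its storage is read byte by byte, which is what a char*-based call would do. The helper name length_seen_as_bytes is made up for this example, and the exact result depends on the platform's wchar_t size and byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: number of bytes before the first 0 byte in the
 * storage of a wide string, i.e. the "length" a char*-based function
 * would see if it were handed that storage. */
size_t length_seen_as_bytes(const wchar_t *wide)
{
    assert(wide != NULL);

    /* Reading an object's bytes through unsigned char is well defined. */
    const unsigned char *bytes = (const unsigned char *)wide;

    /* Total storage, including the terminating wide null character. */
    size_t total = (wcslen(wide) + 1) * sizeof(wchar_t);

    /* The terminator guarantees that at least one 0 byte exists. */
    const unsigned char *nul = memchr(bytes, 0, total);
    return (size_t)(nul - bytes);
}
```

Where wchar_t is wider than one byte, length_seen_as_bytes(L"/tmp/file") is much shorter than the nine characters in the path, because the unused high-order bytes of each ASCII wide character are 0.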

--

Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***


Post by Bart Smaalde » Thu, 10 Jun 2004 13:39:27


Note the difference between multibyte and wide-character encodings.
Wide characters are used only inside applications, for efficiency; system
calls, files, and so on need multibyte strings.  Supported multibyte
encodings don't contain null bytes and may be passed to open, exec, etc.
The same goes for UTF-8.
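
A minimal sketch of that division of labour, assuming the application holds its path as wchar_t internally: convert it to the current locale's multibyte encoding with wcstombs before handing it to open or exec. The names wide_path_to_multibyte, has_embedded_nul and utf8_example_path are invented for this example; the sample path contains U+5171 encoded as UTF-8.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Example UTF-8 path (hypothetical): "/tmp/" + U+5171 (E5 85 B1) + ".txt". */
const char utf8_example_path[] = "/tmp/\xe5\x85\xb1.txt";

/* Returns 1 if the first len bytes contain a 0 byte, otherwise 0. */
int has_embedded_nul(const char *bytes, size_t len)
{
    assert(bytes != NULL);
    return memchr(bytes, 0, len) != NULL;
}

/* Convert a wide-character path to the current locale's multibyte
 * encoding. Returns a malloc'd string the caller must free, or NULL
 * if allocation fails or a character cannot be represented. */
char *wide_path_to_multibyte(const wchar_t *wpath)
{
    assert(wpath != NULL);

    /* Each wide character needs at most MB_CUR_MAX bytes. */
    size_t max = wcslen(wpath) * MB_CUR_MAX + 1;
    char *buf = malloc(max);
    if (buf == NULL)
        return NULL;

    if (wcstombs(buf, wpath, max) == (size_t)-1) {
        free(buf);
        return NULL;
    }
    return buf;   /* null-terminated, because max leaves room for it */
}
```

The string returned by wide_path_to_multibyte is an ordinary char* and can be passed to open or execl as usual. Non-ASCII characters convert only if the program has selected a suitable locale with setlocale.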

For more details, see the section on Code Set Independence (CSI) in the
extensive attributes(5) man page.

- Bart




1. Unicode characters to multibyte characters.

I have a Java application that passes Unicode Chinese characters (through a
socket) to a native program on Solaris 8. When I try to convert the Unicode
characters to multibyte characters so that they can be used to set a window
title under X, the mbstowcs call always fails with a -1 return code (invalid
character encountered). I created the following code sample to illustrate
the problem. It runs without error on AIX and gives the correct result: it
prints the correct Chinese character and the multibyte hex code A6 40 for
the character. On Solaris, it fails. The Unicode character that I am using,
0x5171, is one of the Unicode characters being passed from Java. It is a
valid Unicode character (at least it displays correctly on AIX). Does anyone
have a clue why it fails on Solaris?

Thanks,

Mark Kressin

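#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <locale.h>
#include <wchar.h>
#include <limits.h>

int main(void)
{
    // Assumes wchar_t values are Unicode code points, which is
    // implementation-defined and may depend on the locale
    wchar_t wc = 0x5171;
    char s[MB_LEN_MAX + 1];   // +1 so the result is always null-terminated
    int n;

    // Clear out the whole output array
    memset(s, 0, sizeof s);

    // Pick up the current locale from the environment
    setlocale(LC_ALL, "");

    // Try to convert the character
    n = wctomb(s, wc);

    // Check the result
    if (n == -1)
        printf("failure\n");
    else
        printf("%s %x %x\n", s, (unsigned char)s[0], (unsigned char)s[1]);

    return 0;
}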
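
Because the result of wctomb depends on the locale in effect, one way to narrow a problem like this down is to name the locale explicitly and check that setlocale accepted it. The following is only a sketch under that assumption, not a diagnosis of the Solaris failure; convert_in_locale and its -2 return value are invented for this example.

```c
#include <assert.h>
#include <limits.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: select locale_name for the whole process, then
 * convert one wide character into out (at least MB_LEN_MAX + 1 bytes).
 * Returns the number of bytes written, -1 if wctomb cannot represent
 * the character in that locale, or -2 if setlocale rejects the name.
 * Note that the process locale stays changed afterwards. */
int convert_in_locale(const char *locale_name, wchar_t wc, char *out)
{
    assert(locale_name != NULL && out != NULL);

    /* Zero the buffer so the result is always null-terminated. */
    memset(out, 0, MB_LEN_MAX + 1);

    if (setlocale(LC_ALL, locale_name) == NULL)
        return -2;

    /* Reset any shift state before converting. */
    wctomb(NULL, 0);

    return wctomb(out, wc);
}
```

Passing "" as locale_name uses the environment, as the sample above does. Calling the helper with an explicit locale name on each system separates "the locale could not be set" from "the character is not valid in this locale".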

2. IPTABLES - Listing POSTROUTING Rules Problem

3. mbstowcs : Invalid multibyte sequence (MultiByte to WideCharacter Conversion Problem)

4. Tar utility for WindowsNT

5. Using multibyte character strings on Linux

6. Alpha & LINUX SCSI questions. Urgent answers needed.

7. Display web docs containing multibyte characters

8. gawk sub() bomb

9. Non-UTF-8 multibyte characters?

10. ksh: invalid multibyte character Help!

11. multibyte characters

12. ksh: invalid multibyte character Help!