Safe strings library in C?

Post by Michael Thaye » Tue, 22 Sep 1998 04:00:00



A while ago, there was a discussion of the problems of writing safe
applications in C, with particular reference to the problems of the
standard C string library.  Does anyone have views on what a better string
library in C might look like?  I had the idea of a structure like a Pascal
string with a pointer to the bytes in it and a flag to say if they can be
realloc'ed, and a set of functions similar to the existing ones for these
string structures.
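
A minimal sketch of the structure described above, to make the idea concrete. All names here (`struct pstring`, `pstring_wrap`, `pstring_init`, `pstring_append`) are hypothetical and not taken from any existing library; the `growable` field plays the role of the "can be realloc'ed" flag.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout: none of these names come from an existing library. */
struct pstring {
    size_t length;    /* bytes in use (no terminating NUL is counted) */
    size_t capacity;  /* bytes available at `bytes` */
    char  *bytes;     /* the character data */
    int    growable;  /* nonzero if `bytes` came from malloc and may be realloc'ed */
};

/* Wrap a caller-owned, fixed-size buffer; it is never realloc'ed. */
static void pstring_wrap(struct pstring *s, char *buf, size_t cap)
{
    s->length = 0;
    s->capacity = cap;
    s->bytes = buf;
    s->growable = 0;
}

/* Start an empty heap-backed string that may grow on demand. */
static void pstring_init(struct pstring *s)
{
    s->length = 0;
    s->capacity = 0;
    s->bytes = NULL;
    s->growable = 1;
}

/* Append n bytes. Returns 0 on success, or -1 if the data does not fit
 * and the buffer cannot be (or could not be) enlarged. On failure the
 * string is left unchanged. */
static int pstring_append(struct pstring *s, const char *src, size_t n)
{
    assert(s != NULL && s->length <= s->capacity);
    if (n > s->capacity - s->length) {
        size_t need, cap;
        char *p;
        if (!s->growable || n > (size_t)-1 - s->length)
            return -1;
        need = s->length + n;
        cap = s->capacity ? s->capacity : 16;
        while (cap < need)
            cap = (cap > (size_t)-1 / 2) ? need : cap * 2;
        p = realloc(s->bytes, cap);
        if (p == NULL)
            return -1;
        s->bytes = p;
        s->capacity = cap;
    }
    if (n > 0)
        memcpy(s->bytes + s->length, src, n);
    s->length += n;
    return 0;
}
```

With a fixed buffer wrapped by `pstring_wrap`, an oversized append is refused instead of overrunning the buffer; with `pstring_init`, the same call grows the storage.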

Michael

 
 
 

Post by Johan Kullsta » Thu, 24 Sep 1998 04:00:00



> A while ago, there was a discussion of the problems of writing safe
> applications in C, with particular reference to the problems of the
> standard C string library.  Does anyone have views on what a better string
> library in C might look like?  I had the idea of a structure like a Pascal
> string with a pointer to the bytes in it and a flag to say if they can be
> realloc'ed, and a set of functions similar to the existing ones for these
> string structures.

i don't think it's really possible in C.  as someone else pointed out
earlier, C has no arrays or strings.  it has pointers and blocks of
memory but that's not quite the same thing.

--


 
 
 

Post by Richard Jone » Fri, 25 Sep 1998 04:00:00


: A while ago, there was a discussion of the problems of writing safe
: applications in C, with particular reference to the problems of the
: standard C string library.  Does anyone have views on what a better string
: library in C might look like?  I had the idea of a structure like a Pascal
: string with a pointer to the bytes in it and a flag to say if they can be
: realloc'ed, and a set of functions similar to the existing ones for these
: string structures.

You could have a look at bounds checking GCC (see my homepage).
If you compile your programs with this modified version of gcc,
then you'll get safe checking for strings and other memory
objects. You could also look at the attached library I wrote.
It doesn't do `safe' strings, but encompasses a number of the
ideas you mention above and is safe to use for reading arbitrarily
long strings from files and other external sources.

Rich.

--------- faststring.h --------------------
/*      -*- C -*-
 *
 *      A small fast variable-length string library for C.
 *      Copyright (C) 1998 Richard W.M. Jones.
 *
 *      This program is free software; you can redistribute it and/or modify
 *      it under the terms of the GNU General Public License as published by
 *      the Free Software Foundation; either version 2 of the License, or
 *      (at your option) any later version.
 *
 *      This program is distributed in the hope that it will be useful,
 *      but WITHOUT ANY WARRANTY; without even the implied warranty of
 *      MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 *      GNU General Public License for more details.
 *
 *      You should have received a copy of the GNU General Public License
 *      along with this program; if not, write to the Free Software
 *      Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307,
 *      USA.
 */

#ifndef __faststring_h__
#define __faststring_h__

#include <stdio.h>              /* perror */
#include <stdlib.h>             /* malloc, realloc, free, exit */
#include <string.h>

struct string {
  union {
    int length;                 /* Actual length of string. */
    struct string *next;        /* When on ``old_strings'' list, this
                                 * stores the next pointer.
                                 */
  } u;
  int allocated;                /* Nr. bytes allocated to string. */
  char *bytes;                  /* Bytes of string (not necessarily
                                 * NUL-terminated).
                                 */
};

static struct string *old_strings = 0;
static int old_strings_size = 0;

#ifndef STRING_MAX_STRING_CACHE
#define STRING_MAX_STRING_CACHE (32*1024)
#endif /* STRING_MAX_STRING_CACHE */

static inline struct string *
string_init (void)
{
  struct string *s;

  /* If there is a string in the cache, pull it out and
   * use it.
   */
  if (old_strings != 0)
    {
      s = old_strings;
      old_strings = s->u.next;
      old_strings_size -= s->allocated;
    }
  /* Otherwise, allocate a new string. */
  else
    {
      s = malloc (sizeof (struct string));
      if (s == 0) {
        perror ("string: malloc");
        exit (1);
      }
      s->allocated = 0;
      s->bytes = 0;
    }
  s->u.length = 0;
  return s;
}

static inline struct string *
string_init_string (const struct string *s)
{
  struct string *t = malloc (sizeof (struct string));
  if (t == 0) {
    perror ("string: malloc");
    exit (1);
  }
  t->bytes = malloc (s->u.length * sizeof (char));
  if (t->bytes == 0) {
    perror ("string: malloc");
    exit (1);
  }
  t->allocated = t->u.length = s->u.length;
  memcpy (t->bytes, s->bytes, s->u.length);
  return t;
}

static inline struct string *
string_init_chars (const char *s)
{
  struct string *t = malloc (sizeof (struct string));
  if (t == 0) {
    perror ("string: malloc");
    exit (1);
  }
  t->allocated = t->u.length = strlen (s);
  t->bytes = malloc (t->u.length * sizeof (char));
  if (t->bytes == 0) {
    perror ("string: malloc");
    exit (1);
  }
  memcpy (t->bytes, s, t->u.length);
  return t;
}

static inline void
string_free (struct string *s)
{
  /* If the cache is smaller than the maximum size, then
   * push this string into the cache.
   */
  if (old_strings_size + s->allocated <= STRING_MAX_STRING_CACHE)
    {
      s->u.next = old_strings;
      old_strings = s;
      old_strings_size += s->allocated;
    }
  /* Otherwise free the string outright. */
  else
    {
      free (s->bytes);
      free (s);
    }
}

static inline void
string_append_char (struct string *s, char c)
{
  if (s->u.length < s->allocated)
    {
    do_append:
      s->bytes [s->u.length] = c;
      s->u.length ++;
    }
  else
    {
      s->bytes = realloc (s->bytes,
                          s->allocated = (s->allocated == 0
                                          ? 1
                                          : s->allocated * 2)
                                       * sizeof (char));
      if (s->bytes == 0) {
        perror ("string: realloc");
        exit (1);
      }
      goto do_append;
    }
}

static inline void
string_truncate (struct string *s)
{
  s->u.length = 0;
}

static inline const char *
string_chars (struct string *s)
{
  /* NUL-terminate the string, then adjust its length back
   * to normal and return the bytes.
   */
  string_append_char (s, '\0');
  s->u.length --;
  return s->bytes;
}

static inline char *
string_copy_chars (struct string *s)
{
  return strdup (string_chars (s));
}

static inline int
string_length (const struct string *s)
{
  return s->u.length;
}

typedef struct string *string;

#endif /* __faststring_h__ */
--------- end of faststring.h --------------------

--
-      Richard Jones. Linux contractor London and SE areas.        -
-    Very boring homepage at: http://www.annexia.demon.co.uk/      -
- You are currently the 1,991,243,100th visitor to this signature. -
-    Original message content Copyright (C) 1998 Richard Jones.    -

 
 
 

Post by Rudolf Leitg » Fri, 25 Sep 1998 04:00:00





>: A while ago, there was a discussion of the problems of writing safe
>: applications in C, with particular reference to the problems of the
>: standard C string library.  Does anyone have views on what a better string
>: library in C might look like?  I had the idea of a structure like a Pascal
>: string with a pointer to the bytes in it and a flag to say if they can be
>: realloc'ed, and a set of functions similar to the existing ones for these
>: string structures.

It sounds like the Libretto libraries are what you are looking for.
Check them out at

        http://semantics.soas.ac.uk/~aaron/tech/libretto/

Good luck

Rudi

--

         | | | | |
       \   _____   /      
          /     \                      B O R N
      -- | o   o |  --                   T O
      -- |       |  --                S L E E P
      -- | \___/ |  --                   I N
          \_____/                   T H E   S U N
        /          \    
         | | | | |

 
 
 

Post by nathan wagn » Fri, 25 Sep 1998 04:00:00




> i don't think it's really possible in C.  as someone else pointed out
> earlier, C has no arrays or strings.

Then whoever pointed it out doesn't know what he's talking about.  An array
is not a pointer.  See the comp.lang.c faq.  It sort of has strings, in as
much as the language requires compilers to do something with double quoted
strings in the source.

> it has pointers [...] but that's not quite the same thing.

Indeed, fortunately C has both.

--
nathan wagner          Who knows what evil lurks in the hearts of men?

 
 
 

Post by Tim Smi » Sun, 27 Sep 1998 04:00:00



>standard C string library.  Does anyone have views on what a better string
>library in C might look like?  I had the idea of a structure like a Pascal

Do you really have to stay with C?  If not, use C++ as if it were C, and then
use the standard C++ string library.  C programs usually work fine when
compiled with a C++ compiler, and those that don't usually only take minor
tweaking to get working.

--Tim Smith

 
 
 

Post by Bjorn Ree » Wed, 30 Sep 1998 04:00:00



> static inline char *
> string_copy_chars (struct string *s)
> {
>   return strdup (string_chars (s));
> }

I just wanted to point out that strdup() is not defined by ANSI C,
and some older Unix platforms do not support it. It is more
portable to use a malloc/strcpy combination.
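
A minimal sketch of that malloc/strcpy combination, using only functions from ANSI C. The name `portable_strdup` is made up for illustration, chosen so it cannot clash with a system strdup().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: duplicate a NUL-terminated string using only
 * ANSI C library calls. Returns NULL if the allocation fails; the
 * caller releases the copy with free(). */
static char *portable_strdup(const char *s)
{
    char *copy;

    assert(s != NULL);
    copy = malloc(strlen(s) + 1);   /* +1 for the terminating NUL */
    if (copy != NULL)
        strcpy(copy, s);            /* safe: the buffer was sized from s */
    return copy;
}
```

In the quoted function, `strdup (string_chars (s))` could then be replaced by `portable_strdup (string_chars (s))`, with the caller checking for a NULL result.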
 
 
 

Post by Tristan Wibberle » Fri, 02 Oct 1998 04:00:00




> > A while ago, there was a discussion of the problems of writing safe
> > applications in C, with particular reference to the problems of the
> > standard C string library.  Does anyone have views on what a better string
> > library in C might look like?  I had the idea of a structure like a Pascal
> > string with a pointer to the bytes in it and a flag to say if they can be
> > realloc'ed, and a set of functions similar to the existing ones for these
> > string structures.

> i don't think it's really possible in C.  as someone else pointed out
> earlier, C has no arrays or strings.  it has pointers and blocks of
> memory but that's not quite the same thing.

C is Turing complete... Anything is possible.

For example, it is likely that your Pascal compiler was written in C (at
least at the beginning of its development), and the strings
implementation was therefore written in C. So C will let a library do
strings.

static struct string_tracker track_strings;  /* To allow the library to
                                              * ensure that strings have
                                              * been initialised */

typedef struct String_section {
  char *this_string_section;
  struct String_section *prev_string_section, *next_string_section;
} Safe_string;

Then provide functions to manipulate (and create/initialise/destroy)
Safe_strings.

With such a library, you can only go wrong if you try to manipulate them
directly - which is easy to avoid.

--
Tristan Wibberley

 
 
 

Post by Richard Jone » Fri, 02 Oct 1998 04:00:00


:>
:> > A while ago, there was a discussion of the problems of writing safe
:> > applications in C, with particular reference to the problems of the
:> > standard C string library.  Does anyone have views on what a better string
:> > library in C might look like?  I had the idea of a structure like a Pascal
:> > string with a pointer to the bytes in it and a flag to say if they can be
:> > realloc'ed, and a set of functions similar to the existing ones for these
:> > string structures.
:>
:> i don't think it's really possible in C.  as someone else pointed out
:> earlier, C has no arrays or strings.  it has pointers and blocks of
:> memory but that's not quite the same thing.

: C is Turing complete... Anything is possible.

: For example, it is likely that your Pascal compiler was written in C (at
: least at the beginning of its development), and the strings
: implementation was therefore written in C. So C will let a library do
: strings.
[...]

I think there are two issues here.

(1) Is it possible, without changing the source, to replace
    ordinary ASCIIZ strings in C with (length, string) tuples?

    ie. const char *p = "Some string";

    would cause the compiler to allocate ``p'' as a pointer
    to a hidden structure:

        struct {
          int length;
          char *string;
        };

    with string pointing to the actual text of the string
    (and length == 11).

    Simple answer: no. Too many real C programs depend on
    the exact format of the string. For example, they like
    to iterate over the characters with loops like this:

        while (*p) { do something with *p; p++; }

    More complex answer: maybe. It may be possible for
    the compiler to rewrite all the references through
    p, but it'd be very tricky to implement. (You'd need
    an extra hidden field in the above structure, and
    you'd need to expand your concept of a pointer to
    a structure with 3 * wordsize bytes).

(2) Is it possible to bounds check strings, even though
    C really just regards them as arrays of characters,
    and even though the only handle you have to them is
    just a pointer to a character?

    Answer: yes. See my homepage for an existence proof.
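
The "3 * wordsize" pointer mentioned under (1) can be sketched by hand as a so-called fat pointer. This is only an illustration of the concept, with hypothetical names (`struct fat_ptr`, `fat_make`, `fat_read`, `fat_advance`); it is not meant to describe how bounds checking GCC works internally.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "fat pointer": three words instead of one. */
struct fat_ptr {
    const char *cur;    /* where the pointer currently points */
    const char *base;   /* first byte of the underlying object */
    const char *limit;  /* one past the last byte of the object */
};

/* Build a fat pointer to the start of an object of `size` bytes. */
static struct fat_ptr fat_make(const char *obj, size_t size)
{
    struct fat_ptr p;
    p.cur = obj;
    p.base = obj;
    p.limit = obj + size;
    return p;
}

/* Checked equivalent of `*p`: stores the byte in *out and returns 0,
 * or returns -1 if the pointer is outside the object. */
static int fat_read(struct fat_ptr p, char *out)
{
    if (p.cur < p.base || p.cur >= p.limit)
        return -1;
    *out = *p.cur;
    return 0;
}

/* Checked equivalent of `p++`: may reach one past the end (as C
 * allows) but refuses to go any further. */
static int fat_advance(struct fat_ptr *p)
{
    if (p->cur >= p->limit)
        return -1;
    p->cur++;
    return 0;
}
```

The `while (*p) { ...; p++; }` loop above then becomes a loop over `fat_read` and `fat_advance`, and it stops with an error instead of running off the end when the terminating NUL is missing.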

Rich.

--
-      Richard Jones. Linux contractor London and SE areas.        -
-    Very boring homepage at: http://www.annexia.demon.co.uk/      -
- You are currently the 1,991,243,100th visitor to this signature. -
-    Original message content Copyright (C) 1998 Richard Jones.    -

 
 
 

Post by Rick Walk » Fri, 02 Oct 1998 04:00:00


:     Simple answer: no. Too many real C programs depend on
:     the exact format of the string. For example, they like
:     to iterate over the characters with loops like this:

:       while (*p) { do something with *p; p++; }

How about allocating safestrings like:

    char *alloc_safe_string(int size)
    {
        char *p;

        p = (char *) malloc((size_t) size + sizeof(int));
        if (p == NULL)
            return NULL;        /* allocation failed */

        *(int *) p = size;      /* stuffing an int into memory just before */
                                /* the string part of the allocated space; */
                                /* p[0] = size would store only one byte */

        return p + sizeof(int); /* increment return value to hide size from */
                                /* legacy code */
    }

So legacy code simply thinks of this as a regular string.  Cooperative
code knows that the 4 bytes prior to p hold the size of the string:

    [ size ] [ a ] [ b ] [ c ] [\0]
       ^       ^
       |       |
      p-4      p

This way things like printf() would not need to be modified, while things
like strcpy() could be built in a safe way.
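
A minimal sketch of what such a cooperative copy might look like. The allocator is restated (as `safe_alloc`) so the example stands alone, and every name below is hypothetical. It assumes the destination really did come from `safe_alloc`; nothing in the pointer itself can confirm that.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Same idea as alloc_safe_string: a hidden int holding the capacity
 * sits just before the bytes the caller sees. `size` counts the usable
 * bytes, including room for the terminating NUL. */
static char *safe_alloc(int size)
{
    char *p;

    if (size <= 0)
        return NULL;
    p = malloc(sizeof(int) + (size_t) size);
    if (p == NULL)
        return NULL;
    memcpy(p, &size, sizeof(int));  /* hidden capacity field */
    p[sizeof(int)] = '\0';          /* start out as an empty string */
    return p + sizeof(int);
}

/* Read back the hidden capacity of a string made by safe_alloc. */
static int safe_capacity(const char *s)
{
    int size;

    memcpy(&size, s - sizeof(int), sizeof(int));
    return size;
}

/* Release a string made by safe_alloc. */
static void safe_free(char *s)
{
    if (s != NULL)
        free(s - sizeof(int));
}

/* Checked copy: returns 0 on success, or -1 (leaving dst untouched)
 * if src plus its NUL does not fit in dst. */
static int safe_strcpy(char *dst, const char *src)
{
    size_t n = strlen(src) + 1;

    if (n > (size_t) safe_capacity(dst))
        return -1;
    memcpy(dst, src, n);
    return 0;
}
```

Note that the hidden field here records the allocated capacity, which legacy code cannot invalidate by writing within bounds; a stored string length would not have that property.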

--
Rick Walker

 
 
 

Post by Peter Samuels » Fri, 02 Oct 1998 04:00:00



> How about allocating safestrings like:
[...]
>         return p+sizeof(int);
> So legacy code simply thinks of this as a regular string.
> Cooperative code knows that the 4 bytes prior to p hold the size of
> the string:

Nope, won't work.  I thought of that, but the problem is that this only
works for static read-only strings.  Any string dynamically allocated
will, sooner or later, be overwritten by legacy code somewhere, which
will not update the "size" field.  When your "size" field disagrees
with the \0 it will do interesting things to your "safe string
handling" library.

The way to do this is to use an actual struct { int,char* } instead of
trying to fake it with a char*.  It need not be overtly object-oriented
(though that would be the obvious way, restricting access to the string
data that doesn't go through methods), but it *does* need to be clearly
backwards-incompatible.

While you're at it with a safer string library, why not optimize a
little?  For instance, a "dirty bit" boolean in the struct that says
that "length" is incorrect and that it would have been too much work to
recalculate it.  For some usages this could be a win.  Also you would
want to store "allocated length" so that you don't necessarily have to
call realloc() just to extend the string by a couple bytes.
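
A rough sketch of such a struct, with a dirty flag and a separate allocated length. The names (`struct sstr`, `sstr_init`, `sstr_mark_dirty`, `sstr_length`, `sstr_append`) are invented for illustration, and the sketch assumes outside code only ever writes within the allocated bytes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout following the description above. */
struct sstr {
    size_t length;        /* cached length; valid only when !length_dirty */
    size_t allocated;     /* bytes allocated at `data`, including the NUL */
    int    length_dirty;  /* nonzero: `length` is stale, recompute on demand */
    char  *data;          /* NUL-terminated bytes */
};

static int sstr_init(struct sstr *s, size_t reserve)
{
    if (reserve == 0)
        reserve = 1;                    /* always room for the NUL */
    s->data = malloc(reserve);
    if (s->data == NULL)
        return -1;
    s->data[0] = '\0';
    s->length = 0;
    s->allocated = reserve;
    s->length_dirty = 0;
    return 0;
}

/* Call after code outside the library has written into s->data. */
static void sstr_mark_dirty(struct sstr *s)
{
    s->length_dirty = 1;
}

/* Recompute the length only when it is marked stale. The scan is
 * bounded by `allocated`, so a missing NUL cannot cause an overrun. */
static size_t sstr_length(struct sstr *s)
{
    if (s->length_dirty) {
        const char *nul = memchr(s->data, '\0', s->allocated);
        if (nul == NULL) {              /* terminator was overwritten */
            s->data[s->allocated - 1] = '\0';
            s->length = s->allocated - 1;
        } else {
            s->length = (size_t) (nul - s->data);
        }
        s->length_dirty = 0;
    }
    return s->length;
}

/* Append src (which must not point into s->data). Returns 0 or -1. */
static int sstr_append(struct sstr *s, const char *src)
{
    size_t len = sstr_length(s);
    size_t n = strlen(src);
    size_t need;

    if (n > (size_t)-1 - len - 1)
        return -1;                      /* len + n + 1 would overflow */
    need = len + n + 1;
    if (need > s->allocated) {          /* realloc only when out of room */
        size_t cap = s->allocated;
        char *p;
        while (cap < need)
            cap = (cap > (size_t)-1 / 2) ? need : cap * 2;
        p = realloc(s->data, cap);
        if (p == NULL)
            return -1;
        s->data = p;
        s->allocated = cap;
    }
    memcpy(s->data + len, src, n + 1);  /* copies the NUL as well */
    s->length = len + n;
    return 0;
}
```

Small appends that fit in the spare capacity never touch realloc(), and the length is rescanned only after `sstr_mark_dirty` has been called.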

--
Peter Samuelson
<sampo.creighton.edu!psamuels>

 
 
 

Post by Lars Hofhans » Sat, 03 Oct 1998 04:00:00




> > A while ago, there was a discussion of the problems of writing safe
> > applications in C, with particular reference to the problems of the
> > standard C string library.  Does anyone have views on what a better string
> > library in C might look like?  I had the idea of a structure like a Pascal
> > string with a pointer to the bytes in it and a flag to say if they can be
> > realloc'ed, and a set of functions similar to the existing ones for these
> > string structures.

> i don't think it's really possible in C.  as someone else pointed out
> earlier, C has no arrays or strings.  it has pointers and blocks of
> memory but that's not quite the same thing.

> --


Excuse me? Most other compilers and interpreters (Java, etc.) are written
in C. How do you make those languages "string-safe" then?

Machine code also has no arrays or strings, yet all programs are
finally executed as machine code, even those that have safe
strings.

As for the original poster: if you must use C, you will probably have to
write a safe-string library yourself (which, depending on what
you need, might not be too difficult). If instead you can use
C++, you may use the string class from the Standard Template Library,
which is quite handy.

Cheers,

        Lars

--
Legal Warning: Anyone sending me unsolicited/commercial email
WILL be charged a $100 proof-reading fee. See US Code Title 47,
Sec.227(a)(2)(B), Sec.227(b)(1)(C) and Sec.227(b)(3)(C).

 
 
 

Post by Michael Thaye » Sat, 03 Oct 1998 04:00:00



> Legal Warning: Anyone sending me unsolicited/commercial email
> WILL be charged a $100 proof-reading fee. See US Code Title 47,
> Sec.227(a)(2)(B), Sec.227(b)(1)(C) and Sec.227(b)(3)(C).

Does that actually work?  Perhaps if enough people did that, we might see
a big drop in spam.  (Actually, I've been getting relatively little lately
- perhaps that's the reason).

Michael

 
 
 
