O_APPEND reinvented

Post by Sony Anto » Wed, 18 Sep 2002 07:18:10



Until recently I was under the impression that the O_APPEND flag is
used for opening a file so that the file pointer will be at the end of
the file.
But it looks like it's actually a property of the subsequent
write()s: whenever a write() is done on the file descriptor,
the file pointer is *atomically* moved to the end of the file and
the data appended. (Not only is the seeking and writing atomic, but
it is also done in an exclusively "locked" fashion, so that
other processes writing to the same file, opened with O_APPEND, will
not write over one another's data.)

This seems to be true even if some totally unrelated process has the
file opened for writing.
Consider the following sequence of events on a multiprocessor
machine:
1. Process A opens the file with O_APPEND
2. Process B also opens it with O_APPEND
3. Process A write()s
4. Process B also write()s

Process B's data will be appended after that of process A.

If this is true (somebody correct me if I'm wrong), I'm sure there
are many people out there who shared this misconception with me (that
O_APPEND is just to open a file so that the file pointer will be at
the end of it after opening).

--sony

 
 
 

O_APPEND reinvented

Post by Andreas Kähär » Wed, 18 Sep 2002 08:05:02


Submitted by "Sony Antony" to comp.unix.programmer:

> Until recently I was under the impression that the O_APPEND flag is
> used for opening a file so that the file pointer will be at the end of
> the file.
> But it looks like it's actually a property of the subsequent
> write()s: whenever a write() is done on the file descriptor,
> the file pointer is *atomically* moved to the end of the file and
> the data appended. (Not only is the seeking and writing atomic, but
> it is also done in an exclusively "locked" fashion, so that
> other processes writing to the same file, opened with O_APPEND, will
> not write over one another's data.)

> This seems to be true even if some totally unrelated process has the
> file opened for writing.
> Consider the following sequence of events on a multiprocessor
> machine:
> 1. Process A opens the file with O_APPEND
> 2. Process B also opens it with O_APPEND
> 3. Process A write()s
> 4. Process B also write()s

> Process B's data will be appended after that of process A.

> If this is true (somebody correct me if I'm wrong), I'm sure there
> are many people out there who shared this misconception with me (that
> O_APPEND is just to open a file so that the file pointer will be at
> the end of it after opening).

> --sony

SUSv3 says this about open():

    O_APPEND
        If set, the file offset shall be set to the end of the
        file prior to each write.

This doesn't say anything about atomicity though.

For write(), SUSv3 says:

    If the O_APPEND flag of the file status flags is set, the
    file offset shall be set to the end of the file prior to
    each write and no intervening file modification operation
    shall occur between changing the file offset and the write
    operation.

This, however, sounds like atomicity is guaranteed (for moving
the file pointer and initiating writing at least).

--
Andreas Kähäri
--------------------------------------------------------------
Stable, secure, portable, free:     www.netbsd.org

 
 
 

O_APPEND reinvented

Post by Marc Rochkin » Wed, 18 Sep 2002 12:55:44


I'm sure O_APPEND is a property of the file descriptor, not of the file. So
I don't see how your scenario (two processes, each with their own open)
could occur.



 
 
 

O_APPEND reinvented

Post by Marc Rochkin » Wed, 18 Sep 2002 13:11:05


Sorry... misread the example. Since they are both opening with the flag,
they both get the behavior.
 
 
 

O_APPEND reinvented

Post by Sony E Anton » Wed, 18 Sep 2002 13:20:42


Both processes have opened the file in O_APPEND mode
separately/independently. Yet the mode will ensure that they don't write
over each other's data. (Doing a fork() after the file has already been
opened is the only other way to achieve this.)

--sony


> I'm sure O_APPEND is a property of the file descriptor, not of the file. So
> I don't see how your scenario (two processes, each with their own open)
> could occur.




 
 
 

O_APPEND reinvented

Post by Casper H.S. Di » Wed, 18 Sep 2002 18:37:16



>If this is true (somebody correct me if I'm wrong), I'm sure there
>are many people out there who shared this misconception with me (that
>O_APPEND is just to open a file so that the file pointer will be at
>the end of it after opening).

Misconception caused by not reading the manual page, at least on Solaris
it has:

     O_APPEND
           If set, the file offset is set to the end of the  file
           prior to each write.

Casper
--
Expressed in this posting are my opinions.  They are in no way related
to opinions held by my employer, Sun Microsystems.
Statements on Sun products included here are not gospel and may
be fiction rather than truth.

 
 
 

O_APPEND reinvented

Post by Sony Anto » Wed, 18 Sep 2002 23:29:53




> >If this is true (somebody correct me if I'm wrong), I'm sure there
> >are many people out there who shared this misconception with me (that
> >O_APPEND is just to open a file so that the file pointer will be at
> >the end of it after opening).

> Misconception caused by not reading the manual page, at least on Solaris
> it has:

>      O_APPEND
>            If set, the file offset is set to the end of the  file
>            prior to each write.

Yes, it was the Solaris man page that prompted me to discover this. Till
then I had skipped this part of the man page, since I thought I "knew"
what O_APPEND does from my reading of Unix books.

Actually APUE discusses this specifically in section 3.10. I read this
book some 5 years back. It's scary that I couldn't even remember the
mention of it.

Hmm, that also brings up some questions.
1. For a file that is opened with O_APPEND, what is the behavior
if an lseek() positions the file offset somewhere in the middle
and is followed by a write()? Do you lose the O_APPEND property of
the file descriptor the moment you do an lseek()?

2. Where is the O_APPEND attribute stored: in the file table or in
the inode/vnode table?

3. When 2 processes open the same file with O_APPEND, are they
sharing the vnode/inode?

4. To get the atomic semantics of write(), where does the kernel put
the lock?

--sony


 
 
 

O_APPEND reinvented

Post by Casper H.S. Di » Thu, 19 Sep 2002 00:24:47



>1. For a file that is opened with O_APPEND, what will be the behavior
>if an lseek() is done so as to position the file offset at somewhere
>in the middle, and then a write(). The moment you do an lseek() will
>you lose the O_APPEND property of the file descriptor. ?

The write will append at the end; you can lseek for reading.

Quote:>2. Also where is the attribute for O_APPEND stored, file table or in
>the inode/vnode table.

The open file table.

Quote:>3. When 2 processes are opening the same file in O_APPEND, are they
>sharing the vnode/inode.

Yes; a file only has a single inode/vnode; all processes using the file
share it.

A file can have multiple file pointers associated with it; but those
can be shared between fds and processes.

(i.e., multiple fds point to a single file pointer which keeps a file
offset, some flags such as FAPPEND [file opened w/ O_APPEND])

You can change the flag with fcntl().

Multiple file pointers can point to a single vnode/inode.

>4. To get the atomic semantics of write() where does the kernel put
>the lock.

When writing, the file itself (vnode) is locked, though some concurrency
is possible.

Casper
--
Expressed in this posting are my opinions.  They are in no way related
to opinions held by my employer, Sun Microsystems.
Statements on Sun products included here are not gospel and may
be fiction rather than truth.

 
 
 

O_APPEND reinvented

Post by Sony Anto » Thu, 19 Sep 2002 04:51:58


> Yes it was Solaris man page that prompted me to discover this. Till
> then I had skipped this part of the man page, since I thought I "knew"
> what O_APPEND does from my reading of Unix books.

> Actually APUE discusses this specifically at 3.10. I had read this
> book some 5 years back. It's scary that I couldn't even remember the
> mention of it.

> Hmm that also brings up some questions.
> 1. For a file that is opened with O_APPEND, what will be the behavior
> if an lseek() is done so as to position the file offset at somewhere
> in the middle, and then a write(). The moment you do an lseek() will
> you lose the O_APPEND property of the file descriptor. ?

I tried this on Solaris, and the write always seems to go to the end of
the file. In effect it completely ignored the lseek().
I couldn't find any documentation on this either.
--sony
 
 
 

O_APPEND reinvented

Post by Barry Margoli » Thu, 19 Sep 2002 06:03:27




>> Yes it was Solaris man page that prompted me to discover this. Till
>> then I had skipped this part of the man page, since I thought I "knew"
>> what O_APPEND does from my reading of Unix books.

>> Actually APUE discusses this specifically at 3.10. I had read this
>> book some 5 years back. It's scary that I couldn't even remember the
>> mention of it.

>> Hmm that also brings up some questions.
>> 1. For a file that is opened with O_APPEND, what will be the behavior
>> if an lseek() is done so as to position the file offset at somewhere
>> in the middle, and then a write(). The moment you do an lseek() will
>> you lose the O_APPEND property of the file descriptor. ?

>tried this and it seems to be always writing to the end of the file in
>solaris. In effect it completely ignored the lseek().
> couldn't find any documentation on this either.

The man page says that the file pointer is reset to the end "prior to each
write."  It doesn't say that lseek changes this, so I wouldn't expect it
to.  Does it need to list all the system calls that *don't* override this
explicitly?

--

Genuity, Woburn, MA
*** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
Please DON'T copy followups to me -- I'll assume it wasn't posted to the group.

 
 
 

O_APPEND reinvented

Post by phil-news-nos.. » Thu, 19 Sep 2002 07:13:13



| Yes it was Solaris man page that prompted me to discover this. Till
| then I had skipped this part of the man page, since I thought I "knew"
| what O_APPEND does from my reading of Unix books.

This is an inherent danger of either reading something that appears to
be right but isn't, or reading something poorly written that gives an
incorrect understanding, or just plain not understanding it but thinking
that you do.  There is no easy way around it until you encounter something
that contradicts it and then deal with it.

| Actually APUE discusses this specifically at 3.10. I had read this
| book some 5 years back. It's scary that I couldn't even remember the
| mention of it.

I bet there are lots of things that, being not so important at that
time, might not have sunk in, and thus won't be remembered.  So when
friends ask to borrow your APUE, say "no, go buy your own".

| 2. Also where is the attribute for O_APPEND stored, file table or in
| the inode/vnode table.

In whatever holds the descriptor state (implementation dependent I am
sure).

| 3. When 2 processes are opening the same file in O_APPEND, are they
| sharing the vnode/inode.

This is probably also implementation dependent.  But there is a distinction
between 2 separate opens, and duplicating a descriptor.  So there has to be
the ability to reference something multiple times that describes each open
file, which in turn references the file that is actually open.

| 4. To get the atomic semantics of write() where does the kernel put
| the lock.

I'm sure that is implementation dependent.  It could be achieved in a single
processor by making sure interrupts are masked and just updating the current
position and queueing the write buffer.  A more elaborate system, like we
have today on most machines, might have to briefly place a lock on a file,
or queue the various writes to be handled by a single thread.  An OS I started
to write many years ago actually had a thread internal to it for each open
file, and I/O was done by sending messages to it and getting messages back.
Every active device and filesystem also had a thread.

--
-----------------------------------------------------------------
| Phil Howard - KA9WGN |   Dallas   | http://linuxhomepage.com/ |

-----------------------------------------------------------------

 
 
 

O_APPEND reinvented

Post by Sony Anto » Thu, 19 Sep 2002 08:16:59


> >in the middle, and then a write(). The moment you do an lseek() will
> >you lose the O_APPEND property of the file descriptor. ?

> The write will append at the end; you can lseek for reading.

OK, that's consistent with what I observed on Solaris.
But do you know if this is portable/POSIX-mandated behavior?

Quote:

> >2. Also where is the attribute for O_APPEND stored, file table or in
> >the inode/vnode table.

> The open file table.

OK, let me make sure that we are both using the same names for the
tables involved. My basis is APUE figure 3.2 (page 58).

1. The table/array indexed by the file descriptor number I call the "file
descriptor table".
2. A pointer from a file descriptor table entry points to an entry in
another table, called the "file table". This is where the file position
is stored.
3. A pointer from a "file table" entry points to an entry in another
table, the inode/vnode table.

So 3 tables.

My understanding was that the file descriptor table holds only some
special kinds of flags, like close-on-exec.

Quote:

> >3. When 2 processes are opening the same file in O_APPEND, are they
> >sharing the vnode/inode.

> Yes; a file only had a single inode/vnode; all processes using the file
> share it.

> A file can have multiple file pointers associated with it; but those
> can be shared between fds and processes.

OK your file pointer is my "file table" entry.

Quote:

> (i.e., multiple fds point to a single file pointer which keeps a file
> offset, some flags such as FAPPEND [file opened w/ O_APPEND])

OK, I think I understand. If FAPPEND is sitting in a file table entry,
then write() will have to call a different method on the vnode
whenever it is executed.

Actually, /usr/include/sys/vnode.h shows that struct
vnodeops defines write() with some flags as arguments. So I guess that
whenever a write is executed on a file opened with O_APPEND, the vnode's
write() (a function pointer) is called with the flag and with the vnode
locked. Later, inside the kernel, this might invoke either a
different routine (because of FAPPEND) or the same routine,
preceded by something equivalent to an lseek() to the end of the
file. Since the vnode is locked, everything is atomic and safe.

Now I feel better.

--sony

 
 
 

O_APPEND reinvented

Post by Sony E Anton » Thu, 19 Sep 2002 13:00:47





>>>Yes it was Solaris man page that prompted me to discover this. Till
>>>then I had skipped this part of the man page, since I thought I "knew"
>>>what O_APPEND does from my reading of Unix books.

>>>Actually APUE discusses this specifically at 3.10. I had read this
>>>book some 5 years back. It's scary that I couldn't even remember the
>>>mention of it.

>>>Hmm that also brings up some questions.
>>>1. For a file that is opened with O_APPEND, what will be the behavior
>>>if an lseek() is done so as to position the file offset at somewhere
>>>in the middle, and then a write(). The moment you do an lseek() will
>>>you lose the O_APPEND property of the file descriptor. ?

>>tried this and it seems to be always writing to the end of the file in
>>solaris. In effect it completely ignored the lseek().
>>couldn't find any documentation on this either.

> The man page says that the file pointer is reset to the end "prior to each
> write."  It doesn't say that lseek changes this, so I wouldn't expect it
> to.  Does it need to list all the system calls that *don't* override this
> explicitly?

No, but it wouldn't hurt (and would be quite appropriate IMHO) to
mention it in the lseek() man page. (I'd expect it to read something
like "If the file is opened with O_APPEND, then lseek will affect only
succeeding read()s and not write()s".)

--sony


 
 
 

O_APPEND reinvented

Post by Andreas Kähär » Thu, 19 Sep 2002 13:44:19


Submitted by "Sony E Antony" to comp.unix.programmer:


[cut]

>> The man page says that the file pointer is reset to the end "prior to each
>> write."  It doesn't say that lseek changes this, so I wouldn't expect it
>> to.  Does it need to list all the system calls that *don't* override this
>> explicitly?

> No, but it wouldn't hurt (and would be quite appropriate IMHO) to
> mention it in the lseek() man page. (I'd expect it to read something
> like "If the file is opened with O_APPEND, then lseek will affect only
> succeeding read()s and not write()s".)

Well, strictly speaking, the side effect of lseek() (i.e.
re-setting the file position) is not in any way subject to the
O_APPEND flag.  It's the write() that gets affected by the flag
and that's already discussed on the write() manual page.

--
Andreas Kähäri
--------------------------------------------------------------
Stable, secure, portable, free:     www.netbsd.org

 
 
 

O_APPEND reinvented

Post by Mohun Biswa » Thu, 19 Sep 2002 22:46:26



> Well, strictly speaking, the side effect of lseek() (i.e.
> re-setting the file position) ...

Resetting the file position is a *side* effect of lseek???
 
 
 

1. correct stdio behavior for O_APPEND

I am confused about what constitutes a correct implementation of
append mode. This issue is undergoing some current discussion on
comp.lang.perl.misc, and I was hoping to get some clarification
here regarding how it is supposed to work from a C or Unix
standpoint.

I am using linux with glibc2, with a standard FILE buffer of 4KB:

  % cat try.c
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>   /* memset */

  int main(int argc, char *argv[])
  {
    FILE *dev_null;
    unsigned long int n;
    char *string;

    if (argc < 2)
        return 1; /* usage: try <byte count> */

    dev_null = fopen("/dev/null", "a");
    n = strtoul(argv[1], NULL, 0);
    string = malloc(n);

    if (dev_null == NULL || n == 0 || string == NULL)
        return 1; /* initialization error */

    memset(string, 'a', n);
    fwrite(string, 1, n, dev_null); /* append n bytes through stdio */
    fclose(dev_null);
    free(string);
    return 0;
  }

  % gcc try.c -o try
  % strace -e trace=write ./try 8192
  write(3, "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., 8192) = 8192
  % strace -e trace=write ./try 8190
  write(3, "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., 4096) = 4096
  write(3, "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., 4094) = 4094
  % strace -e trace=write ./try 8194
  write(3, "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., 8192) = 8192
  write(3, "aa", 2)                       = 2

I conclude from this that append mode writes are atomic for glibc's stdio
iff either

  1) the size of the data to write is less than 4K (one buffer)
  2) the size of the data is a multiple of 4K

Otherwise they require exactly two writes- the first one writes the
largest multiple of 4K that the data contains.

My question is- shouldn't appending to a file be an atomic
operation (i.e. only one system call to write is made) for *any*
data size? Is the glibc behavior "correct", or is there no specification
for what constitutes a correct implementation of append mode?

--
Joe Schaefer
