Is the order of write() calls maintained?

Post by Michael Wo » Wed, 19 Apr 1995 04:00:00



Hello everyone.

I'm looking into filesystem functionality to assess database reliability.

One of the questions I can't find the answer to is this:  If a process
does five write() calls, will the writes always hit the disk in the
original order?  If they don't hit the disk in the original order, is this
due to reordering in the kernel, reordering by some types of storage
devices, or both?

I've heard that the BSD filesystem code may reorder writes to minimize
the disk head movement.  I've also heard that some SCSI disks will do
this.  Can anyone substantiate these tales, or point me to documentation
on this subject?  Finally, short of turning on synchronous writes, are
there standard ways of dealing with the problem?

Michael Wolf

 
 
 

Post by Demetri Mourat » Thu, 20 Apr 1995 04:00:00


| One of the questions I can't find the answer to is this:  If a process
| does five write() calls, will the writes always hit the disk in the
| original order?

For ordinary write() calls to files in the filesystem, not necessarily.
When you make the call to write(), all that happens immediately is
a transfer of bytes from your process's user space to kernel address space.
When the syncd calls sync(), blocks in the buffer cache (block buffer)
get written out to disk, and I must assume that some optimization of
the write order is done at that point.

If you open the raw disk device and do a write(), typically the
kernel locks down those pages so the VMM does not reclaim them and
then initiates a direct transfer of data from those physical pages
to the disk.  In that case, writes will be done in order.

--

The Northern Trust Company              Voice:  +1 312 630-0735
Chicago, IL  60675                      FAX:    +1 312 630-6797

"The mere act of drinking beer in an attempt to measure your tolerance
 is likely to affect your impression of how many beers you've drunk."
        -- The Heineken uncertainty principle.

 
 
 

Post by Roger Bin » Fri, 21 Apr 1995 04:00:00


: One of the questions I can't find the answer to is this:  If a process
: does five write() calls, will the writes always hit the disk in the
: original order?

They will hit the disk in an order the OS deems suitable for efficient and
reliable operation.  There are two things you can do to force more explicit
behaviour:

- Call fsync() to force all data for a file to be written to disk
- Memory map (mmap) files and use the various related flags and system
  calls to force writing
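
To make the fsync() route concrete, here is a minimal sketch in C,
assuming a POSIX system that provides fsync().  The names
write_then_sync() and write_two_in_order() are made up for this
illustration and come from no library.  The idea is simply that if record
B must not reach the disk before record A, you fsync() after A and only
then hand B to the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write all of buf to fd, then block in fsync()
 * until the system reports the file's data as written.
 * Returns 0 on success, -1 on error (errno set by the failing call). */
int write_then_sync(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    assert(fd >= 0 && buf != NULL);

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before any data moved; retry */
            return -1;
        }
        p += n;                 /* write() may accept fewer bytes than asked */
        len -= (size_t)n;
    }
    return fsync(fd);
}

/* Hypothetical caller: the "intent" record is reported written before
 * the "data" record is even handed to the kernel. */
int write_two_in_order(const char *path)
{
    int fd, rc;

    assert(path != NULL);

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    rc = write_then_sync(fd, "intent\n", 7);
    if (rc == 0)
        rc = write_then_sync(fd, "data\n", 5);
    if (close(fd) < 0)
        rc = -1;
    return rc;
}
```

The cost is one synchronous wait per ordering point, so you would
normally sync only where the order actually matters rather than after
every write().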

On Motorola SVR4, the man page for fsync has this note which suitably sums
up the behaviour:

     The way the data reach the physical medium depends  on  both
     implementation  and hardware.  fsync returns when the device
     driver tells it that the write has taken place.
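
For the memory-mapping route, a rough sketch follows.  I am assuming
that the "related flags and system calls" are mmap() with MAP_SHARED
plus msync() with MS_SYNC, which is how POSIX-style systems expose it;
mapped_store_sync() is a hypothetical name.  Note that this sketch sets
the file length to exactly len bytes, which is fine for a demonstration
but not something you would do to an existing data file.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: store len bytes at the start of a file through a
 * shared mapping, then use msync(MS_SYNC) to wait for the write-back.
 * Returns 0 on success, -1 on error. */
int mapped_store_sync(const char *path, const void *src, size_t len)
{
    int fd, rc;
    void *map;

    assert(path != NULL && src != NULL && len > 0);

    fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)len) < 0) {    /* file must cover the mapping */
        close(fd);
        return -1;
    }
    map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                              /* the mapping stays valid */
    if (map == MAP_FAILED)
        return -1;

    memcpy(map, src, len);                  /* a plain store, no write() */
    rc = msync(map, len, MS_SYNC);          /* block until written back */
    if (munmap(map, len) < 0)
        rc = -1;
    return rc;
}
```

As with fsync(), ordering comes from waiting: stores made after msync()
returns cannot overtake the ones it flushed.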

Roger
--
 __  __ __  __                    
|  |\  /  /|  | Roger Binns       | `If you don't watch the *, you will
|  | \/  / |  | Software Engineer |  never become desensitized to it'
|  | /  /\ |  | IXI Ltd           |          - Bart Simpson at the cinema

 
 
 

Post by Neal P. Murp » Fri, 21 Apr 1995 04:00:00




>: One of the questions I can't find the answer to is this:  If a process
>: does five write() calls, will the writes always hit the disk in the
>: original order?
>They will hit the disk in an order the OS deems suitable for efficient and
>reliable operation.  There are two things you can do to force more explicit
>behaviour:
>- Call fsync() to force all data for a file to be written to disk
>- Memory map (mmap) files and use the various related flags and system
>  calls to force writing
>On Motorola SVR4, the man page for fsync has this note which suitably sums
>up the behaviour:
>     The way the data reach the physical medium depends  on  both
>     implementation  and hardware.  fsync returns when the device
>     driver tells it that the write has taken place.

'man 2 open' says that the O_SYNC flag forces write() not to return until
the data have actually reached the medium and specific structures have been
updated. Since each write() then completes before the next one is issued,
this would also guarantee that write()s from a single process are done in
order.
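
A small sketch of the O_SYNC approach, assuming your system defines
O_SYNC in <fcntl.h>.  The function name append_record_osync() is
hypothetical, and opening the file once per record is only for brevity;
a real program would keep the descriptor open.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: append one record to path with O_SYNC set at
 * open() time, so write() itself does not return until the system
 * reports the data written.  Returns bytes written, or -1 on error. */
ssize_t append_record_osync(const char *path, const void *rec, size_t len)
{
    int fd;
    ssize_t n;

    assert(path != NULL && rec != NULL);

    fd = open(path, O_WRONLY | O_CREAT | O_APPEND | O_SYNC, 0600);
    if (fd < 0)
        return -1;
    n = write(fd, rec, len);    /* blocks for the duration of the I/O */
    if (close(fd) < 0)
        return -1;
    return n;
}
```

Two calls in a row, say with "one\n" and then "two\n", cannot be
reordered by the kernel, because the second write() is not issued until
the first has returned.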

Fester

 
 
 

Post by Guy Harr » Fri, 21 Apr 1995 04:00:00



>They will hit the disk in an order the OS deems suitable for efficient and
>reliable operation.  There are two things you can do to force more explicit
>behaviour:

Well, three, actually; the third alternative is to open the file with
the O_SYNC flag, or use "fcntl()" to set the O_SYNC flag, if your OS
supports the O_SYNC flag.
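
If you go the "fcntl()" way, it is worth reading the flags back
afterwards, because some systems accept the F_SETFL call but silently
ignore O_SYNC in it (the Linux "fcntl()" man page documents exactly
that).  A minimal sketch, with try_set_osync() being a made-up name:

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: try to turn on O_SYNC for an already-open fd.
 * Returns 1 if the flag reads back as set, 0 if the call succeeded but
 * the flag did not stick, and -1 on error. */
int try_set_osync(int fd)
{
    int flags;

    assert(fd >= 0);

    flags = fcntl(fd, F_GETFL);
    if (flags < 0)
        return -1;
    if (fcntl(fd, F_SETFL, flags | O_SYNC) < 0)
        return -1;
    flags = fcntl(fd, F_GETFL);         /* read back rather than trust it */
    if (flags < 0)
        return -1;
    return (flags & O_SYNC) == O_SYNC;
}
```

If this returns 0, the safe fallbacks are to reopen the file with O_SYNC
in the "open()" call or to use "fsync()" at the points where ordering
matters.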

(Not all flavors of UNIX do, but then not all of them support "fsync()"
either, although I think all *modern* flavors of UNIX support at least
one of O_SYNC or "fsync()", and more modern ones support both.

"fsync()" was introduced in 4.2BSD, so any system that has a BSD-derived
file system - as the original poster hinted his system did - *probably*
has "fsync()".  Non-BSD-derived systems might or might not have it; SVR4
does, as I remember, but I don't think SVR3.x from AT&T did.

O_SYNC first showed up in System V Release 3.1 or 3.2 or so, as I
remember, so not all "BSD-flavored" systems support it.  SunOS 4.x,
although generally considered "BSD-flavored", *does* support O_SYNC;
however, 4.4-Lite doesn't appear to support it - it has an O_FSYNC flag,
but doesn't appear to use it, either under that name or under the name
FFSYNC.

Not flavors of UNIX of them support "mmap()", either.

There may also exist flavors of UNIX that don't support any of the
above but that have some *other* way of getting the desired behavior.)

 
 
 

Post by Guy Harr » Fri, 21 Apr 1995 04:00:00



>Not flavors of UNIX of them support "mmap()", either.

Yeah, but he *meant* "not all flavors of UNIX support 'mmap()', either."
 
 
 

Post by Vincent Flemi » Wed, 26 Apr 1995 04:00:00


: Hello everyone.

: I'm looking into filesystem functionality to assess database reliability.

: One of the questions I can't find the answer to is this:  If a process
: does five write() calls, will the writes always hit the disk in the
: original order?  If they don't hit the disk in the original order, is this
: due to reordering in the kernel, reordering by some types of storage
: devices, or both?

: I've heard that the BSD filesystem code may reorder writes to minimize
: the disk head movement.  I've also heard that some SCSI disks will do
: this.  Can anyone substantiate these tales, or point me to documentation
: on this subject?  Finally, short of turning on synchronous writes, are
: there standard ways of dealing with the problem?

You've heard correctly; how it's done depends largely upon the version
of the OS and the hardware.

In any case, filesystem writes from applications are asynchronous by
nature - the write() call *will* return before the data block hits the
disk.

So, if you're not running a commercial database that uses raw disk slices
instead of using a filesystem, you had better use synchronous writes.

Vince

--
Vincent Fleming, Senior Systems Engineer              USPA Lic. No. C-21980

ECCS, Inc., 1 Sheila Drive, Tinton Falls, NJ 07724    tele:   800-ECCS-INC
Makers Of High Performance RAID Products - AT&T, Sun, & HP VAR