Summary: "Disk Write Path"

Post by Zh » Sun, 29 Oct 1995 04:00:00



A while ago I posted the above question, and here is a summary of the
responses I got.  After looking at the kernel code, it seems the Solaris
UFS implementation actually does I/O clustering, and hands off each
cluster request (max 56KB) to the disk controller.  I am not sure how
the disk controller handles this request (e.g., whether it chops up
the request into smaller serviceable units), but I believe the
controller can service this large chunk in ONE operation.
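
As a rough illustration of the arithmetic only (not of the actual Solaris
code), the sketch below splits a 64KB write first into file system blocks and
then into clusters. The 8KB block size and the 56KB cluster limit are the
figures quoted in this thread; the function name split_request is made up for
this example, and how the kernel really batches requests is not shown here.

```python
KB = 1024

def split_request(total_bytes, unit_bytes):
    """Split a request into (offset, length) pieces no larger than unit_bytes."""
    pieces = []
    offset = 0
    while offset < total_bytes:
        length = min(unit_bytes, total_bytes - offset)
        pieces.append((offset, length))
        offset += length
    return pieces

# 8KB file system blocks: a 64KB write covers eight of them.
blocks = split_request(64 * KB, 8 * KB)

# Clusters capped at 56KB: the same write needs two requests.
clusters = split_request(64 * KB, 56 * KB)

print(len(blocks), "blocks")
print([(off // KB, length // KB) for off, length in clusters])
```

Under these assumptions the write would reach the driver as one 56KB request
followed by one 8KB request, rather than as eight separate 8KB requests.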

Thanks very much to those who helped.  Any follow-ups are welcome!

Cheers!

- Jay Li
PhD Candidate
Columbia University

========================================================================
19-Oct-95  1:16:00-GMT,1359;000000000001
Received: from ground.cs.columbia.edu (ground.cs.columbia.edu [128.59.10.3]) by ober.cs.columbia.edu (8.6.12/8.6.6) with ESMTP id TAA10917 for <l...@ober.cs.columbia.edu>; Wed, 18 Oct 1995 19:16:14 -0400
Received: (from li@localhost) by ground.cs.columbia.edu (8.6.12/8.6.6) id TAA00120; Wed, 18 Oct 1995 19:16:11 -0400
Date: Wed, 18 Oct 95 19:16:11 EDT
From: "Jay (Zhe) Li" <l...@ober.cs.columbia.edu>
To: comp.unix.intern...@cs.columbia.edu
Cc: comp.unix.sola...@cs.columbia.edu
Subject: Disk Write Path
Message-ID: <CMM.0.90.2.814058171.li@ground.cs.columbia.edu>
Content-Type: text
Content-Length: 755

Could anyone shed some light on how a logical write request is handled by
the kernel, disk driver and disk controller?  Let's say the write request is
of the form "write(fd, buf, 64KB)".   Would this write be broken down into
smaller units (say 8KB file system block size), then handed directly from the
kernel to the disk controller as 8 requests?  Would the disk
controller then chop up each 8KB request (if this is true) further into
512B (sector size) units and do a seek/rotation for each sector write?

The platform concerned is Solaris 2.4 with a SCSI-2 device.

Thanks for the help!  Please email l...@cs.columbia.edu and I will summarize
if there is enough interest.

- Jay Li
PhD Candidate
Dept. of Computer Science
Columbia University

19-Oct-95 10:20:00-GMT,2387;000000000011
Received: from lol.cs.columbia.edu (lol.cs.columbia.edu [128.59.10.14]) by ober.cs.columbia.edu (8.6.12/8.6.6) with ESMTP id EAA13518 for <l...@ober.cs.columbia.edu>; Thu, 19 Oct 1995 04:19:35 -0400
Received: from mercury.Sun.COM (mercury.Sun.COM [192.9.25.1]) by lol.cs.columbia.edu (8.6.12/8.6.6) with ESMTP id EAA06890 for <l...@news.cs.columbia.edu>; Thu, 19 Oct 1995 04:19:34 -0400
Received: from snail.Sun.COM by mercury.Sun.COM (Sun.COM)
        id BAA19850; Thu, 19 Oct 1995 01:19:02 -0700
Received: from Russia.Sun.COM (sunruss.Russia.Sun.COM) by snail.Sun.COM (4.1/SMI-4.1)
        id AA16275; Thu, 19 Oct 95 01:18:56 PDT
Received: from kremlin.russia.sun.com by Russia.Sun.COM (4.1/SMI-4.1e)
        id AA09439; Thu, 19 Oct 95 11:18:00 MSK
Received: from rast.russia.sun.com by kremlin.russia.sun.com (5.0/SMI-SVR4)
        id AA11278; Thu, 19 Oct 1995 11:08:45 --300
Received: by rast.russia.sun.com (5.0/SMI-SVR4)
        id AA01677; Thu, 19 Oct 1995 11:08:44 +0300
Date: Thu, 19 Oct 1995 11:08:44 +0300
From: Andrei.Ryz...@Russia.Sun.COM (Andrei Ryzhov CS Tech Supp Engr)
Message-Id: <9510190808.AA01...@rast.russia.sun.com>
To: l...@ober.cs.columbia.edu
Subject: Re: Disk Write Path
Content-Type: text
Content-Length: 1188

In article l...@ground.cs.columbia.edu, l...@news.cs.columbia.edu ("Jay (Zhe) Li") writes:

>Could anyone shed some light on how a logical write request is handled by
>the kernel, disk driver and disk controller?  Let's say the write request is
>of the form "write(fd, buf, 64KB)".   Would this write be broken down into
>smaller units (say 8KB file system block size), then handed directly from the
>kernel to the disk controller as 8 requests?  Would the disk
>controller then chop up each 8KB request (if this is true) further into
>512B (sector size) units and do a seek/rotation for each sector write?

>The platform concerned is Solaris 2.4 with a SCSI-2 device.

This depends on what type of file fd refers to.
If fd refers to a raw (character-oriented) special device,
the system will try to write buf in a single operation.

For regular files (inside the filesystem),
the buffer will probably be placed in the in-RAM buffer cache
and written to the disk when the OS decides to do so,
or when you issue a sync command/syscall.
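
The regular-file case can be sketched from user space as below. This is a
minimal illustration, not Solaris-specific: it assumes a scratch file created
with tempfile (a hypothetical stand-in for any regular file) and uses the
per-file fsync() call rather than the system-wide sync mentioned above.

```python
import os
import tempfile

KB = 1024
buf = b"\0" * (64 * KB)

# Hypothetical scratch file; any regular file on a local file system would do.
fd, path = tempfile.mkstemp()
try:
    # write() may accept fewer bytes than requested, so loop until all are taken.
    written = 0
    while written < len(buf):
        written += os.write(fd, buf[written:])

    # At this point the data may exist only in the kernel's cache.
    # fsync() asks the kernel to push this file's data to the device.
    os.fsync(fd)
    size = os.fstat(fd).st_size
finally:
    os.close(fd)
    os.remove(path)

print(written, size)
```

Without the fsync() call, the moment at which the cached data reaches the
disk is left to the operating system.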

        Andrei S. Ryzhov                     Andrei.Ryz...@Russia.Sun.Com
        Tech. Supp. Engr.
          Sun Service
            Russia                 I do not speak for any body but myself

19-Oct-95 13:29:00-GMT,1863;000000000001
Received: from cs.columbia.edu (cs.columbia.edu [128.59.10.13]) by ober.cs.columbia.edu (8.6.12/8.6.6) with ESMTP id HAA14554 for <l...@ober.cs.columbia.edu>; Thu, 19 Oct 1995 07:28:07 -0400
Received: from delta.eecs.nwu.edu (delta.eecs.nwu.edu [129.105.5.103]) by cs.columbia.edu (8.6.12/8.6.6) with ESMTP id HAA18168 for <l...@cs.columbia.edu>; Thu, 19 Oct 1995 07:28:01 -0400
Received: by delta.eecs.nwu.edu (8.6.12/8.6.12) id GAA21194; Thu, 19 Oct 1995 06:27:50 -0500
Date: Thu, 19 Oct 1995 06:27:50 -0500
From: Robert Bonomi <bon...@delta.eecs.nwu.edu>
Message-Id: <199510191127.GAA21...@delta.eecs.nwu.edu>
To: l...@ober.cs.columbia.edu
Subject: Re: Disk Write Path
Newsgroups: comp.unix.solaris,comp.unix.internals
In-Reply-To: <CMM.0.90.2.814058171...@ground.cs.columbia.edu>
Organization: EE/CS Department, Northwestern University, Evanston, IL.
Content-Type: text
Content-Length: 973

In article <CMM.0.90.2.814058171...@ground.cs.columbia.edu> you write:

>Could anyone shed some light on how a logical write request is handled by
>the kernel, disk driver and disk controller?  Let's say the write request is
>of the form "write(fd, buf, 64KB)".   Would this write be broken down into
>smaller units (say 8KB file system block size), then handed directly from the
>kernel to the disk controller as 8 requests?  Would the disk
>controller then chop up each 8KB request (if this is true) further into
>512B (sector size) units and do a seek/rotation for each sector write?

>The platform concerned is Solaris 2.4 with a SCSI-2 device.

>Thanks for the help!  Please email l...@cs.columbia.edu and I will summarize
>if there is enough interest.

>- Jay Li
>PhD Candidate
>Dept. of Computer Science
>Columbia University

Does the controller support scatter-read/gather-write?  If so, it's
probably passed to the controller as one 'linked' request.

19-Oct-95 18:51:00-GMT,2281;000000000001
Received: from cs.columbia.edu (cs.columbia.edu [128.59.10.13]) by ober.cs.columbia.edu (8.6.12/8.6.6) with ESMTP id MAA16844 for <l...@ober.cs.columbia.edu>; Thu, 19 Oct 1995 12:51:52 -0400
Received: from plaza.ds.adp.com (lockbox.plaza.ds.adp.com [139.126.34.128]) by cs.columbia.edu (8.6.12/8.6.6) with SMTP id MAA28024 for <l...@cs.columbia.edu>; Thu, 19 Oct 1995 12:51:49 -0400
Received: from myst.plaza.ds.adp.com by plaza.ds.adp.com (4.1/3.1.012693-Automatic Data Processing Dealer Services);
        id AA10998 for l...@cs.columbia.edu; Thu, 19 Oct 95 09:51:16 PDT
Received: from adpplz (adpplz.plaza.ds.adp.com [139.126.60.101]) by myst.plaza.ds.adp.com (8.6.9/8.6.9) with SMTP id JAA00839 for <l...@cs.columbia.edu>; Thu, 19 Oct 1995 09:53:01 -0700
Received: by adpplz (Automatic Data Processing Dealer Services/1.1)
        id AA25715; Thu, 19 Oct 95 16:51:41 GMT
From: ter...@plaza.ds.adp.com
Date: Thu, 19 Oct 95 16:51:41 GMT
Message-Id: <9510191651.AA25715@adpplz>
To: l...@ober.cs.columbia.edu
Subject: Re: Disk Write Path
Newsgroups: comp.unix.solaris,comp.unix.internals
In-Reply-To: <CMM.0.90.2.814058171...@ground.cs.columbia.edu>
Organization: ADP Dealer Services
Content-Type: text
Content-Length: 1072

In article <CMM.0.90.2.814058171...@ground.cs.columbia.edu> you write:

>Could anyone shed some light on how a logical write request is handled by
>the kernel, disk driver and disk controller?  Let's say the write request is
>of the form "write(fd, buf, 64KB)".   Would this write be broken down into
>smaller units (say 8KB file system block size), then handed directly from the
>kernel to the disk controller as 8 requests?  Would the disk
>controller then chop up each 8KB request (if this is true) further into
>512B (sector size) units and do a seek/rotation for each sector write?

   Yes to your second question, no to your third question, but ONLY if you are
running some sort of UFS (or BSD FFS).

+--------------------------------------------------------------------+
|  "Charlie don't surf!!!!!"   Robert Duvall as     Terry Laskodi    |
|  "I love the smell of na-   Colonel Kilgore in         at          |
|   palm in the morning..."    "Apocalypse Now"     ADP Portland, OR |
+--------------------------------------------------------------------+

21-Oct-95  5:11:00-GMT,2274;000000000001
Received: from cs.columbia.edu (cs.columbia.edu [128.59.10.13]) by ober.cs.columbia.edu (8.6.12/8.6.6) with ESMTP id XAA00106 for <l...@ober.cs.columbia.edu>; Fri, 20 Oct 1995 23:11:12 -0400
Received: from neurocog.lrdc.pitt.edu (h...@neurocog.lrdc.pitt.edu [136.142.93.53]) by cs.columbia.edu (8.6.12/8.6.6) with ESMTP id XAA26414 for <l...@cs.columbia.edu>; Fri, 20 Oct 1995 23:11:11 -0400
Received: by neurocog.lrdc.pitt.edu
        (1.37.109.10G/16.2) id AA167145071; Fri, 20 Oct 1995 23:11:11 -0400
Message-Id: <199510210311.XAA26...@cs.columbia.edu>
Date: Fri, 20 Oct 1995 23:11:11 -0400
From: Mark Hahn <h...@neurocog.lrdc.pitt.edu>
To: l...@ober.cs.columbia.edu
Subject: Re: Disk Write Path
Newsgroups: comp.unix.solaris,comp.unix.internals
Organization: Learning Research and Development Center at U. of Pittsburgh
X-Newsreader: TIN [version 1.2 PL2]
Content-Type: text
Content-Length: 1385

> Could anyone shed some light on how a logical write request is handled by
> the kernel, disk driver and disk controller?  Let's say the write request is

it's different for every kernel, of course.

> of the form "write(fd, buf, 64KB)".   Would this write be broken down into
> smaller units (say 8KB file system block size), then handed directly from the
> kernel to the disk controller as 8 requests?  Would the disk

sort of.  have you read enough about Unix to know about the buffer cache?
if not, do so!  anyway, the 64k eventually winds up in 8-10 buffers,
each of filesystem block size.  those do get handed to the driver.
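
one way a count can come out above 8 is alignment: a 64k write that does not
start on a block boundary overlaps one extra block.  this is only an assumed
reading, not something the reply states, and it does not account for the
upper end of the 8-10 range.  the helper name blocks_touched and the 8KB
default are illustrative.

```python
def blocks_touched(offset, length, block_size=8 * 1024):
    """Count the file system blocks that a write of `length` bytes at `offset` overlaps."""
    if length <= 0:
        return 0
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return last - first + 1

aligned = blocks_touched(0, 64 * 1024)       # starts on a block boundary
unaligned = blocks_touched(100, 64 * 1024)   # starts 100 bytes into a block
sectors_per_block = (8 * 1024) // 512        # 512B sectors in one 8KB block

print(aligned, unaligned, sectors_per_block)
```

with these numbers the aligned write overlaps 8 blocks, the unaligned one 9,
and each 8KB block spans 16 sectors.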

> controller then chop up each 8KB request (if this is true) further into
> 512B (sector size) units and do a seek/rotation for each sector write?

it certainly tries to avoid seeking for each sector, though that's really
the responsibility of the filesystem (high-level layout routines).  most
Unix systems use the Berkeley Fast File System, and there's a maxcontig
parameter to it, though to be honest, I haven't found it to make much
difference.  note also that most of BFFS's cleverness is completely obsolete,
since disks have quite large caches (and are not laid out in fixed-size
cylinders either).

regards, mark hahn.
--
operator may differ from spokesperson.  h...@neurocog.lrdc.pitt.edu
                                        http://neurocog.lrdc.pitt.edu/~hahn/