Zero-copy IO

Post by East-Wes » Fri, 23 Nov 2001 18:01:21



Does the current Linux kernel support a zero-copy I/O interface?
Or does anyone know whether FreeBSD supports such an interface?
Thanks.

John


Post by Dave Pla » Sat, 24 Nov 2001 09:31:54


>Does the current linux kernel support zero-copy IO interface?

The 2.4 Linux kernels support "raw" disk I/O.  You bind a "raw"
virtual device to a real hard-drive (or partition), then open the raw
device and do I/O to it.  The I/O will be done directly between your
buffer and the hard drive interface, with no copying of the data to
the kernel's buffer page pool.  There are some significant
restrictions on the size and layout of the buffers you use (keep 'em
on page boundaries, with each buffer lying in a set of pages used for
no other purpose, and you should be OK).
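A minimal Python sketch of the buffer rules above (Python 3.7+ on Linux, for os.preadv). It assumes the raw device has already been bound to a partition, for example with the raw(8) utility as `raw /dev/raw/raw1 /dev/hda1`. Those device names are hypothetical, and so are the helpers round_up_to_page, aligned_buffer and raw_read. The buffer comes from an anonymous mmap because such a mapping always starts on a page boundary and occupies whole pages that nothing else shares.

```python
import mmap
import os

PAGE = mmap.PAGESIZE  # usually 4096 bytes on x86


def round_up_to_page(nbytes: int) -> int:
    """Round a byte count up to a whole number of pages."""
    return (nbytes + PAGE - 1) // PAGE * PAGE


def aligned_buffer(nbytes: int) -> mmap.mmap:
    """Allocate an anonymous mapping. It starts on a page boundary and
    spans whole pages that nothing else in the process uses."""
    if nbytes <= 0:
        raise ValueError("buffer size must be positive")
    return mmap.mmap(-1, round_up_to_page(nbytes))


def raw_read(path: str, offset: int, nbytes: int) -> bytes:
    """Read nbytes starting at offset into a page-aligned buffer.

    Offset and length must be page multiples here as a conservative
    choice; the device's real requirement may be smaller (for example
    the sector size)."""
    if offset % PAGE or nbytes % PAGE:
        raise ValueError("offset and length must be multiples of the page size")
    buf = aligned_buffer(nbytes)
    fd = os.open(path, os.O_RDONLY)
    try:
        n = os.preadv(fd, [buf], offset)  # the device fills buf in place
    finally:
        os.close(fd)
    data = buf[:n]
    buf.close()
    return data
```

Usage would then be something like `raw_read("/dev/raw/raw1", 0, 4096)`, with the offset and length kept at page multiples.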

--

Visit the Jade Warrior home page:  http://www.radagast.org/jade-warrior/
  I do _not_ wish to receive unsolicited commercial email, and I will
     boycott any company which has the gall to send me such ads!


Post by John S. Dyso » Sat, 24 Nov 2001 14:02:16



> >Does the current linux kernel support zero-copy IO interface?

> The 2.4 Linux kernels support "raw" disk I/O.  You bind a "raw"
> virtual device to a real hard-drive (or partition), then open the raw
> device and do I/O to it.

Almost every commercial-quality UNIX has supported raw disk I/O
forever.  This was STRONGLY desired by database vendors
like Oracle.

All of the BSDs and most of the commercial UNIXes have supported
raw I/O; it is sort of a 'given' for any UNIX.  I suspect that
the zero-copy I/O question was meant for a general-purpose interface.

John


Post by East-Wes » Sat, 24 Nov 2001 19:39:14




>> >Does the current linux kernel support zero-copy IO interface?

>> The 2.4 Linux kernels support "raw" disk I/O.  You bind a "raw"
>> virtual device to a real hard-drive (or partition), then open the raw
>> device and do I/O to it.

>Almost any commercial quality UNIX has supported Raw disk I/O
>forever.   This was STRONGLY desired by database manufacturers
>like Oracle.

>All of the BSD's and most of the commercial UNIXes have supported
>RAW I/O, and is sort-of a 'given' for any UNIX.   I suspect that
>the zero copy I/O question was meant for a general purpose interface.

>John

Yes, in fact I was wondering whether the network socket interface supports
zero-copy I/O semantics transparently to the calling process.
For instance, if a user process wants to send a buffer of data over
the network, do interfaces like sendfile(socket, file) or
send(socket) automatically attempt zero-copy by doing page
remapping or some other technique?

Thanks.

John


Post by Dave Pla » Sun, 25 Nov 2001 04:38:26


>Yes in fact I was wondering if the network socket interface supports
>zero-copy IO semantics transparent to the calling process.
>For instance, if a user process wants to send a buffer of data over
>the network, do the interfaces like sendfile(socket,file) or
>send(socket) automatically try to enforce zero-copy by doing page
>remapping or some other technique?

Nope.  The data is copied down into the kernel during send, and copied
back into user-space during receive.
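For contrast, here is the conventional copying path as seen from user space. This is a minimal Python sketch, and the helper name is hypothetical. Each chunk crosses the user/kernel boundary twice: once on the way up from the file, and once on the way down into the socket.

```python
import socket


def copy_file_to_socket(path: str, sock: socket.socket, chunk: int = 64 * 1024) -> int:
    """Send a file the conventional way.

    Each chunk is copied twice by the CPU: readinto() copies it from the
    kernel's page cache into this user-space buffer, and sendall() copies
    it back down into the kernel's socket buffers."""
    sent = 0
    buf = bytearray(chunk)
    view = memoryview(buf)
    with open(path, "rb", buffering=0) as f:
        while True:
            n = f.readinto(buf)      # copy 1: kernel -> user
            if not n:
                break                # end of file
            sock.sendall(view[:n])   # copy 2: user -> kernel
            sent += n
    return sent
```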

Trying to do zero-copy network I/O is exceedingly tricky, and I
suspect that it's not terribly productive on most of today's
mainstream hardware.  

There are several "gotchas" if you try to do zero-copy sends.  For one
thing, in order to convert a user buffer into a transmissible packet
you have to add the TCP/IP headers and other stuff - in order to do
this without having to copy the user data elsewhere, you need to be
able to transmit a packet via a scatter-gather technique (i.e.
transmitting the TCP/IP header from kernel memory, and the payload
from the user buffer).  Very few Ethernet adapters have this sort of
scatter-gather capability in their DMA engines (many can
scatter-gather on a per-packet basis, but I don't know of any which
can scatter-gather _within_ a single Ethernet packet).  You could use
an Ethernet adapter which operates in PIO mode, but then you're using
the CPU to copy/write data out to the adapter chip... rather slow.
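The header/payload gather described above has a user-level analogue in sendmsg()/writev(), which accept a list of separate buffers. This Python sketch assumes a hypothetical 4-byte length-prefix framing. It only illustrates the gather idea at the system-call level. The kernel still copies both pieces into its socket buffers, and the sketch says nothing about what an adapter's DMA engine can do.

```python
import socket
import struct


def send_framed(sock: socket.socket, payload: bytes) -> int:
    """Send a length header and a payload as one message without first
    concatenating them in user space.

    sendmsg() gathers the listed buffers, like writev(); the return value
    is the number of bytes the kernel accepted."""
    header = struct.pack("!I", len(payload))  # 4-byte length, network byte order
    return sock.sendmsg([header, payload])
```

On a stream socket sendmsg() may accept fewer bytes than requested, so a real sender would loop on the return value.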

For another thing - you have to keep the user from modifying the
buffer while you're trying to send it.  You can, of course, mark the
page as non-modifiable during this process, and force the user
process to take a page-fault if it tries to write data into the same
area.  If this happens, you must either suspend the user process
(hurts responsiveness and throughput) or use the "copy on write"
approach to give the user process a modifiable copy of the page to
play with... and this kills the whole "zero-copy" idea right there.
Or, you can require network writes to be strictly synchronous - the
write/send call doesn't return until the data has been transmitted
(UDP) or transmitted and acknowledged (TCP).  This is workable, but
probably pretty slow in most cases.
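The buffer-reuse problem can be shown with a toy model. The classes are hypothetical, and this is not how any real stack is implemented. A queue that merely references the caller's buffer sends whatever the buffer holds at transmit time. A queue that copies on send() is unaffected when the caller reuses its buffer.

```python
from collections import deque


class ReferencingQueue:
    """Toy 'zero-copy' transmit queue: send() keeps a reference to the
    caller's buffer instead of copying it."""

    def __init__(self):
        self.pending = deque()

    def send(self, buf: bytearray) -> None:
        self.pending.append(memoryview(buf))  # aliases the caller's memory

    def transmit(self) -> bytes:
        # The "wire" sees whatever the buffer contains at transmit time.
        return bytes(self.pending.popleft())


class CopyingQueue(ReferencingQueue):
    """Conventional behaviour: snapshot the data when send() is called."""

    def send(self, buf: bytearray) -> None:
        self.pending.append(memoryview(bytes(buf)))  # private copy
```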

I know that there are some embedded/RTOS systems with zero-copy TCP
stacks.  I suspect that these are most beneficial in low-memory-
footprint applications with slow CPUs.


Post by Gianni Marian » Sun, 25 Nov 2001 17:09:23


I am very interested in zero-copy network I/O.  In my case, we are
developing a streaming server application.

With 25 kbit/s streams, I suspect a machine using a single gigabit NIC
can push around 30,000 streams (a theoretical figure, based on a
measured ideal throughput of 750 Mbit/s).  We've found that eliminating
copies in the streamer itself is a big win (no surprise),
and I suspect that using zero-copy in the kernel will probably be
a big help also.

My only point is that there are applications pushing against the
data-copy limitation, and it probably won't get better anytime soon.
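A quick sanity check on the stream arithmetic: 30,000 streams at 25 kbit/s each come to 750 Mbit/s. That is roughly 94 MB/s, which fits on one gigabit link (1000 Mbit/s, or 125 MB/s). A figure of 750 MB/s, by contrast, would be six times a gigabit link's capacity. The helper names below are hypothetical.

```python
def streams_per_link(link_bps: int, stream_bps: int) -> int:
    """Number of constant-rate streams that fit into link_bps of capacity."""
    return link_bps // stream_bps


def mbit_to_mbyte(mbit_per_sec: float) -> float:
    """Convert megabits per second to megabytes per second."""
    return mbit_per_sec / 8
```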


>>Yes in fact I was wondering if the network socket interface supports
>>zero-copy IO semantics transparent to the calling process.
>>For instance, if a user process wants to send a buffer of data over
>>the network, do the interfaces like sendfile(socket,file) or
>>send(socket) automatically try to enforce zero-copy by doing page
>>remapping or some other technique?

> Nope.  The data is copied down into the kernel during send, and copied
> back into user-space during receive.

> Trying to do zero-copy network I/O is exceedingly tricky, and I
> suspect that it's not terribly productive on most of today's
> mainstream hardware.  

> There are several "gotchas" if you try to do zero-copy sends.  For one
> thing, in order to convert a user buffer into a transmissible packet
> you have to add the TCP/IP headers and other stuff - in order to do
> this without having to copy the user data elsewhere, you need to be
> able to transmit a packet via a scatter-gather technique (i.e.
> transmitting the TCP/IP header from kernel memory, and the payload
> from the user buffer).  Very few Ethernet adapters have this sort of
> scatter-gather capability in their DMA engines (many can
> scatter-gather on a per-packet basis, but I don't know of any which
> can scatter-gather _within_ a single Ethernet packet).  You could use
> an Ethernet adapter which operates in PIO mode, but then you're using
> the CPU to copy/write data out to the adapter chip... rather slow.

> For another thing - you have to keep the user from modifying the
> buffer while you're trying to send it.  You can, of course, mark the
> page as non-modifiable during this process, and force the user
> process to take a page-fault if it tries to write data into the same
> area.  If this happens, you must either suspend the user process
> (hurts responsiveness and throughput) or use the "copy on write"
> approach to give the user process a modifiable copy of the page to
> play with... and this kills the whole "zero-copy" idea right there.
> Or, you can require network writes to be strictly synchronous - the
> write/send call doesn't return until the data has been transmitted
> (UDP) or transmitted and acknowledged (TCP).  This is workable, but
> probably pretty slow in most cases.

> I know that there are some embedded/RTOS systems with zero-copy TCP
> stacks.  I suspect that these are most beneficial in low-memory-
> footprint applications with slow CPUs.


Post by Philip Armstro » Tue, 27 Nov 2001 04:26:38




>>Yes in fact I was wondering if the network socket interface supports
>>zero-copy IO semantics transparent to the calling process.
>>For instance, if a user process wants to send a buffer of data over
>>the network, do the interfaces like sendfile(socket,file) or
>>send(socket) automatically try to enforce zero-copy by doing page
>>remapping or some other technique?

>Nope.  The data is copied down into the kernel during send, and copied
>back into user-space during receive.

>Trying to do zero-copy network I/O is exceedingly tricky, and I
>suspect that it's not terribly productive on most of today's
>mainstream hardware.  

Sheesh. I wish some of you people would go search the linux-kernel
archives before posting here.

There's a whole set of threads on the zero-copy networking
implementation in linux. I suggest you go read them.

Phil

(Try the links from http://kt.linuxcare.com/kernel-traffic/latest.epl
for starters)

--
http://www.kantaka.co.uk/ .oOo. public key: http://www.kantaka.co.uk/gpg.txt


Post by East-Wes » Tue, 27 Nov 2001 14:37:05




>>>Yes in fact I was wondering if the network socket interface supports
>>>zero-copy IO semantics transparent to the calling process.
>>>For instance, if a user process wants to send a buffer of data over
>>>the network, do the interfaces like sendfile(socket,file) or
>>>send(socket) automatically try to enforce zero-copy by doing page
>>>remapping or some other technique?

>>Nope.  The data is copied down into the kernel during send, and copied
>>back into user-space during receive.

>>Trying to do zero-copy network I/O is exceedingly tricky, and I
>>suspect that it's not terribly productive on most of today's
>>mainstream hardware.  

>Sheesh. I wish some of you people would go search the linux-kernel
>archives before posting here.

>There's a whole set of threads on the zero-copy networking
>implementation in linux. I suggest you go read them.

>Phil

>(Try the links from http://kt.linuxcare.com/kernel-traffic/latest.epl
>for starters)

Umm. Does sendfile use zero-copy? I know the FreeBSD version supports
zero-copy for sendfile. Thanks.

John


Post by Philip Armstro » Tue, 27 Nov 2001 17:58:51




>I wrote (Fix the quoting in your newsreader!)
>>There's a whole set of threads on the zero-copy networking
>>implementation in linux. I suggest you go read them.

>>Phil

>>(Try the links from http://kt.linuxcare.com/kernel-traffic/latest.epl
>>for starters)

>Umm. Does sendfile use zero-copy? I know FreeBSD version supports
>zero-copy for senfile. Thanks.

Yes, if your networking hardware is capable of hardware checksumming
and scatter-gather DMA.
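On Linux the call is sendfile(2), which Python exposes as os.sendfile(out_fd, in_fd, offset, count). This is a minimal sketch, and the helper name is hypothetical. The file data is never copied into a user-space buffer. Whether the kernel also avoids copying it into socket buffers depends on the adapter's scatter-gather and checksum capabilities, as noted above.

```python
import os
import socket


def sendfile_to_socket(path: str, sock: socket.socket) -> int:
    """Send a whole file with sendfile(2); no user-space buffer is involved."""
    sent = 0
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        while sent < size:
            # The explicit offset means the file position is not used or changed.
            n = os.sendfile(sock.fileno(), f.fileno(), sent, size - sent)
            if n == 0:  # file shrank underneath us
                break
            sent += n
    return sent
```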

Phil


Post by el.. » Wed, 28 Nov 2001 08:49:57




>>Yes in fact I was wondering if the network socket interface supports
>>zero-copy IO semantics transparent to the calling process.
>>For instance, if a user process wants to send a buffer of data over
>>the network, do the interfaces like sendfile(socket,file) or
>>send(socket) automatically try to enforce zero-copy by doing page
>>remapping or some other technique?
>Nope.  The data is copied down into the kernel during send, and copied
>back into user-space during receive.

Have you bothered looking lately?

--
http://www.spinics.net/linux/


Post by Dave Pla » Wed, 28 Nov 2001 09:32:30



>>>Yes in fact I was wondering if the network socket interface supports
>>>zero-copy IO semantics transparent to the calling process.
>>>For instance, if a user process wants to send a buffer of data over
>>>the network, do the interfaces like sendfile(socket,file) or
>>>send(socket) automatically try to enforce zero-copy by doing page
>>>remapping or some other technique?

>>Nope.  The data is copied down into the kernel during send, and copied
>>back into user-space during receive.

>Have you bothered looking lately?

2.4.13 is the most recent I've looked at in detail.  Both UDP and TCP
are pretty clearly copying the data from user-space into the sk_buff
structures during a transmit operation.  

In UDP, udp_sendmsg() calls into udp_getfrag() and
udp_getfrag_nosum(); the former calls csum_partial_copy_fromiovecend()
and the latter calls memcpy_fromiovecend().

In TCP, tcp_sendmsg() calls tcp_copy_to_page() and skb_add_data(), both of
which call csum_and_copy_from_user().
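The csum_* routines named above fold the Internet checksum into the copy loop, so each byte is read only once. Below is a minimal Python sketch of that combined copy-and-checksum, using the 16-bit ones'-complement sum from RFC 1071. The name csum_and_copy is hypothetical, and its signature and return value do not mirror the kernel's.

```python
def csum_and_copy(src: bytes, dst: bytearray, offset: int = 0) -> int:
    """Copy src into dst starting at offset and, in the same pass, compute
    the 16-bit ones'-complement sum of src (RFC 1071). The caller takes
    the complement (~sum & 0xFFFF) to get the checksum field value."""
    if offset + len(src) > len(dst):
        raise ValueError("destination too small")
    total = 0
    n = len(src)
    for i in range(0, n - 1, 2):
        hi, lo = src[i], src[i + 1]
        dst[offset + i] = hi          # copy ...
        dst[offset + i + 1] = lo
        total += (hi << 8) | lo       # ... and sum, touching each byte once
    if n % 2:                         # odd trailing byte is padded with zero
        dst[offset + n - 1] = src[n - 1]
        total += src[n - 1] << 8
    while total >> 16:                # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total
```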

The copy calls on the receive side are equally easy to find.  During
receive, I don't see much possibility for zero-copy: the packets are
often DMA'ed into memory by the network-adapter hardware, and the
network stack cannot "point" the hardware DMA engine at the proper
user's buffer, since it has no way of knowing in advance what sort of
packet will arrive next.
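The receive-side problem can be sketched with a toy demultiplexer. The 2-byte port header and the names are hypothetical, for illustration only. Frames land in generic ring slots first. Only after each header is parsed is it known whose buffer the payload belongs in, so that is where the copy happens.

```python
import struct

# Hypothetical frame layout: 2-byte destination port (network order), then payload.
HEADER = struct.Struct("!H")


def demux(ring, sockets):
    """Toy receive path.

    ring: frames the adapter has already DMA'd into generic buffers.
    sockets: maps a port number to that socket's user buffer (bytearray).

    The adapter could not have placed a payload in the right user buffer,
    because the destination is only known after the header is parsed, so
    the payload is copied there now. Frames for unknown ports are dropped."""
    for frame in ring:
        (port,) = HEADER.unpack_from(frame)
        buf = sockets.get(port)
        if buf is not None:
            buf.extend(frame[HEADER.size:])  # the copy into the user's buffer
```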
