Hi,
I have observed something that I cannot understand, and I hope some guru
out there can explain it to me.
I wrote a pair of small programs to measure the transmission delay over the
network. One of the programs is called "server" and the other is called
"client". The "server" simply echoes whatever it receives from the "client".
The communication is done through sockets. The programs exchange a message
of a predefined length a predefined number of times, and the time
measurements are taken on the "client". The simple protocol for transmitting
variable-length messages is as follows: each message has a 16-byte header
which contains the length of the message body. Therefore, each write, using
write(), sends the 16-byte header first and then the body; each read, using
read(), reads the header first and then, according to the length specified
in the header, reads enough bytes for the body. Sometimes more than one
read() is needed to get a large message body. This simple protocol works
fine and is robust.
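In case it helps, here is roughly what the framing logic looks like. The
helper names, the ASCII encoding of the length field and the error handling
below are my simplification for this post, not the exact code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define HDR_LEN 16

/* write() may accept fewer bytes than asked for; loop until all are sent. */
static int write_full(int fd, const char *buf, size_t len)
{
    size_t done = 0;
    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);
        if (n <= 0)
            return -1;
        done += (size_t)n;
    }
    return 0;
}

/* read() may return fewer bytes than asked for; loop until all arrive. */
static int read_full(int fd, char *buf, size_t len)
{
    size_t got = 0;
    while (got < len) {
        ssize_t n = read(fd, buf + got, len - got);
        if (n <= 0)
            return -1;
        got += (size_t)n;
    }
    return 0;
}

/* Send one message: the 16-byte header first, then the body. */
int send_message(int fd, const char *body, size_t body_len)
{
    char hdr[HDR_LEN];

    memset(hdr, 0, sizeof hdr);
    sprintf(hdr, "%lu", (unsigned long)body_len);  /* assumed length encoding */
    if (write_full(fd, hdr, HDR_LEN) < 0)
        return -1;
    return write_full(fd, body, body_len);
}

/* Receive one message: the header first, then exactly the announced body. */
int recv_message(int fd, char *body, size_t maxlen)
{
    char hdr[HDR_LEN + 1];
    long body_len;

    if (read_full(fd, hdr, HDR_LEN) < 0)
        return -1;
    hdr[HDR_LEN] = '\0';
    body_len = atol(hdr);
    if (body_len < 0 || (size_t)body_len > maxlen)
        return -1;
    if (read_full(fd, body, (size_t)body_len) < 0)
        return -1;
    return (int)body_len;
}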
The timing measurements are u_time and s_time, taken from getrusage(),
clock time, taken from clock(), and wall time, taken from time(). All the
timer baselines are taken right before the test loop, and the timers are
read again right after the end of that loop. The experiment tries to
measure the delay for different message lengths.
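The measurement itself is done roughly like this; ITERATIONS stands for the
predefined iteration count, and send_message()/recv_message() are the
framing helpers sketched above:

#include <stdio.h>
#include <time.h>
#include <sys/time.h>
#include <sys/resource.h>

#define ITERATIONS 10000

/* from the framing sketch above */
int send_message(int fd, const char *body, size_t body_len);
int recv_message(int fd, char *body, size_t maxlen);

static long usec(struct timeval tv)
{
    return tv.tv_sec * 1000000L + tv.tv_usec;
}

void timed_exchange(int sock, char *msg, size_t msg_len,
                    char *reply, size_t maxlen)
{
    struct rusage r0, r1;
    clock_t c0, c1;
    time_t  w0, w1;
    int i;

    getrusage(RUSAGE_SELF, &r0);     /* u_time / s_time baseline */
    c0 = clock();                    /* CPU-clock baseline       */
    w0 = time(NULL);                 /* wall-clock baseline      */

    for (i = 0; i < ITERATIONS; i++) {
        if (send_message(sock, msg, msg_len) < 0 ||
            recv_message(sock, reply, maxlen) < 0)
            break;                   /* bail out on a broken connection */
    }

    getrusage(RUSAGE_SELF, &r1);
    c1 = clock();
    w1 = time(NULL);

    printf("u_time %ld  s_time %ld  clock %ld  wall %ld\n",
           usec(r1.ru_utime) - usec(r0.ru_utime),   /* user CPU, usec   */
           usec(r1.ru_stime) - usec(r0.ru_stime),   /* system CPU, usec */
           (long)(c1 - c0),                         /* clock() ticks    */
           (long)(w1 - w0));                        /* wall time, sec   */
}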
The strange thing that I observed is this:
When the message length is small, the communication takes a long time.
Though it reports small u_time, s_time and clock time, it reports a huge
wall time. (I can really feel it, waiting in front of the console.) It
seems that the process is waiting for something in the kernel.
When the message length is larger, it reports larger u_time, s_time and
clock time, as expected, yet the wall time remains roughly the same.
However, once the length grows past a certain point, while the u_time,
s_time and clock time keep increasing, the wall time drops significantly.
The magic number is around 1460 bytes. That is, based on the wall time,
finishing the test with 1460-byte messages takes only 1/20 of the time
needed by the test with 16- or 32- or ... or 1459-byte messages. Even with
8K-byte messages, it takes 1/5 of the time needed by the test with 16-byte
messages.
I repeated the experiment many times on different machines, including a
SPARCstation 1, SPARC SLC, SPARC ELC, and SPARCstation 2 with SunOS 4.1,
and an Encore with Umax 4.3. They all show the same phenomenon, and the
magic number is always around 1460 (well, I did get 2900 once). However,
when I ported the programs to Macintoshes, the communication between
Macintoshes does not show this phenomenon, yet the communication between a
Mac and the Unix boxes does.
My question is:
Have I really hit some magic of sockets/Ethernet/UNIX/SunOS/read()/write()/*?
Any hint will be greatly appreciated.
=== One set of results (totals for 10000 round trips; u_time, s_time and
    clock_time in microseconds, wall_time in seconds)
Msg_length u_time s_time clock_time wall_time (sec)
---------------------------------------------------------------------------
16 350000 3000000 3349866 2008
32 340000 3450000 3799848 2133
64 340000 3110000 3449862 2004
128 440000 3290000 3733184 2006
256 300000 3060000 3366532 2132
512 280000 2890000 3166540 2036
1024 270000 3140000 3399864 2002
2048 1500000 25980000 27482234 112
4096 1700000 44070000 45781502 172
8192 2800000 100580000 103379198 400