Packet Loss on Loopback Device in Case of SO_DONTROUTE

Post by Andreas Borche » Sat, 14 Dec 2002 02:53:38



[Please note the f'up to comp.unix.solaris]

I am trying to get Maple 8 running under Solaris 9. Two Maple 8 processes,
cmaple and mserver, set up a TCP connection between them. This, however,
does not seem to work due to a weird packet loss on the loopback
device lo0. Here are some excerpts of the truss output documenting the packet
loss along with my comments. The full truss log can be found at
   http://www.mathematik.uni-ulm.de/sai/borchert/maple.truss
and the complete lsof listings are available under
   http://www.mathematik.uni-ulm.de/sai/borchert/2580.lsof and
   http://www.mathematik.uni-ulm.de/sai/borchert/2582.lsof

mserver (2582) creates a socket, binds it to port 50575 using the wildcard
address, and listens on it:

2582:   so_socket(PF_INET, SOCK_STREAM, IPPROTO_IP, "", 1) = 3
2582:   bind(3, 0x0002BE04, 16, 3)                      = 0
2582:           AF_INET  name = 0.0.0.0  port = 50575

cmaple (2580), the other process spawned off by maple, connects
to this socket, using the address 127.0.0.1, and writes four
null bytes to it:

2580:   so_socket(PF_INET, SOCK_STREAM, IPPROTO_IP, "", 1) = 4
2580:   connect(4, 0x0007332C, 16, 1)                   = 0
2580:           AF_INET  name = 127.0.0.1  port = 50575
2580:   setsockopt(4, SOL_SOCKET, SO_DONTROUTE, 0xFFBFF5DC, 4, 1) = 0
2580:   setsockopt(4, SOL_SOCKET, SO_LINGER, 0xFFBFF5D4, 8, 1) = 0
2580:   write(4, "\0\0\0\0", 4)                         = 4

lsof documents the connection between these two processes:

cordelia$ lsof -i :50575
COMMAND  PID     USER   FD   TYPE        DEVICE SIZE/OFF NODE NAME
cmaple  2580 borchert    4u  IPv4 0x3000097ec50      0t4  TCP localhost:40665->localhost:50575 (ESTABLISHED)
mserver 2582 borchert    3u  IPv4 0x30002fd13e0      0t0  TCP *:50575 (LISTEN)
mserver 2582 borchert   19u  IPv4 0x3000285c330      0t0  TCP localhost:50575->localhost:40665 (ESTABLISHED)
cordelia$

Later on, after some other activities, mserver (2582) accepts
the connection and attempts to read the first packet:

2582:   accept(3, 0x00000000, 0x00000000, 1)            = 19
2582:   recv(19, 0xFFBFF68C, 4, 0)      (sleeping...)

But none is found. The recv() hangs until it times out (ETIMEDOUT).

2580:   recv(4, 0xFFBFF5DC, 4, 0)       (sleeping...)

Here cmaple (2580) is waiting to read the response, so both processes
are blocked until the timeout, after which cmaple crashes with SIGSEGV.
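
The sequence in the truss excerpts can be approximated in a single
process for testing. The following is only a sketch, not Maple's code:
the function name loopback_roundtrip, the ephemeral port, the linger
values and the 2-second receive timeout are all assumptions made for
illustration.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: listen on the wildcard address, connect via
 * 127.0.0.1, optionally set SO_DONTROUTE, write four null bytes and try
 * to receive them on the accepted socket.
 * Returns the number of bytes received, or -1 on error or timeout.
 */
int loopback_roundtrip(int use_dontroute)
{
    int lfd = -1, cfd = -1, afd = -1, result = -1, on = 1;
    struct sockaddr_in addr;
    socklen_t alen = sizeof addr;
    struct timeval tv;
    struct linger lg;
    char buf[4];
    ssize_t n;

    tv.tv_sec = 2;  tv.tv_usec = 0;   /* assumed receive timeout */
    lg.l_onoff = 1; lg.l_linger = 1;  /* assumed; the trace hides the values */

    /* Server side: wildcard address, port chosen by the kernel. */
    lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0) goto done;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = 0;
    if (bind(lfd, (struct sockaddr *)&addr, sizeof addr) < 0) goto done;
    if (listen(lfd, 1) < 0) goto done;
    if (getsockname(lfd, (struct sockaddr *)&addr, &alen) < 0) goto done;

    /* Client side: connect first, then set the options, as in the trace. */
    cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (cfd < 0) goto done;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (connect(cfd, (struct sockaddr *)&addr, sizeof addr) < 0) goto done;
    if (use_dontroute &&
        setsockopt(cfd, SOL_SOCKET, SO_DONTROUTE, &on, sizeof on) < 0)
        goto done;
    if (setsockopt(cfd, SOL_SOCKET, SO_LINGER, &lg, sizeof lg) < 0) goto done;
    if (write(cfd, "\0\0\0\0", 4) != 4) goto done;

    /* Server side again: accept and read with a timeout instead of hanging. */
    afd = accept(lfd, NULL, NULL);
    if (afd < 0) goto done;
    if (setsockopt(afd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv) < 0) goto done;
    n = recv(afd, buf, sizeof buf, MSG_WAITALL);
    result = (int)n;

done:
    if (afd >= 0) close(afd);
    if (cfd >= 0) close(cfd);
    if (lfd >= 0) close(lfd);
    return result;
}
```

Comparing loopback_roundtrip(0) with loopback_roundtrip(1) on the same
machine shows whether the option makes a difference there. If the
behaviour described above is present, the second call would be expected
to return -1 after the timeout.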

A firewall (ipfilter) runs on this machine, but to the best of my
knowledge this shouldn't matter, as it should not affect traffic on the
loopback device.

It seems that the packet is lost due to the SO_DONTROUTE option. If
this option is suppressed using a LD_PRELOAD hack, maple starts
without problems. A truss output of this case can be found at
   http://www.mathematik.uni-ulm.de/sai/borchert/maple-dontroute.truss
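
The preload library itself is not shown here. One possible shape for
such a shim, given purely as an assumption-laden sketch, is a
replacement setsockopt() that silently accepts SO_DONTROUTE and forwards
every other option to the real function found via dlsym(RTLD_NEXT, ...).
The typedef name setsockopt_fn is made up for this example.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* exposes RTLD_NEXT on glibc; assumed harmless elsewhere */
#endif
#include <dlfcn.h>
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical type name for a pointer to the real setsockopt(). */
typedef int (*setsockopt_fn)(int, int, int, const void *, socklen_t);

/*
 * Interposed setsockopt(): pretend SO_DONTROUTE succeeded without
 * setting it, and pass all other options through unchanged.
 */
int setsockopt(int fd, int level, int optname,
               const void *optval, socklen_t optlen)
{
    static setsockopt_fn real_setsockopt;

    if (level == SOL_SOCKET && optname == SO_DONTROUTE)
        return 0;  /* swallow the option */

    if (real_setsockopt == NULL)
        real_setsockopt = (setsockopt_fn)dlsym(RTLD_NEXT, "setsockopt");
    if (real_setsockopt == NULL)
        return -1; /* real function not found */

    return real_setsockopt(fd, level, optname, optval, optlen);
}
```

Built as a shared object and named in LD_PRELOAD when starting maple,
a shim along these lines should keep the option from ever reaching the
kernel. The exact build flags and the library that provides the real
setsockopt() depend on the platform and are not covered here.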

According to the getsockopt(2) manual page,

   ``SO_DONTROUTE indicates that outgoing messages should bypass the
   standard routing  facilities.  Instead,  messages are directed to
   the appropriate network interface according to the network  portion
   of the destination address.''

The appropriate network interface seems to be lo0 as the client
connected to 127.0.0.1. Why is the packet dropped then?

Andreas.

--
Dr. Andreas F. Borchert, SAI, Universitaet Ulm | One should make everything
http://www.mathematik.uni-ulm.de/sai/borchert/ | as simple as possible, but
Helmholtzstrasse 18, E02, Tel +49 731 50-23572 | no simpler. -- A. Einstein

 
 
 

Packet Loss on Loopback Device in Case of SO_DONTROUTE

Post by Andrew Gabri » Sat, 14 Dec 2002 10:17:17




Quote:

> It seems that the packet is lost due to the SO_DONTROUTE option. If
> this option is suppressed using a LD_PRELOAD hack, maple starts
> without problems.

I suspect you have bumped into bugid
4749268 connect() to localhost fails when SO_DONTROUTE is set

The workaround is to avoid using SO_DONTROUTE on loopback.
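
In application code that does want SO_DONTROUTE for other peers, this
could look roughly like the sketch below. Both helper names are
hypothetical, and only IPv4 is considered.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: nonzero if the IPv4 address lies in 127.0.0.0/8. */
int is_ipv4_loopback(const struct sockaddr_in *peer)
{
    return (ntohl(peer->sin_addr.s_addr) >> 24) == 127;
}

/*
 * Hypothetical helper: enable SO_DONTROUTE on fd unless the peer is a
 * loopback address. Returns 0 if the option was set or skipped, and -1
 * if setsockopt() failed.
 */
int set_dontroute_unless_loopback(int fd, const struct sockaddr_in *peer)
{
    int on = 1;

    if (is_ipv4_loopback(peer))
        return 0;  /* leave routing alone for loopback peers */
    return setsockopt(fd, SOL_SOCKET, SO_DONTROUTE, &on, sizeof on);
}
```

When the application cannot be changed, as with a binary-only program,
the option has to be filtered outside the program instead.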

--
Andrew Gabriel
Consultant Software Engineer