several socket errors corrected

Post by Orest Zborowski CO » Tue, 30 Jun 1992 21:43:21



great news!

i've fixed two major problems in the socket code in the linux kernel. one
resulted in the infamous xfig bug where it would hang the machine, and the
other resulted in a bad "interrupted" error return (thanks to Jaime Jofre).

xfig bug: xfig was sending the server a whole bunch of images for later
 use, and the server was sending back updates. once the socket buffer became
 full, a non-blocking write returned 0 instead of EAGAIN, so the server
 merrily continued sending the "remaining" portion of the message -
 obviously forever. this bug, i believe, is also related to the mysterious
 server crashes when too much was going on (e.g. when i tried starting three
 xterms all in a row and they paged like mad).
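
To make the failure mode concrete, here is a minimal user-space sketch (not part of the original post) of how a caller typically drives a non-blocking write on a present-day POSIX system. The helper names `write_until_full` and `errno_when_full` are hypothetical and chosen only for illustration. The point is that a full buffer has to be reported as an error (EAGAIN), because a caller that simply retries after a return value of 0 never learns that it should stop and wait.

```c
#define _XOPEN_SOURCE 700   /* request POSIX.1-2008 declarations */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write as much of buf as the socket accepts.
 * Returns the number of bytes written (possibly short), or -1 on a
 * real error. EAGAIN tells us to stop and wait for buffer space. */
ssize_t write_until_full(int fd, const char *buf, size_t len)
{
    size_t done = 0;

    assert(buf != NULL);
    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);

        if (n > 0) {
            done += (size_t)n;          /* progress: keep going */
        } else if (n < 0 && errno == EINTR) {
            continue;                   /* interrupted: retry */
        } else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            break;                      /* buffer full: come back later */
        } else if (n < 0) {
            return -1;                  /* genuine error */
        } else {
            break;                      /* n == 0: retrying blindly here
                                           is how a sender spins forever */
        }
    }
    return (ssize_t)done;
}

/* Hypothetical helper: fill one end of a non-blocking AF_UNIX socket
 * pair and return the errno of the first write that fails, 0 if a
 * write returned 0, or -1 if the setup itself failed. */
int errno_when_full(void)
{
    int sv[2];
    char chunk[4096] = {0};
    ssize_t n;
    int err;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    if (fcntl(sv[0], F_SETFL, fcntl(sv[0], F_GETFL, 0) | O_NONBLOCK) < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    /* Nobody reads from sv[1], so the buffer eventually fills up. */
    while ((n = write(sv[0], chunk, sizeof chunk)) > 0)
        ;
    err = (n < 0) ? errno : 0;
    close(sv[0]);
    close(sv[1]);
    return err;
}
```

With a correctly behaving kernel, `errno_when_full()` is expected to report EAGAIN (or EWOULDBLOCK, which has the same value on many systems) rather than 0.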

interrupt bug: i was a little naive in my handling of interrupts. whenever
 the socket code needed to sleep and was interrupted, i always returned
 EINTR to the process. unfortunately, this should only be done if the signal
 is actually supposed to interrupt the process (i.e. not when it is left at
 its default handling and that default is to ignore the signal). i think the
 fix below, where the code returns ERESTARTSYS instead, lets the kernel's
 signal handling decide whether the signal should be passed to the process
 or ignored. the other instances of EINTR are for true interrupts (i.e.
 client sleeping on connection completion, and the interrupt is due to the
 server going away), but i'll keep an eye on them in case i'm wrong there
 too.
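
ERESTARTSYS itself is internal to the kernel and is never seen by a process. What a program observes is either a transparently restarted call or EINTR. The sketch below (not from the original post) shows that difference from user space on a current POSIX system, using `sigaction` with and without `SA_RESTART`. The helper `read_with_signal`, the handler `on_alarm`, and the 100 ms timer are assumptions made for the example, and the result relies on the read having blocked before the timer fires.

```c
#define _XOPEN_SOURCE 700   /* request POSIX.1-2008 + XSI declarations */
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

static int wake_fd = -1;    /* write end of the pipe, used by the handler */

/* Signal handler: put one byte into the pipe so that a restarted
 * read() has something to return. write() is async-signal-safe. */
static void on_alarm(int sig)
{
    char c = 'x';
    ssize_t r;

    (void)sig;
    r = write(wake_fd, &c, 1);
    (void)r;
}

/* Hypothetical helper: block in read() on an empty pipe and let
 * SIGALRM arrive about 100 ms later. sa_flags is 0 or SA_RESTART.
 * Returns the result of read(); *err receives errno on failure.
 * Returns -2 if the setup itself failed. */
ssize_t read_with_signal(int sa_flags, int *err)
{
    int p[2];
    struct sigaction sa, old;
    struct itimerval tv;
    ssize_t n;
    char c;

    assert(err != NULL);
    if (pipe(p) < 0) {
        *err = errno;
        return -2;
    }
    wake_fd = p[1];

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sa.sa_flags = sa_flags;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGALRM, &sa, &old);

    memset(&tv, 0, sizeof tv);
    tv.it_value.tv_usec = 100000;       /* one-shot timer, 100 ms */
    setitimer(ITIMER_REAL, &tv, NULL);

    n = read(p[0], &c, 1);              /* blocks until the signal */
    *err = (n < 0) ? errno : 0;

    sigaction(SIGALRM, &old, NULL);     /* restore previous handling */
    close(p[0]);
    close(p[1]);
    return n;
}
```

Calling `read_with_signal(0, &err)` is expected to fail with EINTR, while `read_with_signal(SA_RESTART, &err)` is expected to return 1, because the interrupted read is restarted after the handler has run and then finds the byte the handler wrote.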

here are the patches, relative to the linux/kernel/net subdirectory. linus -
if these patches seem ok, please include them in the next release. others
of you - please apply these patches and let me know if there are any
problems - i've been running x with them and haven't noticed any, and xfig
happily executes!

---cut---
diff -c -r OLD/socket.c ./socket.c
*** OLD/socket.c        Tue Jun 16 23:03:08 1992
--- ./socket.c  Sun Jun 28 15:08:45 1992
***************
*** 587,593 ****
                interruptible_sleep_on(sock->wait);
                if (current->signal & ~current->blocked) {
                        PRINTK("sys_accept: sleep was interrupted\n");
!                       return -EINTR;
                }
        }

--- 587,593 ----
                interruptible_sleep_on(sock->wait);
                if (current->signal & ~current->blocked) {
                        PRINTK("sys_accept: sleep was interrupted\n");
!                       return -ERESTARTSYS;
                }
        }

diff -c -r OLD/unix.c ./unix.c
*** OLD/unix.c  Sun Jun 28 14:57:47 1992
--- ./unix.c    Sun Jun 28 15:11:56 1992
***************
*** 466,476 ****
        while (!(space = UN_BUF_SPACE(pupd))) {
                PRINTK("unix_proto_write: no space left...\n");
                if (nonblock)
!                       return 0;
                interruptible_sleep_on(sock->wait);
                if (current->signal & ~current->blocked) {
                        PRINTK("unix_proto_write: interrupted\n");
!                       return -EINTR;
                }
                if (sock->state == SS_DISCONNECTING) {
                        PRINTK("unix_proto_write: disconnected (SIGPIPE)\n");
--- 466,476 ----
        while (!(space = UN_BUF_SPACE(pupd))) {
                PRINTK("unix_proto_write: no space left...\n");
                if (nonblock)
!                       return -EAGAIN;
                interruptible_sleep_on(sock->wait);
                if (current->signal & ~current->blocked) {
                        PRINTK("unix_proto_write: interrupted\n");
!                       return -ERESTARTSYS;
                }
                if (sock->state == SS_DISCONNECTING) {
                        PRINTK("unix_proto_write: disconnected (SIGPIPE)\n");
---cut---

zorst

--
zorst (orest zborowski)

 
 
 

1. correcting "bind to socket error"

Question:  On SGI Unix, is there a way to correct a "bind to socket error"
without rebooting the machine?  We get this error on an intermittent basis
when a task using sockets crashes unexpectedly, and thereafter it is
impossible to bind that socket to the same port number again unless we
reboot the SGI.  I.e., since we bind addresses to static port numbers, the
crash of a single task sometimes requires us to reboot an entire machine
before we can restart that task.

2. Solaris 2.4

3. Several questions about sockets and file descriptors

4. Could someone please send me a 'lock' source code?

5. Grabbing several datagrams from a socket

6. Help files available by e-mail

7. One socket connection, several client requests?

8. Work Load manager question

9. Asynchronous socket I/O and correct signal handling?

10. ProFTPD error: Socket operation on non-socket

11. Apache error: getpeername: Socket operation on non-socket

12. socket error 98 on RedHat 7.0, server restart tcp socket.

13. Warning: accept() error: Socket operation on non-socket