innd: cant read Resource temporarily unavailable (SunOS5.4)

innd: cant read Resource temporarily unavailable (SunOS5.4)

Post by Ian Dickins » Fri, 16 Dec 1994 22:01:12



I'm just about to move our news server to SunOS5.4
I've compiled and installed a test server on another machine.
Most of it seems to work just fine and dandy.

I'm getting sequences of log messages as follows:

Dec 15 12:32:11 crocus innd: news-out.csv.warwick.ac.uk connected 19
Dec 15 12:32:11 crocus innd: news-out.csv.warwick.ac.uk:19 cant read Resource temporarily unavailable
Dec 15 12:32:11 crocus innd: news-out.csv.warwick.ac.uk:19 cant read Resource temporarily unavailable
Dec 15 12:32:11 crocus innd: news-out.csv.warwick.ac.uk:19 cant read Resource temporarily unavailable
Dec 15 12:32:11 crocus innd: news-out.csv.warwick.ac.uk:19 cant read Resource temporarily unavailable
Dec 15 12:32:12 crocus innd: news-out.csv.warwick.ac.uk:19 cant read Resource temporarily unavailable
Dec 15 12:32:12 crocus innd: news-out.csv.warwick.ac.uk:19 cant read Resource temporarily unavailable
Dec 15 12:32:12 crocus innd: news-out.csv.warwick.ac.uk:19 closed seconds 1 accepted 4 refused 0 rejected 0

A call to read() in innd/chan.c is returning EAGAIN, which possibly
implies that select() is either broken or doing the wrong thing.
It could be broken since select() is layered on top of poll().

poll(0xDFFFDB30, 4, 300000)                     = 1
write(19, " 2 0 0   w a r w i c k -".., 62)     = 62
poll(0xDFFFDB30, 4, 300000)                     = 1
read(19, " I H A V E   < 9 4 1 2 1".., 4095)    = 38
poll(0xDFFFDB30, 4, 300000)                     = 1
write(19, " 3 3 5\r\n", 5)                      = 5
poll(0xDFFFDB30, 4, 300000)                     = 1
read(19, 0x001BC668, 4095)                      Err#11 EAGAIN
time()                                          = 787495932
putmsg(3, 0xDFFFEFA0, 0xDFFFEF94, 0)            = 0
poll(0xDFFFDB30, 4, 300000)                     = 1
read(19, " P a t h :   w a r w i c".., 4095)    = 4095
poll(0xDFFFDB30, 4, 300000)                     = 1
read(19, " p", 4096)                            = 1
poll(0xDFFFDB30, 4, 300000)                     = 1
read(19, 0x001C2278, 4095)                      Err#11 EAGAIN
time()                                          = 787495933
putmsg(3, 0xDFFFEFA0, 0xDFFFEF94, 0)            = 0
poll(0xDFFFDB30, 4, 300000)                     = 1
read(19, " a s s i n g   t h e  \r".., 4095)    = 1460
poll(0xDFFFDB30, 4, 300000)                     = 1
read(19, 0x001C282C, 2635)                      Err#11 EAGAIN
time()                                          = 787495933
putmsg(3, 0xDFFFEFA0, 0xDFFFEF94, 0)            = 0
poll(0xDFFFDB30, 4, 300000)                     = 1
read(19, " - - - - - - - - - - - -".., 2635)    = 2635
poll(0xDFFFDB30, 4, 300000)                     = 1
read(19, "  ", 8192)                            = 1
poll(0xDFFFDB30, 4, 300000)                     = 1
read(19, 0x001C7290, 8191)                      Err#11 EAGAIN
time()                                          = 787495933
putmsg(3, 0xDFFFEFA0, 0xDFFFEF94, 0)            = 0
poll(0xDFFFDB30, 4, 300000)                     = 1
read(19, " 2 . 1 A   o r   2 . 2 )".., 8191)    = 1460
poll(0xDFFFDB30, 4, 300000)                     = 1
read(19, 0x001C7844, 6731)                      Err#11 EAGAIN
time()                                          = 787495933
putmsg(3, 0xDFFFEFA0, 0xDFFFEF94, 0)            = 0
poll(0xDFFFDB30, 4, 300000)                     = 1
read(19, " %   s a l e s   t a x :".., 6731)    = 2636
poll(0xDFFFDB30, 4, 300000)                     = 1
read(19, 0x001C8290, 4095)                      Err#11 EAGAIN
time()                                          = 787495933
putmsg(3, 0xDFFFEFA0, 0xDFFFEF94, 0)            = 0
time()                                          = 787495933
putmsg(3, 0xDFFFEF98, 0xDFFFEF8C, 0)            = 0
close(19)                                       = 0
poll(0xDFFFDB30, 3, 300000)     (sleeping...)

Any ideas?

Cheers,
--
Ian 'Vato'*inson [ID17]                                   Kibo bait :-)

           MIME mail welcome - don't send me no s*kin' X.400
      Click <A HREF="http://www.veryComputer.com/~cudep/">here</A>.

 
 
 

innd: cant read Resource temporarily unavailable (SunOS5.4)

Post by Casper H.S. D » Fri, 16 Dec 1994 22:31:38



>I'm just about to move our news server to SunOS5.4
>I've compiled and installed a test server on another machine.
>Most of it seems to work just fine and dandy.
>I'm getting sequences of log messages as follows:
>Dec 15 12:32:11 crocus innd: news-out.csv.warwick.ac.uk connected 19
>Dec 15 12:32:11 crocus innd: news-out.csv.warwick.ac.uk:19 cant read Resource temporarily unavailable

Someone else in news.software.nntp complained about this too.

It seems that a poll() immediately followed by a read from a socket that
is non blocking returns EAGAIN (same as EWOULDBLOCK).  That isn't as
it should be.  Does INN drop articles?  I think you can just check for
EAGAIN and call select again.

Casper

 
 
 

innd: cant read Resource temporarily unavailable (SunOS5.4)

Post by Ian Dickins » Sat, 17 Dec 1994 00:51:05





>>I'm just about to move our news server to SunOS5.4
>>I've compiled and installed a test server on another machine.
>>Most of it seems to work just fine and dandy.

>>I'm getting sequences of log messages as follows:

>>Dec 15 12:32:11 crocus innd: news-out.csv.warwick.ac.uk connected 19
>>Dec 15 12:32:11 crocus innd: news-out.csv.warwick.ac.uk:19 cant read Resource temporarily unavailable

>Someone else in news.software.nntp complained about this too.

>It seems that a poll() immediately followed by a read from a socket that
>is non blocking returns EAGAIN (same as EWOULDBLOCK).  That isn't as
>it should be.  Does INN drop articles?  I think you can just check for
>EAGAIN and call select again.

What happens is it fails with EAGAIN, and CHANreadtext() returns -1
in innd/nc.c.  It then goes round the loop until BAD_IO_COUNT errors
are returned ie 5 more times.  At this point innd closes the connection down.

So if poll() immediately followed by a read from a socket causes this,
how will calling select() again alter this behaviour, since the next thing to
happen is a read()?  It already looks like this is happening BAD_IO_COUNT
times.

Cheers,
--
Ian 'Vato'*inson [ID17]                                   Kibo bait :-)

           MIME mail welcome - don't send me no s*kin' X.400
      Click <A HREF="http://www.veryComputer.com/~cudep/">here</A>.

 
 
 

innd: cant read Resource temporarily unavailable (SunOS5.4)

Post by Casper H.S. D » Sat, 17 Dec 1994 02:03:54



>What happens is it fails with EAGAIN, and CHANreadtext() returns -1
>in innd/nc.c.  It then goes round the loop until BAD_IO_COUNT errors
>are returned ie 5 more times.  At this point innd closes the connection down.

I'd try to ignore the error and retry the read.  Why are those
file descriptors non blocking anyway?

Quote:>So if poll() immediately followed by a read from a socket causes this,
>how will calling select() again alter this behaviour, since the next thing to
>happen is a read()?  It already looks like this is happening BAD_IO_COUNT
>times.

Because it's a bug and this might work as a work around?

Casper

 
 
 

innd: cant read Resource temporarily unavailable (SunOS5.4)

Post by Leif Hedstr » Sat, 17 Dec 1994 05:27:35




>>I'm just about to move our news server to SunOS5.4
>>I've compiled and installed a test server on another machine.
>>Most of it seems to work just fine and dandy.
>>I'm getting sequences of log messages as follows:
>>Dec 15 12:32:11 crocus innd: news-out.csv.warwick.ac.uk connected 19
>>Dec 15 12:32:11 crocus innd: news-out.csv.warwick.ac.uk:19 cant read Resource temporarily unavailable
>Someone else in news.software.nntp complained about this too.
>It seems that a poll() immediately followed by a read from a socket that
>is non blocking returns EAGAIN (same as EWOULDBLOCK).  That isn't as
>it should be.  Does INN drop articles?  I think you can just check for
>EAGAIN and call select again.

I have exactly the same problem, and I think I know what is wrong.  For
some reason, select() in Solaris2.4 is not working 100% correct.  On of
in-house projects have exactly the same problem. It seams that select()
will return ready FD's, even though the FD doesn't have anything
available to "read". [I know select() is calling poll(), so perhaps it's
poll() that is broken].

INN uses select(), and then calls the channels reader function. This
function expects something to be ready to read, and if not, it will
print the error message above.

The only problem with this (as far as I can see) is that sometimes the
channels are closed. INN will close a channel after 5 consecutive failed
reads. This can be configured in config.data, look for BAD_IO_COUNT.
It's "bad" to close the channels, since it usually takes some time until
the feeder connects again.

I also did some tests with innd/chan.c, to call select() twice in a row
in the function CHANreadloop(). The second select() seems to be more
"reliable", but I don't know how much extra CPU this select() consumes.
[Does anyone know how expensive a select()/poll() call is?]

I reported the select() problem to Sun some weeks ago, but I don't think
they have a solution yet. They haven't even confirmed it to be a bug. I
might be wrong, but I really believe select() (or the underlying poll())
really is broken in SunOS5.4.

-- Leif
--
Leif Hedstrom

 
 
 

innd: cant read Resource temporarily unavailable (SunOS5.4)

Post by Ian Dickins » Sat, 17 Dec 1994 19:02:50




Quote:>I have exactly the same problem, and I think I know what is wrong.  For
>some reason, select() in Solaris2.4 is not working 100% correct.  On of
>in-house projects have exactly the same problem. It seams that select()
>will return ready FD's, even though the FD doesn't have anything
>available to "read". [I know select() is calling poll(), so perhaps it's
>poll() that is broken].

I suspect poll().  From truss, it is definately returning POLLRDNORM on
the channel fd.  It's either a poll() bug, in that it shouldn't return this,
or it's a read()/poll() interaction bug where the read() fails when there IS
data available.  The following is from my hacked code:

poll(0xDFFFDB30, 4, 300000)                     = 1
        fd=4  ev=POLLRDNORM rev=0
        fd=13 ev=POLLRDNORM rev=0
        fd=15 ev=POLLRDNORM rev=0
        fd=19 ev=POLLRDNORM rev=POLLRDNORM
read(19, 0x001CE970, 4095)                      Err#11 EAGAIN
poll(0xDFFFDB30, 4, 300000)                     = 1
        fd=4  ev=POLLRDNORM rev=0
        fd=13 ev=POLLRDNORM rev=0
        fd=15 ev=POLLRDNORM rev=0
        fd=19 ev=POLLRDNORM rev=POLLRDNORM
read(19, 0x001CE970, 4095)                      = 2871

Quote:>INN uses select(), and then calls the channels reader function. This
>function expects something to be ready to read, and if not, it will
>print the error message above.

>The only problem with this (as far as I can see) is that sometimes the
>channels are closed. INN will close a channel after 5 consecutive failed
>reads. This can be configured in config.data, look for BAD_IO_COUNT.
>It's "bad" to close the channels, since it usually takes some time until
>the feeder connects again.

Since it doesn't fail every time, I just checked for EAGAIN, and return
back up the tree.  It adds a simple comparison to the loop when read()
fails, and also when that function returns - the successful code path
is unaltered and the channels don't get closed unnecessarily.

Quote:>I also did some tests with innd/chan.c, to call select() twice in a row
>in the function CHANreadloop(). The second select() seems to be more
>"reliable", but I don't know how much extra CPU this select() consumes.

Does simply doing a second select() *always* work correctly?

Quote:>[Does anyone know how expensive a select()/poll() call is?]

I don't know exactly, but it's probably not as expensive as it was under
SunOS5.3, since poll() is no longer a single critical path in the kernel,
ie multiple poll()s can happen at once now (maybe this is what broke it?).

Quote:>I reported the select() problem to Sun some weeks ago, but I don't think
>they have a solution yet. They haven't even confirmed it to be a bug. I
>might be wrong, but I really believe select() (or the underlying poll())
>really is broken in SunOS5.4.

Agreed.  Does anyone have a smaller piece of code than innd :-) that
replicates this?

Cheers,
--
Ian 'Vato'*inson [ID17]                                   Kibo bait :-)

           MIME mail welcome - don't send me no s*kin' X.400
      Click <A HREF="http://www.veryComputer.com/~cudep/">here</A>.

 
 
 

1. Problem: Solaris 2.4 & inn: cant read Resource temporarily unavailable

This is a bug in Solaris 2.4 select.  The bug is being fixed and
a fix should appear in a patch soon.  Meanwhile, use the following

make update and stop & restart innd.

Here's my nasty hack to fix this:

*** innd/chan.c.ORIG    Wed Dec 14 11:03:16 1994
--- innd/chan.c Thu Dec 15 17:00:54 1994
***************
*** 497,502 ****
--- 497,508 ----
      bp->Left = bp->Size - bp->Used;
      i = read(cp->fd, &bp->Data[bp->Used], bp->Left - 1);
      if (i < 0) {
+ #ifdef SUNOS5
+     /* return of -2 indicates EAGAIN, for SUNOS5.4 poll() bug workaround */
+       if (errno == EAGAIN) {
+           return -2;
+       }
+ #endif
        syslog(L_ERROR, "%s cant read %m", p);
        return -1;
      }
*** innd/nc.c.ORIG      Thu Mar 18 21:04:28 1993
--- innd/nc.c   Thu Dec 15 17:00:41 1994
***************
*** 783,788 ****
--- 783,794 ----
      /* Read any data that's there; ignore errors (retry next time it's our
       * turn) and if we got nothing, then it's EOF so mark it closed. */
      if ((i = CHANreadtext(cp)) < 0) {
+ #ifdef SUNOS5
+     /* return of -2 indicates EAGAIN, for SUNOS5.4 poll() bug workaround */
+       if (i == -2) {
+           return;
+       }
+ #endif
        if (cp->BadReads++ >= BAD_IO_COUNT) {
            if (NCcount > 0)
                NCcount--;

2. Fonts displaying as small squares after XFree86 upgrade

3. Help: "cant read Resource temporarily unavailable"

4. Character based Windowlibrary

5. compaq aero with trackball woes

6. Resource temporarily unavailable from socket read?

7. can't get DNS to work..

8. ls: /net/beluga: Resource temporarily unavailable

9. syslogd: recvfrom inet: resource temporarily unavailable

10. FBSD Error: input: Resource temporarily unavailable

11. SIOCSIFFLAGS: Resource temporarily unavailable

12. dhcpd[50] dhcpStart: ioctl SIOCSIFFLAGS: Resources Temporarily Unavailable