I'm experiencing a problem with semaphores when a blocking semop()
call is interrupted by a signal, and subsequent semop()
reserve/release calls are issued before re-issuing the original
(interrupted) semop() call.
It seems that the original semaphore status gets mangled by the
subsequent semop() calls after the waiting semop() call is interrupted
(see below for details).
I wrote a test program to explore variations on this problem;
it behaves the same way on various versions of AIX (including
AIX 4.3.1) and also on a SPARCstation running SunOS 5.5.1.
I'm not sure whether I've overlooked some well-known rule of
semaphore programming or whether there's a flaw in the OS.
Any ideas would be welcomed - further details (including test
program source code) available on request.
-----------------------
scenario descriptions:
general scenario that works Just Fine:
process A is waiting on semaphore X, inside a semop() call.
process B sends a signal to process A.
process A diverts to its signal handler and processes the signal,
then returns to its mainline code. the interrupted
semop() call returns -1 with errno=EINTR, so process A
reissues the semop() call and resumes waiting on the
semaphore.
eventually, process B releases the semaphore and process A returns
successfully from the blocking semop() call.
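
For reference, the reissue-on-EINTR step in this scenario can be
sketched as below. This is a minimal sketch, not the actual test
program; the names reserve_retry and release_sem are hypothetical,
and it assumes plain blocking operations with no SEM_UNDO flag.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Reserve (decrement) one semaphore in the set, re-issuing semop()
 * whenever the call is interrupted by a signal (errno == EINTR).
 * Returns 0 on success, -1 on any other error (errno stays set).
 * Hypothetical helper name, for illustration only. */
int reserve_retry(int semid, unsigned short semnum)
{
    struct sembuf op;

    op.sem_num = semnum;
    op.sem_op  = -1;   /* wait until the value can be decremented */
    op.sem_flg = 0;    /* blocking, no SEM_UNDO (an assumption) */

    while (semop(semid, &op, 1) == -1) {
        if (errno != EINTR)
            return -1; /* a real error, not an interruption */
        /* interrupted by a signal: loop and wait again */
    }
    return 0;
}

/* Release (increment) one semaphore in the set. */
int release_sem(int semid, unsigned short semnum)
{
    struct sembuf op;

    op.sem_num = semnum;
    op.sem_op  = 1;
    op.sem_flg = 0;
    return semop(semid, &op, 1);
}
```

process A would call reserve_retry() on semaphore X, and process B
would call release_sem() on X when it is done.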
modified scenario that Doesn't Work Fine:
if process A is inside its signal handler and issues semop() calls
to reserve and then release semaphore Y (a different
semaphore than the one it was waiting on when it got
interrupted), there is a problem. after process A
returns from its signal handler, notes the EINTR status
from the interrupted semop() call, and reissues the
semop() call, this time it gets the semaphore
immediately, even though nobody has released it yet.
The semaphore values indicated by semctl(GETVAL, etc.)
at various points in the code don't indicate an obvious
problem.
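
The kind of semctl() check meant here can be sketched as below.
This is a hedged illustration rather than the test program's code:
the struct and function names are hypothetical, and besides GETVAL
it also queries GETNCNT, GETZCNT and GETPID, on the assumption that
the waiter counts might be worth watching too.

```c
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Snapshot of one semaphore's state (hypothetical type). */
struct sem_snapshot {
    int value;      /* GETVAL:  current semaphore value */
    int waiting;    /* GETNCNT: processes waiting for an increase */
    int zero_wait;  /* GETZCNT: processes waiting for zero */
    int last_pid;   /* GETPID:  last process to operate on it */
};

/* Fill *out from semctl(); returns 0 on success, -1 if any query fails. */
int sem_snapshot_take(int semid, int semnum, struct sem_snapshot *out)
{
    out->value     = semctl(semid, semnum, GETVAL);
    out->waiting   = semctl(semid, semnum, GETNCNT);
    out->zero_wait = semctl(semid, semnum, GETZCNT);
    out->last_pid  = semctl(semid, semnum, GETPID);

    if (out->value == -1 || out->waiting == -1 ||
        out->zero_wait == -1 || out->last_pid == -1)
        return -1;
    return 0;
}

/* Print a labelled snapshot to stderr, e.g. before and after each semop(). */
void sem_snapshot_print(const char *label, const struct sem_snapshot *s)
{
    fprintf(stderr, "%s: val=%d ncnt=%d zcnt=%d pid=%d\n",
            label, s->value, s->waiting, s->zero_wait, s->last_pid);
}
```

Taking a snapshot of both X and Y just before the handler's semop()
calls, just after them, and just before the wait is reissued would
show whether any of these fields change unexpectedly.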
sigsetjmp/siglongjmp variation - doesn't work either:
POSIX does not list semop() among the functions that are safe
to call from within a signal handler, so doing so may be
unwise (although it's not clear to me if the restriction
matters if the semop() calls only involve _some other
semaphore_ than the one we were waiting on). in any case, a
modified version of the test program was created using
sigsetjmp/siglongjmp calls so that the additional semop()
calls triggered by the signal are not performed inside the
signal handler, but rather are performed after the signal
handler returns to the mainline code (via siglongjmp),
before re-issuing the interrupted semop() call.
this does NOT change the behavior - process A still gets
the semaphore immediately after re-issuing the
interrupted semop() call.
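
The structure of this variation can be sketched roughly as below.
It is a sketch under assumptions, not the test program itself: all
names are hypothetical, SIGUSR1 stands in for whatever signal is
used, a forked child plays process B, and sleep() is a crude way to
order the events. It also has a small window between semop()
returning and can_jump being cleared in which a signal would cause
X to be reserved twice; that is left alone for brevity.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <setjmp.h>
#include <signal.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/wait.h>

enum { SEM_X = 0, SEM_Y = 1 };   /* hypothetical semaphore numbers */
union semun_arg { int val; struct semid_ds *buf; unsigned short *array; };

static sigjmp_buf jump_env;
static volatile sig_atomic_t can_jump = 0;      /* set only while waiting on X */
static volatile sig_atomic_t deferred_runs = 0; /* times the Y work has run */

/* The handler issues no semop() calls; it only jumps to the mainline. */
static void on_signal(int signo)
{
    (void)signo;
    if (can_jump) {
        can_jump = 0;
        siglongjmp(jump_env, 1);
    }
}

/* Add delta to one semaphore, re-issuing semop() on EINTR. */
static int sem_change(int semid, unsigned short semnum, short delta)
{
    struct sembuf op;

    op.sem_num = semnum;
    op.sem_op  = delta;
    op.sem_flg = 0;
    while (semop(semid, &op, 1) == -1) {
        if (errno != EINTR)
            return -1;
    }
    return 0;
}

/* Wait on X. If a signal interrupts the wait, reserve and release Y
 * in the mainline, then re-issue the wait on X. */
static int wait_on_x(int semid)
{
    int rc;

    if (sigsetjmp(jump_env, 1) != 0) {
        /* arrived here from the handler via siglongjmp */
        if (sem_change(semid, SEM_Y, -1) == -1 ||
            sem_change(semid, SEM_Y, +1) == -1)
            return -1;
        deferred_runs++;
    }
    can_jump = 1;
    rc = sem_change(semid, SEM_X, -1);   /* (re-)issue the wait on X */
    can_jump = 0;
    return rc;
}

/* Run the scenario once. Returns how many times the Y work ran before
 * X was obtained, or -1 on error. */
int demo_scenario(void)
{
    union semun_arg arg;
    struct sigaction sa;
    pid_t child;
    int rc;
    int semid = semget(IPC_PRIVATE, 2, IPC_CREAT | 0600);

    if (semid == -1)
        return -1;
    arg.val = 0; semctl(semid, SEM_X, SETVAL, arg);  /* X starts reserved */
    arg.val = 1; semctl(semid, SEM_Y, SETVAL, arg);  /* Y starts free */

    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGUSR1, &sa, NULL);

    child = fork();
    if (child == -1) {
        semctl(semid, 0, IPC_RMID);
        return -1;
    }
    if (child == 0) {                    /* plays the part of process B */
        sleep(1);
        kill(getppid(), SIGUSR1);        /* interrupt A's wait on X */
        sleep(1);
        sem_change(semid, SEM_X, +1);    /* only now release X */
        _exit(0);
    }
    rc = wait_on_x(semid);               /* process A */
    waitpid(child, NULL, 0);
    semctl(semid, 0, IPC_RMID);
    return rc == 0 ? (int)deferred_runs : -1;
}
```

Timing how long wait_on_x() takes to return after the signal
(roughly one more second if it really waits for the release) is one
way to tell whether the semaphore was obtained early.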
--
Ken Buck