We are seeing something rather strange when running our STREAMS multiplexor
drivers on a Solaris 2.3 dual CPU SPARC 20. What is happening is that
occasionally, I believe after an I_UNLINK, the queue associated w/ a
running lower write service routine will revert back to the stream head
part way thru execution of the service routine. Needless to say this has
devastating consequences. What seems to be happening is that there is a
lack of synchronization wrt the running of the lower service routines and
the return of a "borrowed" queue pair. We have never seen this occur on
single CPU SPARC 10s or on a dual CPU SPARC 20 w/ one CPU disabled.
We do not use perimeters (should we?), but instead do mutex locking on a
driver basis. I.e., each driver has a single mutex which is acquired first
thing in each put and service routine. The mutex is released prior to
putnext() and family, and qreply(), and it is reacquired afterwards if
necessary. Just prior to return from the service or put routine, the mutex
is released. You can't get much coarser granularity than that.
Is this a Solaris bug? If so, is there a work-around? If not, what are we
doing incorrectly? Is there a lock of some sort that we should grab prior
to performing processing in the service routine?
Any help gratefully accepted.
--
Jim Robinson
{ubc-cs!van-bc,uunet}!mdivax1!robinson