Greetings to all!
((Due to a configuration error in our news-server, my original
posting did not make it to the outside world. Please excuse
the inconvenience if this is a re-posting for any of you.))
In our software, the client communicates with the server via
a SHM segment created by the server; the client just does a
'shmat' with address 0 (= let the system determine).
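For reference, the client-side attach amounts to something like the
following minimal sketch. This is not our production code; the helper
name 'attach_segment' is made up for illustration, and the segment ID
is assumed to have been obtained from the server beforehand.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Hypothetical helper: attach to an existing segment and let the
 * system choose the address (address 0, no flags).
 * Returns the mapped address, or NULL on failure with errno preserved. */
static void *attach_segment(int shmid)
{
    void *addr = shmat(shmid, NULL, 0);

    if (addr == (void *) -1) {
        int saved = errno;   /* fprintf may clobber errno */
        fprintf(stderr, "shmat(id=%d) failed: errno=%d (%s)\n",
                shmid, saved, strerror(saved));
        errno = saved;
        return NULL;
    }
    return addr;
}
```

Saving errno before logging matters here, because the value is the
only clue the caller gets about why the attach failed.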
At a customer site, we have now seen some 'shmat' failures
with errno = EINVAL. They are intermittent and may occur at
peak load times, but I cannot check the machine load from here.
Checking our code (which has had no problems in that area for a long time)
and the reference manual for 'shmat', we see no reason for that.
The manual mentions just 4 possible reasons for EINVAL:
1) The SHM_RDONLY and SHM_COPY flags are both set.
2) The SharedMemoryID parameter is not a valid shared memory identifier.
3) The SharedMemoryAddress parameter is not equal to 0, and ...
points outside the address space of the process.
4) The SharedMemoryAddress parameter is not equal to 0,
the SHM_RND flag is not set in the SharedMemoryFlag parameter,
and ... points to a location outside of ...
In this case, the code and the error message it writes agree
that none of these apply:
1) The flag parameter is 0 - none set.
2) The ID is valid, a call to 'shmctl ( id, IPC_STAT, ...)'
(immediately after the error) returns information as expected.
The segment was created at server start (hours or days ago).
3+4) The address passed is 0.
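The check in 2) can be sketched roughly as below. This is only an
illustration under assumptions: the function name 'describe_segment'
and the output format are invented here, and only 'struct shmid_ds'
fields that I expect on any System V style platform are used.

```c
#include <stdio.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Hypothetical diagnostic: format the IPC_STAT data of a segment
 * into buf. Returns 0 on success, -1 if shmctl fails (errno set). */
static int describe_segment(int shmid, char *buf, size_t len)
{
    struct shmid_ds ds;

    if (shmctl(shmid, IPC_STAT, &ds) == -1)
        return -1;

    /* Casts keep the format portable across differing typedefs. */
    snprintf(buf, len,
             "id=%d size=%lu nattch=%lu mode=%o cpid=%ld lpid=%ld",
             shmid,
             (unsigned long) ds.shm_segsz,
             (unsigned long) ds.shm_nattch,
             (unsigned) ds.shm_perm.mode,
             (long) ds.shm_cpid,
             (long) ds.shm_lpid);
    return 0;
}
```

Logging the attach count and the mode bits right after a failed
'shmat' shows whether the segment still looks normal at that moment.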
The site runs AIX 4.3.2.0 on a 4-way SMP machine.
Questions:
a) Are there any known problems with 'shmat' in general,
or with AIX 4.3.2.0, or on SMP machines?
b) Is there any specific PTF or other SW upgrade they should
install?
c) Is there any other known reason for 'shmat' to return EINVAL
than the four I quoted from the manual?
It now appears that the segment in question has the "delete"
flag set (but still exists because it is still attached by
some processes) - is this a reason for the errno value "EINVAL"?
d) If the "delete" is the reason - are there any known problems
or effects which would cause AIX or some component to
accidentally delete a SHM segment, or to apply the "delete"
to a segment other than the intended one?
Two cases we analyzed both had it happen to ID 16 - any clue?
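To test for the "delete" state from within a program, something like
the following sketch might serve. It rests on an assumption: SHM_DEST
is the name Linux uses for this mode bit, and whether the AIX
<sys/shm.h> defines the same name is something I have not verified,
hence the #ifdef. The function name is made up for illustration.

```c
#include <sys/ipc.h>
#include <sys/shm.h>

/* Hypothetical check: returns 1 if the segment is marked for removal,
 * 0 if it is not, and -1 if IPC_STAT fails or the platform headers
 * do not define SHM_DEST (assumed name, known from Linux). */
static int segment_marked_for_removal(int shmid)
{
#ifdef SHM_DEST
    struct shmid_ds ds;

    if (shmctl(shmid, IPC_STAT, &ds) == -1)
        return -1;
    return (ds.shm_perm.mode & SHM_DEST) ? 1 : 0;
#else
    (void) shmid;   /* no portable way to tell on this platform */
    return -1;
#endif
}
```

Whether a new 'shmat' on such a segment succeeds or fails differs
between systems, so the result of this check is only a hint and not
proof of the cause.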
Thank you for any hints.
Regards, Joerg Bruehe
--
Joerg Bruehe, SQL Datenbanksysteme GmbH, Berlin, Germany
(speaking only for himself)