Anyone see this 10.01 problem?

Post by Doug Siebe » Mon, 01 Jan 1996 04:00:00



Yesterday I managed to find another interesting bug in HP-UX 10.01 (+ all
patches up to about two weeks ago), and I was wondering whether anyone else
has seen it.  (I am going to call it in on Tuesday; I'm just curious.)

I had an application that uses mmapped files for its work.  Several large files
(a 250M, a 64M, a couple of 1M or so) are mapped in by every process and used
by all.  Each process also mapped in the one 4K file (out of a total of 50,000
of these) that it used.  Even though those 50,000 files are broken up into
250-odd directories, so no one directory is really huge, this is a performance
hit, since the other processes frequently need to map in files other than
their own for a short time.  So I had made a change which put all those 50,000
files into one big file, with an index at the front, and was going to use that
to access them instead.  Tested the code, everything works great.  Ran the code,
everything works great -- for about 15 minutes.
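
For readers trying to picture the packed layout, here is a minimal sketch in
portable POSIX C.  It is not the poster's code: the post only says the big file
has "an index at the front", so the index entry format, the rounding of the
index to a 4K boundary, and the names index_entry, index_bytes, slot_offset and
map_packed are all assumptions made for illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define SLOT_SIZE  4096L   /* one former 4K file per slot */
#define SLOT_COUNT 50000L  /* number of packed records mentioned in the post */

/* Hypothetical index entry; the real index format is not described. */
struct index_entry {
    long id;      /* record identifier (assumed) */
    long offset;  /* byte offset of the record's slot within the big file */
};

/* Size of the index, rounded up to a whole number of slots so that
 * every record slot that follows it starts on a 4K boundary. */
long index_bytes(long count)
{
    long raw = count * (long)sizeof(struct index_entry);
    return (raw + SLOT_SIZE - 1) / SLOT_SIZE * SLOT_SIZE;
}

/* Byte offset of record `slot` (0-based) in a file holding `count` records. */
long slot_offset(long count, long slot)
{
    assert(slot >= 0 && slot < count);
    return index_bytes(count) + slot * SLOT_SIZE;
}

/* Map the whole packed file shared and writable, as every process would.
 * Returns MAP_FAILED if the file cannot be opened or mapped. */
void *map_packed(const char *path, long count)
{
    size_t len = (size_t)(index_bytes(count) + count * SLOT_SIZE);
    void *base;
    int fd = open(path, O_RDWR);

    if (fd < 0)
        return MAP_FAILED;
    base = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    return base;
}
```

With this layout a process would reach its own record at
(char *)base + slot_offset(SLOT_COUNT, n), and could look at any other record
the same way without a separate open() and mmap() per 4K file.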

I had about 200-odd processes accessing this file, modifying their little
sections of the file, looking up and occasionally modifying other sections of
the file.  At some point, every process would lock up trying to make any
reference to the memory range that file lived in.  Even kill -9 would not
touch it.  Trying to cat the file from the shell met with the same result.
That file was now poison.  Everything else worked fine on the system; nothing
else was affected.  I rebooted, and everything worked OK again for a few
minutes, then I met with the same result.  I deliberately panicked the system
to generate a core dump for HP.  (I didn't want to drive down to hit the TOC
switch -- HP really ought to have a supported way to panic the system to make
this sort of thing easier, though.  I used proc/W 0 from adb, if you are
curious how to do this.)

I don't really know what caused this.  The only thing I can think of that I do
with this file that I don't do with the other relatively large files I have
mapped in is that I modify pages of it all over the place, rather than in a
relatively sequential order.  Plus each process is randomly doing an msync()
on its little 4K range every five minutes -- my guess is that the problem
lies there; perhaps some internal table that keeps track of which pages have
been modified and which have been synced is corrupted?  That's the only thing
I can come up with now, at least.  I might play around the next day or two if
I have time (and am not too hung over from New Year's Eve tonight :-) ) and
see if I can find a way to create this problem deliberately on another system.
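
As a starting point for anyone trying the same thing, the per-slot flush
described above might look like the following sketch in portable POSIX C.  It
is not the poster's code and is not known to trigger the hang; the helper names
sync_range and demo_dirty_and_sync, the scratch file path, and the eight-slot
file size are all made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define SLOT_SIZE 4096  /* size of one record, as in the post */

/* Flush [addr, addr + len) of a shared mapping to its file.  msync() wants a
 * page-aligned start address, so the start is rounded down to a page boundary
 * in case the system page size is larger than one slot.
 * Returns 0 on success or -1 (errno set), exactly as msync() does. */
int sync_range(void *addr, size_t len)
{
    uintptr_t page, start, end;

    assert(len > 0);
    page  = (uintptr_t)sysconf(_SC_PAGESIZE);
    start = (uintptr_t)addr & ~(page - 1);
    end   = (uintptr_t)addr + len;
    return msync((void *)start, (size_t)(end - start), MS_SYNC);
}

/* Scratch-file demo: map eight slots, dirty one in the middle, flush only
 * that slot.  Returns 0 on success and -1 on any failure. */
int demo_dirty_and_sync(void)
{
    char path[] = "/tmp/msync_sketch_XXXXXX";  /* hypothetical scratch file */
    size_t len = 8 * SLOT_SIZE;
    void *p;
    char *base;
    int fd, rc;

    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);  /* file disappears once it is unmapped and closed */
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }
    p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    if (p == MAP_FAILED)
        return -1;
    base = p;

    memset(base + 3 * SLOT_SIZE, 'x', SLOT_SIZE);      /* modify one slot */
    rc = sync_range(base + 3 * SLOT_SIZE, SLOT_SIZE);  /* flush just it */
    munmap(p, len);
    return rc;
}
```

A real reproduction attempt would presumably run a couple of hundred processes
against one large shared file, each dirtying pages at scattered offsets and
calling something like sync_range() on its own slot every few minutes.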

At any rate, I'm sure HP will be able to figure something out from the core
dump I saved.  But I want to know if anyone else has seen this same situation
where any access to a file hangs the accessing process in a way that even kill
-9 cannot fix.  The file is on a JFS filesystem (and I've already found one
other bug in 10.01 resulting from the combination of mmapped files and a JFS
filesystem), so that might be necessary to create this condition.  If you've
seen this, please let me know via email, in particular what scenario you see
this problem in, and if you have reported this to HP or not, what the ref #
or SR # is if you have, etc.  Thanks!

--
Doug Siebert              || "Usenet is essentially Letters to the Editor
University of Iowa        ||  without the editor.  Editors don't appreciate

(c) 1995 Doug Siebert.  Redistribution via the Microsoft Network is prohibited.

 
 
 

1. Has anyone else seen this pthread problem ?

On HP-UX 10.20:

The following simple test program works under 'bash' and 'ksh'.

Under 'tcsh', after the last cout, it kept printing the shell prompt
in a loop, and in 'csh' it logged me out of the shell.
(The problem seems to occur while exiting from main.)

Has anyone else seen this before?

Thanks
--Sachin

#include <iostream.h>
#include <stdio.h>
#include <unistd.h>
#include <pthread.h>

// Thread entry point: sleeps, then prints its thread id and a call counter.
int start(void *) {
   static int n=0;
   sleep(4);
   printf("OK! %x #%d\n",pthread_self(),++n);
   return 0;
}

void thrTest();

int main() {

   thrTest();
   return 0;
}

// Creates one thread, waits for it to finish, then prints via cout.
void thrTest()
{
   pthread_t x;
   pthread_attr_t s;
   pthread_addr_t ret;

   printf("Hello! %x\n",pthread_self());
   if (pthread_create(&x, pthread_attr_default,
                      (pthread_startroutine_t)start, 0))
        perror("pthread_create");
   else {
        // puts("Running");
        // pthread_cancel(x);
        pthread_join(x, (&ret));
        printf("Done!\n");
   }

   cout << " returning " << endl;
   // exit(1);
}

2. SQL Bind Parameter - Not used for all parameters.

3. Anyone Ever Seen This?? ioscan attached.

4. Help With Oracle 7.1.5 on Alpha VMS

5. Anyone seen this with new NFS (ACE2)?

6. Access Strategy Language: How it uses constructors

7. Anyone seen the FAQ?

8. X25; Has anyone ever seen this work?

9. Anyone have poppassd working on 10.01?

10. hylafax on hpux 10.01 - anyone compiled ?

11. HP-UX 10.20: Problem Seeing DLT4500 Autochanger

12. Have you seen these really interesting problems