Memory testing program: see any bugs in here?

Post by Ben Pfaf » Mon, 03 Mar 1997 04:00:00



        I've been trying to track down some flaky behavior on my
system.  In the process, I've written a memory testing program, the
source code of which is below.  It is invoked as `memtest <X>' where
<X> is the number of megabytes to malloc().  I ran this overnight and
it reported a failure on iteration 80.

        My question is: Does anyone see any bugs in this program?  I'm
only interested in bugs that would fail good chips.

        Thanks.

----------------------------------------------------------------------
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define ab() do { printf("failed\n"); exit(1); } while(0)

int
main(int argc, char *argv[])
{
  int x;
  int size;
  unsigned char *p;
  int cycle = 0;

  size = argc == 2 ? atoi(argv[1]) * 1024 * 1024 : 0;
  if(size == 0)
    {
      printf("Number of megabytes expected on command line.\n");
      exit(1);
    }
  p = malloc(size);
  if(p==NULL)
    {
      printf("Could not malloc %d bytes.\n", size);
      exit(1);
    }
  srand(time(0));
  while(1)
    {
      printf("cycle %d...", cycle); fflush(stdout);
      for(x=0; x<size/2; x++)
        p[x] = p[x+size/2] = x + cycle;
      printf("inited..."); fflush(stdout);
      for(x=0; x<size/2; x++)
        if(p[x] != p[x+size/2])
          ab();
      printf("sequenced..."); fflush(stdout);
      for(x=0; x<size*2; x++)
        {
          int y=rand()%(size/2);
          if(p[y] != p[y+size/2])
            ab();
          p[y] = p[y+size/2] += x+y;
        }
      printf("random...passed\n");

      cycle++;
    }
  abort();

}

----------------------------------------------------------------------

Sample output:
cycle 0...inited...sequenced...random...passed
cycle 1...inited...sequenced...random...passed
cycle 2...inited...sequenced...random...passed
cycle 3...inited...sequenced...random...passed
 [...]
cycle 80...inited...sequenced...failed
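
A side note on the size computation in the listing above: `atoi(argv[1]) * 1024 * 1024` is evaluated in `int`, so on a system with 32-bit `int` it overflows for 2048 megabytes or more. This is a robustness point rather than an explanation for the failure reported above. A minimal sketch of a safer conversion follows; `parse_megabytes` is a hypothetical helper, not part of the original program, and it assumes a C99 compiler for `SIZE_MAX`.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper (not in the original program): convert a megabyte
 * count given as decimal text into a byte count.  Returns 0 if the text
 * is not a positive decimal number or if the byte count would not fit
 * in size_t. */
size_t
parse_megabytes(const char *text)
{
  const size_t mb = (size_t) 1024 * 1024;
  unsigned long megs;
  char *end;

  assert(text != NULL);          /* caller checks argc before passing argv[1] */
  if (!isdigit((unsigned char) text[0]))
    return 0;                    /* rejects "", signs and leading spaces */
  errno = 0;
  megs = strtoul(text, &end, 10);
  if (errno != 0 || *end != '\0' || megs == 0)
    return 0;                    /* out of range, trailing junk, or zero */
  if (megs > SIZE_MAX / mb)
    return 0;                    /* megs * mb would overflow size_t */
  return (size_t) megs * mb;
}
```

With this, `size` and the loop counters would be declared as `size_t`, and a return value of 0 would trigger the existing "Number of megabytes expected" message.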

--

PGP public key and home page at http://www.msu.edu/user/pfaffben

 
 
 


Post by Peter Knoppe » Tue, 04 Mar 1997 04:00:00



>    I've been trying to track down some flaky behavior on my
>system.  In the process, I've written a memory testing program, the
>source code of which is below.  It is invoked as `memtest <X>' where
><X> is the number of megabytes to malloc().  I ran this overnight and
>it reported a failure on iteration 80.
>    My question is: Does anyone see any bugs in this program?  I'm
>only interested in bugs that would fail good chips.
>    Thanks.

Simple-minded memtest program deleted for brevity.

I don't see how your program can fail with good chips, unless
there is a problem in your swap-space, in the swapper (in the
kernel), or the transfer of data between memory and disk. Most
faults in these categories would be painfully obvious...

There are however many many ways in which your program would
not detect a bad chip. As your program runs in user mode, it
does not test any memory cells that are in use by the kernel
or by other programs. Also, the memory that you test may be
swapped out to disk at unpredictable times and then be swapped
back to another physical location when your program accesses
the same logical address.
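
The swapping issue described above can be reduced, though not eliminated, by asking the kernel to keep the test buffer resident. The following is a minimal sketch assuming a POSIX system that provides `mlock()` in `<sys/mman.h>`; `alloc_pinned` and `free_pinned` are hypothetical helper names. Locking may fail without sufficient privileges or because of a locked-memory resource limit, and it still says nothing about which physical pages are used or about memory held by the kernel and other programs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>

/* Hypothetical helper: allocate `size` bytes and try to pin them in RAM.
 * Returns NULL if the allocation fails.  *locked is set to 1 if mlock()
 * succeeded and to 0 if the buffer may still be paged out.
 * Assumption: mlock() accepts an address that is not page-aligned, as it
 * does on Linux; POSIX allows implementations to require alignment. */
unsigned char *
alloc_pinned(size_t size, int *locked)
{
  unsigned char *p;

  assert(locked != NULL);
  *locked = 0;
  p = malloc(size);
  if (p == NULL)
    return NULL;
  if (mlock(p, size) == 0)
    *locked = 1;
  return p;
}

/* Undo alloc_pinned(): unlock the buffer if it was locked, then free it. */
void
free_pinned(unsigned char *p, size_t size, int locked)
{
  if (p != NULL && locked)
    munlock(p, size);
  free(p);
}
```

A test program would call `alloc_pinned()` in place of `malloc()` and print a warning when `locked` comes back 0, since results from an unlocked buffer may involve the swap path as well as the RAM.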

Memory testing is a bit of an art. You can read something
about it on my web-page at
        http://cardit.et.tudelft.nl/~knop/ramtest.doc

Many linux users have experienced/discovered memory problems
that resulted in sig-11 errors while doing kernel compiles.
Evidently, gcc is a good memory exerciser. Kernel compiles
won't identify _which_ memory module contains the fault. See
Rogier Wolff's web page at
        http://www.bitwizard.nl/sig11/
--


 
 
 


Post by Ben Pfaf » Tue, 04 Mar 1997 04:00:00




[...]
> I don't see how your program can fail with good chips, unless
> there is a problem in your swap-space, in the swapper (in the
> kernel), or the transfer of data between memory and disk. Most
> faults in these categories would be painfully obvious...

Cool.  That's what I wanted to know.  Thanks.

> There are however many many ways in which your program would
> not detect a bad chip. As your program runs in user mode, it
> does not test any memory cells that are in use by the kernel
> or by other programs. Also, the memory that you test may be
> swapped out to disk at unpredictable times and then be swapped
> back to another physical location when your program accesses
> the same logical address.

> Memory testing is a bit of an art. You can read something
> about it on my web-page at
>    http://cardit.et.tudelft.nl/~knop/ramtest.doc

Yup, I know, but I was pressed for time so I rolled my own instead of
finding a more professional program.

> Many linux users have experienced/discovered memory problems
> that resulted in sig-11 errors while doing kernel compiles.
> Evidently, gcc is a good memory exerciser. Kernel compiles
> won't identify _which_ memory module contains the fault. See
> Rogier Wolff's web page at
>    http://www.bitwizard.nl/sig11/

I have.  In fact, I originally had the problems when packaging a
program for Debian GNU/Linux.  I'd like to suggest that the Debian
package tools are almost as good a memory exerciser (for large
packages) as kernel builds are.  :-)
--

PGP public key and home page at http://www.msu.edu/user/pfaffben
 
 
 


Post by James M Anderso » Thu, 06 Mar 1997 04:00:00


My officemate is also having some memory problems with his Linux
machine.  He has a Gateway2000 G6-200 (a PentiumPro) which was
delivered in mid to late February.  It has 128MB of RAM and a 128MB
swap partition.  It started failing when doing large make jobs.  We
have run  mkswap -c  several times on the swap partition with little
success.  

My officemate wrote up a little C program to test the memory
(included below).  If the resulting program is run immediately after
booting up, the program runs correctly for all memory allocations
under ~100MB.  After the program starts using the swap area, he gets
lots of errors and after a couple of minutes the whole system will
freeze up.  He's writing out 0xFFFFFFFF to an integer, and every
second or fourth integer coming back will return as 0xFFFF7FFF.
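
To see exactly which bit is affected, the value written can be XORed with the value read back. For the values above, 0xFFFFFFFF ^ 0xFFFF7FFF = 0x00008000, so a single bit, bit 15, reads back as 0. A minimal sketch follows, assuming a C99 compiler for `<stdint.h>`; both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bits that differ between the value written and the value read back. */
uint32_t
flipped_bits(uint32_t written, uint32_t read_back)
{
  return written ^ read_back;
}

/* Index (0 = least significant) of the lowest set bit in mask.
 * The mask must be nonzero. */
int
lowest_set_bit(uint32_t mask)
{
  int bit = 0;

  assert(mask != 0);
  while ((mask & 1u) == 0)
    {
      mask >>= 1;
      bit++;
    }
  return bit;
}
```

Printing the flipped-bit mask together with the index of each bad word makes it easier to judge whether the same data bit fails every time.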

He claimed that this did not happen in Windows95, so I compiled his
program with my Watcom C compiler.  Unfortunately, I do not have the
Windows compiler installed, so I compiled it as a 32 bit DOS4GW
program.  In a Windows95 DOS box, this program will run correctly unless
told to use over ~70MB, at which point it too fails.  (Note that this is
the same time as DOS4GW starts to swap out to disk.)  After this
program fails, it is a very short time before a total halt.

But Windows95 itself does not show this behaviour if the DOS4GW
program is not run.  He has manually set the Windows95 virtual memory
to some huge amount (it takes a LOT of Netscape sessions to use over
128MB) and no programs ever had a problem.  However, I suspect that
there is something wrong with the Windows95 virtual memory manager, as
the system used 12% of the memory at boot up whether we asked for 0MB
of virtual memory or 300.

Tomorrow we will open up the case to inspect the motherboard.

Does anyone have any ideas on what might be wrong with this system?

#include <stdio.h>
#include <stdlib.h>
#define K (1024*1024L/4)

#define VAL 0xffffffff

int main( int argc, char *argv[] ) {
  int *p;
  long i;
  long megs;
  long s;

  if (argc == 2)
    megs = (long) atoi( argv[1] );
  else
    megs = 10L;

  s = megs*K;

  printf( "Allocating %ld megs\n", megs );

  p = (int *) malloc( s*sizeof(int) );
  if (p == (int *) NULL) {
    printf( "malloc failed\n" );
    exit(1);
  }

  printf( "Using val = %x\n", VAL );
  for (i=0; i < s; ++i) {
    p[i] = VAL;
  }          
  for (i=0; i < s; ++i) {
    if (p[i] != VAL) {
      printf( "Wrong at i=%ld\n", i );
      printf( " val = %x\n", p[i] );
    }        
  }          
  printf( "Done\n" );
  return(0);

}
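
The program above writes a single pattern, 0xFFFFFFFF, so it can only catch bits that read back as 0. A minimal sketch of the same fill-and-verify loop run over several complementary patterns is given below. The choice of patterns and the function names are assumptions for illustration, not something taken from the program above, and the sketch assumes a C99 compiler for `<stdint.h>`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fill buf with pattern, read it back, and return the number of words
 * that differ.  The volatile qualifier keeps the compiler from
 * optimizing the read-back comparison away. */
size_t
count_mismatches(volatile uint32_t *buf, size_t words, uint32_t pattern)
{
  size_t i, bad = 0;

  assert(buf != NULL);
  for (i = 0; i < words; i++)
    buf[i] = pattern;
  for (i = 0; i < words; i++)
    if (buf[i] != pattern)
      bad++;
  return bad;
}

/* Run all-ones, all-zeros and two alternating-bit patterns over the
 * buffer; return the total number of mismatching words. */
size_t
run_patterns(volatile uint32_t *buf, size_t words)
{
  static const uint32_t patterns[] = {
    0xFFFFFFFFu, 0x00000000u, 0xAAAAAAAAu, 0x55555555u
  };
  size_t k, bad = 0;

  for (k = 0; k < sizeof patterns / sizeof patterns[0]; k++)
    bad += count_mismatches(buf, words, patterns[k]);
  return bad;
}
```

In the program above, `run_patterns(p, s)` could replace the two `for` loops if `p` were declared as `uint32_t *`. A nonzero result means at least one word failed under at least one pattern.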

--
//****************************************************************************
James M Anderson                          Computer Specialist, Telescope

Telephone: (520) 556-7381                 2255 N Gemini Dr.
                                          Flagstaff   AZ   86001