page faults, kernel BUG, no swap usage and other weirdness

Post by Christopher Alber » Sun, 06 Jan 2002 05:24:16



Greetings,

I have several RH 7.1 machines with identical configs:
Tyan Thunder, 2x Athlon 1.2 GHz, 1 GB RAM, 2 GB swap, and a single
9 GB SCSI disk each. All were running 2.4.2-2smp.

One of these machines, and only one, has behaved strangely
for about 6 weeks. About once a week
it freezes and I have to reset it, since the console is unresponsive.
I can ping it, but that's it. The logs showed
kernel BUG messages related to swapping and paging. I upgraded to
2.4.9-12smp, which gave a slightly different pattern of crashes
and kernel BUG messages, one of which is appended at the end.

In addition, this machine never seems to use any swap, even when
I run a bunch of apps, including memory hogs like Netscape. I tried
swapoff, mkswap -c, and swapon -p 32000, and still saw no swap activity.
All the other machines show some small swap usage.
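
For comparing swap usage between the odd machine and the others, something like the following might help. It is only a rough sketch: it assumes the usual /proc/swaps layout (a header row, then Filename, Type, Size, Used, Priority columns, with sizes in KiB), and the function names are made up for illustration.

```python
def parse_swaps(text):
    """Parse /proc/swaps-style text into a list of dicts (sizes in KiB)."""
    entries = []
    for line in text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed rows
        name, kind, size, used, priority = fields[:5]
        entries.append({
            "name": name,
            "type": kind,
            "size_kib": int(size),
            "used_kib": int(used),
            "priority": int(priority),
        })
    return entries


def swap_summary(text):
    """Return (total_kib, used_kib) summed over all swap areas."""
    entries = parse_swaps(text)
    total = sum(e["size_kib"] for e in entries)
    used = sum(e["used_kib"] for e in entries)
    return total, used


def read_proc_swaps(path="/proc/swaps"):
    """Read the live swap table; only meaningful on a Linux host."""
    with open(path) as f:
        return swap_summary(f.read())
```

Calling read_proc_swaps() on each machine and comparing the used figure would show whether this box really stays at zero while the others do not.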

Sometimes the crashes happen when users are running VMware, other times
during the night when syslog restarts.

I assume it is a hardware problem.
I can get my supplier to fix this machine, but I wanted to have
something more informative to say than "it's broken".

Any ideas?

TIA

Christopher

Jan  3 15:58:18 data kernel:  <4>probable hardware bug: clock timer configuration lost - probably a VIA686a motherboard.
Jan  3 15:58:18 data kernel: probable hardware bug: restoring chip configuration.
Jan  3 15:58:20 data kernel: ------------[ cut here ]------------
Jan  3 15:58:20 data kernel: kernel BUG at page_alloc.c:85!
Jan  3 15:58:20 data kernel: invalid operand: 0000
Jan  3 15:58:20 data kernel: CPU:    0
Jan  3 15:58:20 data kernel: EIP:    0010:[__free_pages_ok+43/928]    Not tainted
Jan  3 15:58:20 data kernel: EIP:    0010:[<c013624b>]    Not tainted
Jan  3 15:58:20 data kernel: EFLAGS: 00013282
Jan  3 15:58:20 data kernel: eax: 0000001f   ebx: c208de28   ecx: c02fd544   edx: 00011fd6
Jan  3 15:58:20 data kernel: esi: ed53f3c0   edi: 00000004   ebp: c208de28   esp: ed5b9e60
Jan  3 15:58:20 data kernel: ds: 0018   es: 0018   ss: 0018
Jan  3 15:58:20 data kernel: Process X (pid: 6743, stackpage=ed5b9000)
Jan  3 15:58:20 data kernel: Stack: c023999e 00000055 fe2cb000 00078b00 00000000 c20e8fa4 c01413a2 00000000
Jan  3 15:58:20 data kernel:        c208de28 c208de28 00000004 0000004b c0137425 c01307ec f7678de0 c0361020
Jan  3 15:58:20 data kernel:        f781a8a0 c208de28 c036115c c0129a47 c208de28 000000dc 00152000 00000040
Jan  3 15:58:20 data kernel: Call Trace: [copyrite+26078/27691] copyrite [kernel] 0x65de
Jan  3 15:58:20 data kernel: Call Trace: [<c023999e>] copyrite [kernel] 0x65de
Jan  3 15:58:20 data kernel: [generic_commit_write+146/160] generic_commit_write [kernel] 0x92
Jan  3 15:58:20 data kernel: [<c01413a2>] generic_commit_write [kernel] 0x92
Jan  3 15:58:20 data kernel: [free_page_and_swap_cache+197/208] free_page_and_swap_cache [kernel] 0xc5
Jan  3 15:58:20 data kernel: [<c0137425>] free_page_and_swap_cache [kernel] 0xc5
Jan  3 15:58:20 data kernel: [generic_file_write+1052/1568] generic_file_write [kernel] 0x41c
Jan  3 15:58:20 data kernel: [<c01307ec>] generic_file_write [kernel] 0x41c
Jan  3 15:58:20 data kernel: [zap_page_range+1143/1232] zap_page_range [kernel] 0x477
Jan  3 15:58:20 data kernel: [<c0129a47>] zap_page_range [kernel] 0x477
Jan  3 15:58:20 data kernel: [dput+28/384] dput [kernel] 0x1c
Jan  3 15:58:20 data kernel: [<c0150c8c>] dput [kernel] 0x1c
Jan  3 15:58:20 data kernel: [exit_mmap+201/304] exit_mmap [kernel] 0xc9
Jan  3 15:58:20 data kernel: [<c012c579>] exit_mmap [kernel] 0xc9
Jan  3 15:58:20 data kernel: [mmput+91/128] mmput [kernel] 0x5b
Jan  3 15:58:20 data kernel: [<c0119bfb>] mmput [kernel] 0x5b
Jan  3 15:58:20 data kernel: [do_exit+230/624] do_exit [kernel] 0xe6
Jan  3 15:58:20 data kernel: [<c011e446>] do_exit [kernel] 0xe6
Jan  3 15:58:20 data kernel: [filp_close+158/176] filp_close [kernel] 0x9e
Jan  3 15:58:20 data kernel: [<c013d3be>] filp_close [kernel] 0x9e
Jan  3 15:58:20 data kernel: [system_call+51/56] system_call [kernel] 0x33
Jan  3 15:58:20 data kernel: [<c010719b>] system_call [kernel] 0x33
Jan  3 15:58:20 data kernel:
Jan  3 15:58:20 data kernel:
Jan  3 15:58:20 data kernel: Code: 0f 0b 59 5b 8b 55 08 85 d2 74 10 6a 57 68 9e 99 23 c0 e8 ce
Jan  3 15:58:28 data kernel:  ------------[ cut here ]------------
Jan  3 15:58:28 data kernel: kernel BUG at page_alloc.c:85!
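
To give the supplier something more concrete than one pasted oops, the syslog could be tallied by BUG location. The sketch below is just one way to do that: it assumes the "kernel BUG at file:line!" wording seen in the log above, and the function name is invented for this example.

```python
import re
from collections import Counter

# Matches lines such as "kernel BUG at page_alloc.c:85!"
BUG_RE = re.compile(r"kernel BUG at ([^\s:]+):(\d+)!")


def count_kernel_bugs(lines):
    """Count kernel BUG reports per (source file, line number)."""
    counts = Counter()
    for line in lines:
        match = BUG_RE.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts
```

Feeding it the lines of the syslog file (for example via open(...) on whatever path the machine logs to) gives a count per source location, which makes it easy to see whether the same check keeps firing across crashes.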

 
 
 

Post by Tony Lawrenc » Sun, 06 Jan 2002 05:29:02



> Greetings,

> I have several RH 7.1 machines with identical configs:
> Tyan Thunder, 2x Athlon 1.2 GHz, 1 GB RAM, 2 GB swap, and a single
> 9 GB SCSI disk each. All were running 2.4.2-2smp.

> One of these machines, and only one, has behaved strangely
> for about 6 weeks. About once a week
> it freezes and I have to reset it, since the console is unresponsive.
> I can ping it, but that's it. The logs showed
> kernel BUG messages related to swapping and paging. I upgraded to
> 2.4.9-12smp, which gave a slightly different pattern of crashes
> and kernel BUG messages, one of which is appended at the end.

> In addition, this machine never seems to use any swap, even when
> I run a bunch of apps, including memory hogs like Netscape. I tried
> swapoff, mkswap -c, and swapon -p 32000, and still saw no swap activity.
> All the other machines show some small swap usage.

> Sometimes the crashes happen when users are running VMware, other times
> during the night when syslog restarts.

> I assume it is a hardware problem.
> I can get my supplier to fix this machine, but I wanted to have
> something more informative to say than "it's broken".

> Any ideas?

How about "it's broken real bad?"

These kinds of things are tough.  Unless your vendor is Linux-savvy,
they are just going to insist that it's your OS and there's nothing they
can do about it, except install Windows and see if it crashes in a
similar way.  Probably not what you had in mind.

--
Tony Lawrence
SCO/Linux Support Tips, How-To's, Tests and more: http://pcunix.com