Keyboard/console lockups, Magic SysRQ fails [2.4.20(-pre3)]

Post by fl.. » Tue, 04 Feb 2003 19:50:23



I started testing Linux 2.4.x on this hardware around last May, and I
have had this problem ever since.

The problem is that the keyboard goes unresponsive and the console stops
updating every now and then; it seems to happen randomly. The kernel
keeps doing other work: for example, from another machine I can still
play oggs that this box shares via Samba. Hitting the Caps Lock key
seems to fix this most of the time, sometimes on the first try, other
times only after a dozen or more presses. Sometimes I'm not patient
enough to keep hitting Caps Lock that long, so I figure it does not help
every time. The only thing I can imagine is related: I have _once_ seen
something like "Missing ACK from keyboard several times, noisy keyboard
cable?" in the logs. I don't know if this is relevant. I have tried this
with one PS/2 keyboard and with an MS keyboard that has both PS/2 and
USB connectors. No matter which keyboard/connector I use, the hangs
still happen. IIRC, around 2.4.19 this problem might have eased a
little.

Now I have ditched the other OS from this machine, switched it entirely
to Linux, and purchased a 1Gb NIC. Unless I'm going blind, there is no
support for this card in 2.4.19, so I have to use 2.4.20. And it seems
that the lockups happen more often when the kernel is NOT 2.4.19. I
haven't really tried 2.4.19 since, because I need to keep the NATed
Windows machine on that 1Gb NIC connected to the net, and my wife happy
that way...

I've tried 2.4.20 and 2.4.20-pre3, and both have this keyboard/console
hang. Plus, it seems that I've got a new problem: now the kernel
sometimes locks up completely. The Magic SysRq keys don't always
respond, and other services (NAT, Samba) stop working. After this kind
of lockup the (ext3) filesystem needed a check. (Here I probably
screwed up by doing 'mount -n -o remount,rw /' before running fsck.)
After one run, fsck suggests running it again, and when I do, IIRC, it
replays the journal. After this, fsck finds a truckload of errors, and
fixing them results in massive filesystem corruption (missing /sbin and
stuff like that).
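e2fsck is meant to run on an unmounted filesystem, or at most on one mounted read-only (the usual situation for / early in boot). Remounting / read-write first lets the kernel keep writing to the filesystem while fsck is changing it underneath, which could plausibly explain the corruption described above. A minimal sketch of a pre-flight check, assuming Linux's /proc/mounts format; `mount_state` is a hypothetical helper, not part of any existing tool:

```python
# Sketch: report whether a device or mount point is currently mounted,
# and whether read-write, by reading /proc/mounts. Assumes Linux and
# paths without spaces (/proc/mounts escapes those as \040, not handled here).


def mount_state(target, mounts_text):
    """Return "rw", "ro", or None if target is not mounted."""
    state = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        device, mountpoint, options = fields[0], fields[1], fields[3].split(",")
        if target in (device, mountpoint):
            # Later lines win: a mount stacked on the same point (or the
            # real root after "rootfs") appears after the one it hides.
            state = "rw" if "rw" in options else "ro"
    return state


if __name__ == "__main__":
    import sys

    target = sys.argv[1] if len(sys.argv) > 1 else "/"
    with open("/proc/mounts") as f:
        state = mount_state(target, f.read())
    if state == "rw":
        print(f"{target} is mounted read-write; do not fsck it now")
    elif state == "ro":
        print(f"{target} is mounted read-only")
    else:
        print(f"{target} is not mounted")
```

Checking the options column rather than just the presence of an entry matters here, because / is always mounted; the question is only whether it is writable at the moment fsck runs.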

OK, I realize now that running fsck on a mounted filesystem was probably
not wise, but I think I have gotten away with it in the past without any
problems... After 2 or 3 massive filesystem failures and reinstalls, I
went with ext2 and kept trying to find the problem (and stopped running
fsck on a mounted filesystem). I followed the "KISS" rule and stripped
everything I don't absolutely need out of the kernel. The same hangs
continue, now without filesystem corruption (fsck at boot fixes things
successfully). One time I could ssh into the hung box: X (4.2) seemed to
be stuck, and I was not able to stop it. 'shutdown -r now' also failed.
So I replaced X with 3.3.6, which should be stable... On the very first
boot, a total lockup, and I was not able to ssh into it. Then I removed
X altogether and ran 'chmod 644 <service>' in /etc/init.d/ on the
services I know I don't need while hunting for the problem. Still, these
lockups happen every now and then, mostly of the Caps Lock-curable kind.
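The 'chmod 644' trick disables a service by removing its execute bits. Whether the boot scripts then skip it (or merely fail to run it) depends on the distribution's rc logic, which is an assumption here. A minimal sketch of the same toggle that only touches the execute bits, so the change can be reversed exactly; `set_script_enabled` is a hypothetical helper:

```python
# Sketch: toggle the execute bits on an init script, the effect the post
# gets with 'chmod 644 <service>'. Whether the boot scripts then skip the
# service depends on the distribution's rc logic (an assumption here).
import os
import stat

EXEC_BITS = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH


def set_script_enabled(path, enabled):
    """Add or clear the user/group/other execute bits; return the new mode."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    new_mode = (mode | EXEC_BITS) if enabled else (mode & ~EXEC_BITS)
    os.chmod(path, new_mode)
    return new_mode
```

For a script that starts out as 0755 this gives 0644, the same as the literal chmod, and re-enabling restores 0755.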

I have a parallel null modem cable; I could use it to gather more
detailed information about the lockups. If you can give me any
instructions on how to provide more useful information, I would be happy
to follow them.
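Since the Magic SysRq keys are the main way to get state out of a hung box, it may help to rule out the simple cases first: the keys only work if the kernel was built with CONFIG_MAGIC_SYSRQ and /proc/sys/kernel/sysrq is non-zero. Even then, a hard lockup with interrupts disabled can keep SysRq from being serviced at all. A minimal sketch of the check; `sysrq_setting` is a hypothetical helper, and on later kernels values other than 0 and 1 are a bitmask of allowed functions:

```python
# Sketch: read the Magic SysRq sysctl. Assumes Linux; the file only exists
# when the kernel was built with CONFIG_MAGIC_SYSRQ.


def sysrq_setting(path="/proc/sys/kernel/sysrq"):
    """Return the integer sysctl value, or None if the file is missing."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except FileNotFoundError:
        return None


if __name__ == "__main__":
    value = sysrq_setting()
    if value is None:
        print("no /proc/sys/kernel/sysrq: CONFIG_MAGIC_SYSRQ is probably off")
    elif value == 0:
        print("SysRq is compiled in but disabled (echo 1 > /proc/sys/kernel/sysrq)")
    else:
        print(f"SysRq is enabled (value {value})")
```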

Here is information on my system:

Motherboard: MSI K7D Master-L with built-in LAN (e100 module)

http://www.msi.com.tw/program/products/server/svr/pro_svr_detail.php?...

512MB DDR memory with ECC, ECC correction enabled in BIOS
2 * XP1600+ CPUs
Intel PRO/1000 MT 1Gbps NIC with 82540 controller (e1000 module)
S3 PCI display adapter
Adaptec ATA RAID 2400A, RISC-powered RAID 5 with 4 * 60GB IBM 60GXP hard disks

Components currently unused during fault finding:
LG CED-8080B CDRW
ALPS DC544C 4x4 CD-changer

Flexy

P.S. Using the automounter with the changer has already hung the system
three times (ide-scsi). I disabled the automounter and did not use the
changer, but reading a bad audio CD in the CDRW drive caused another
lockup.

P.P.S. This same hardware (except for the new Intel NIC, and a PCI S3
display adapter in place of the AGP one; this S3 has been running in a
different Linux machine with uptimes over 90 days, so I believe the card
is OK) has been running fine and stable under Windows for almost a year
now. So unless it has something to do with the Intel NIC, I don't
believe I have faulty hardware.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in

More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

1. NFS/UDP/IP performance - 2.4.19 v/s 2.4.20, 2.4.20-pre3

Greetings.

There seems to be a remarkable performance difference
between 2.4.19 and 2.4.20/2.4.21-pre3 for NFS
writes/reads. I am not sure, but the problem may not be
in NFS but somewhere lower (UDP/IP or core).

For example, with my kernel and network configuration, a
5MB write to a new file over NFS takes about 2.5 seconds
on 2.4.19. With everything else the same (including the
kernel configuration), the same write takes 11 seconds
or more on 2.4.20 and 2.4.21-pre3.
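In throughput terms, 5 MB in 2.5 s is about 2 MB/s, while 5 MB in 11 s is about 0.45 MB/s, roughly 4.4 times slower. A minimal sketch for repeating the measurement, assuming the caller passes a path on the NFS mount (the path, the 8 KB block size, and the helper names `timed_write` and `throughput_mb_s` are all hypothetical). The fsync makes the timing include pushing the data to the server rather than only into the client's page cache:

```python
# Sketch: time writing `total_bytes` of zeros to a new file and flushing it.
# The path is up to the caller (e.g. a file on the NFS mount); block_size is
# an arbitrary choice, not a recommended NFS wsize.
import os
import time


def timed_write(path, total_bytes=5 * 1024 * 1024, block_size=8192):
    """Write zeros to `path`, fsync, close, and return the elapsed seconds."""
    block = b"\0" * block_size
    written = 0
    start = time.monotonic()
    with open(path, "wb") as f:
        while written < total_bytes:
            n = min(block_size, total_bytes - written)
            f.write(block[:n])
            written += n
        f.flush()
        os.fsync(f.fileno())  # push the data out before stopping the clock
    return time.monotonic() - start


def throughput_mb_s(megabytes, seconds):
    return megabytes / seconds
```

For the figures above, `throughput_mb_s(5, 2.5)` gives 2.0 and `throughput_mb_s(5, 11)` about 0.45.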

Also, while this file write is in progress, system time
goes up to 15% on 2.4.19, whereas on 2.4.20/21-pre3 it
is about 4% (measured with sar/sysstat).
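To cross-check the sar figures independently, the system-time share can be computed from two samples of /proc/stat taken during the copy. A minimal sketch; `read_cpu_counters` and `system_percent` are hypothetical helpers, and it assumes the aggregate "cpu" line starts with user, nice, system, idle (the 2.4 layout; newer kernels append more columns):

```python
# Sketch: compute the system-time percentage between two /proc/stat samples,
# roughly what sar reports as %system. Sums every column present; on kernels
# that report guest time (already counted in user) the total is slightly high.
import time


def read_cpu_counters(stat_text):
    """Return the jiffy counters from the aggregate 'cpu' line."""
    for line in stat_text.splitlines():
        if line.startswith("cpu "):
            return [int(x) for x in line.split()[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def system_percent(before, after):
    """Share of elapsed jiffies spent in system mode (column 2), in percent."""
    deltas = [a - b for a, b in zip(after, before)]
    total = sum(deltas)
    return 100.0 * deltas[2] / total if total else 0.0


if __name__ == "__main__":
    with open("/proc/stat") as f:
        first = read_cpu_counters(f.read())
    time.sleep(1)
    with open("/proc/stat") as f:
        second = read_cpu_counters(f.read())
    print(f"system: {system_percent(first, second):.1f}%")
```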

Memory accesses don't seem to be the issue either: a test
program that checks this shows the same times on both
kernels, and they are what I expect for this board.

"netstat -s", ifconfig and tcpdump traces don't seem to
point to dropped messages, collisions, retransmissions,
etc.
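`netstat -s` builds its IP and UDP summaries from /proc/net/snmp, and NFS over UDP with write sizes larger than the MTU depends on IP fragmentation, so counters such as Ip ReasmFails and Udp InErrors may be worth diffing across a single slow copy rather than reading in absolute terms. A minimal sketch, assuming the Linux layout of a header line followed by a values line per protocol; `parse_snmp` and `counter_deltas` are hypothetical helpers, and counter names vary between kernel versions:

```python
# Sketch: parse /proc/net/snmp (a header line followed by a values line per
# protocol) and diff two snapshots taken before and after a slow copy.


def parse_snmp(text):
    """Return {protocol: {counter: value}} from /proc/net/snmp text."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    tables = {}
    for header, values in zip(rows[0::2], rows[1::2]):
        proto = header[0].rstrip(":")
        tables[proto] = dict(zip(header[1:], (int(v) for v in values[1:])))
    return tables


def counter_deltas(before, after, proto):
    """Per-counter change for one protocol between two snapshots."""
    return {name: value - before.get(proto, {}).get(name, 0)
            for name, value in after.get(proto, {}).items()}
```

Reading /proc/net/snmp once before and once after the 5MB copy and printing `counter_deltas(before, after, "Ip")` shows whether any reassembly or fragmentation counters moved during the slow write.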

The hardware is PowerPC-based, and there are no changes
in the board-specific IO subsystem between 2.4.19 and
2.4.20/21-pre3. Both kernels are built with the same
compiler, and I have also tried GCC 3.2, with the same
results.

So I don't suspect this is a board- or compiler-related
issue.

I also see some differences in the handling of bottom
halves in net/core/dev.c between 2.4.19 and 2.4.20/21-pre3,
although I have not gone through them in enough detail to
say that this is indeed the problem area.

Questions:

  - Has anyone seen this, perhaps on other platforms (x86, etc.)?
    Is there some tunable that was added (or changed) after
    2.4.19 and needs to be adjusted?

  - I have tried enabling kernel profiling to find potential
    problem areas in the code, but given the low CPU
    utilization during these copies, I am not sure it can
    give any useful information.

    Could anyone offer any ideas on how to debug this?

I would appreciate it if you could copy me on any responses
to this post, as I don't subscribe to this list.

Best regards,
-Arun.


2. crypto++ binaries for SunOS

3. IBM x440 problems on 2.4.20 to 2.4.20-rc1-ac3

4. Call for papers - 2nd Conference on File and Storage Technologies (FAST '03)

5. 2.4.20 + XFS patches + rmap15a + Ingo's 2.4.20-rc3 O(1) sched

6. CVS and WinCVS

7. SCSI under 2.4.20-8 but not 2.4.20-18.9 (RH9)

8. Incremental Backup problem

9. sbp2.o fails to load on RH rpms - 2.4.20-2.9, 2.4.20-2.10 and 2.4.20-2.11

10. problem compiling 2.4.20-pre3 - removal of /proc/partitions stat the cause ?

11. 2.4.20-pre3 and promise raid controller

12. ~2.4.20-pre3 -> 2.4.21 : nfs client read performance "broken"

13. 2.4.20-pre3 hangs on boot on Duron/VIA