Probable Memory/VM issue.

Post by Petr » Thu, 07 Mar 2002 14:50:07



I've had a persistent problem (since December) with MySQL crashing
under various 2.4 kernels, ranging from 2.4.13 to 2.4.18.

I've posted about this before, and the consensus (both from this list
and from the support team at MySQL) was that this was a VM issue.

We pushed the app back on to the Sun e4500 while we dealt with several
other issues, and are now getting back around to trying to solve this
one.

We are running the MySQL.com-compiled mysql-3.23.43-pc-linux-gnu-i686 on
a VA Linux 2230 with 2 GB of RAM and 2 CPUs:
# CONFIG_NOHIGHMEM is not set
CONFIG_HIGHMEM4G=y
# CONFIG_HIGHMEM64G is not set
CONFIG_HIGHMEM=y
# CONFIG_MATH_EMULATION is not set
CONFIG_MTRR=y
CONFIG_SMP=y
# CONFIG_MULTIQUAD is not set
CONFIG_HAVE_DEC_LOCK=y

Currently the kernel is 2.4.18 with the LVM and VFS-lock patch.

I compiled in most of the kernel-debug stuff:
CONFIG_DEBUG_KERNEL=y
CONFIG_DEBUG_HIGHMEM=y
CONFIG_DEBUG_SLAB=y
# CONFIG_DEBUG_IOVIRT is not set
CONFIG_MAGIC_SYSRQ=y
# CONFIG_DEBUG_SPINLOCK is not set
CONFIG_DEBUG_BUGVERBOSE=y

and we threw this machine into production (it's the only way we have of
generating this problem).

It ran fine for 96+ hours, then mysql died. No oops, nothing to the
console.

The stack trace from mysql gave me:
Bogus stack limit or frame pointer, fp=0xbfabf8c0, stack_bottom=0xbfc7fcb8, thread_stack=65536, aborting backtrace.
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort...
thd->query at (nil)  is invalid pointer
thd->thread_id=20479119
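
For readers following the numbers in the trace, here is a small arithmetic sketch of why the backtrace code may have given up. It assumes (this is not confirmed from the MySQL source) that the sanity check requires the frame pointer to lie within thread_stack bytes below stack_bottom. The function name is made up for illustration; only the three values come from the trace.

```python
# Values copied from the mysqld trace above.
FP = 0xbfabf8c0
STACK_BOTTOM = 0xbfc7fcb8
THREAD_STACK = 65536


def frame_pointer_plausible(fp, stack_bottom, thread_stack):
    """Assumed check (hypothetical): fp must sit inside the thread's stack,
    i.e. no more than thread_stack bytes below stack_bottom."""
    return stack_bottom - thread_stack <= fp <= stack_bottom


# Distance between the reported frame pointer and the stack bottom.
distance = STACK_BOTTOM - FP
print(f"fp is {distance} bytes below stack_bottom")
print("plausible:", frame_pointer_plausible(FP, STACK_BOTTOM, THREAD_STACK))
```

Under that assumption the frame pointer is about 1.75 MiB away from stack_bottom, far outside a 64 KiB thread stack, which would be consistent with a corrupted frame pointer or a wrong stack limit, but does not say which.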

Ok. So there's a problem.

Last time I presented this, I was, IIRC, told to try the rmap patches,
but that was almost 6 months ago in internet time, and there's been
some movement forward from both AA and Rik since. I've been trying to
follow along, but (a) this stuff is pretty much over my head, and (b)
there's just too much traffic here.

The CTO here (who is effectively my boss, if not actually) wants this
solved in one, or at least two more iterations of this (Meaning I'm
allowed one more crash of this kind under certain conditions).

Personally, I'd prefer to solve this without applying any patches to the
kernel. I know that's not going to happen. Second best would be to apply
a patch that is going to be part of the mainstream "stable" kernel in
the near future.

Furthermore, is there anything I can do from this side to get the
kernel to publish more information when this happens?

No, let me rephrase that.

What can I do to get better debugging information from the kernel?

Some other notes on this issue:

When we last visited this, the machine was very heavily loaded. We did
some table optimizations and cut the load to about 1/3 to 1/2 of what it
was, mostly by minimizing the number of tables opened and scanned.
During the 96+ hours that this ran under Linux, the load rarely got
above .75 and only hit 1.8/2 once, as opposed to generally running at a
load of 1.2+ and hitting 4 regularly. Yes, I realize this doesn't mean
much, but I could go on for days with the metrics we've accumulated on
this. I can provide access to these metrics to a very few individuals,
as they are all in RRDs or graphs.

Any more information I'm missing?

--
Share and Enjoy.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in

More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Post by Alan Co » Fri, 08 Mar 2002 07:50:07


> Bogus stack limit or frame pointer, fp=0xbfabf8c0, stack_bottom=0xbfc7fcb8, thread_stack=65536, aborting backtrace.
> Trying to get some variables.
> Some pointers may be invalid and cause the dump to abort...
> thd->query at (nil)  is invalid pointer
> thd->thread_id=20479119

Which says nothing, alas: nothing about user or kernel space. If the system
had run out of memory and killed it, you'd have seen "killed" and an OOM
entry logged.

Alan
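
One way to act on the point above is to search the kernel log for an OOM-killer entry around the time of the crash. The following is a minimal sketch, not an official tool. It assumes the kernel writes a line containing "Out of Memory" when it kills a process (the exact wording varies between kernel versions), and the function names are made up for illustration.

```python
import re

# Assumption: the OOM killer logs a line containing "out of memory"
# (matched case-insensitively because the wording differs across kernels).
OOM_PATTERN = re.compile(r"out of memory", re.IGNORECASE)


def find_oom_lines(log_text):
    """Return every line of log_text that looks like an OOM-killer entry."""
    return [line for line in log_text.splitlines() if OOM_PATTERN.search(line)]


def scan_log_file(path):
    """Read a syslog-style file and return its OOM-looking lines."""
    with open(path, errors="replace") as handle:
        return find_oom_lines(handle.read())
```

Calling scan_log_file on wherever syslog stores kernel messages (often /var/log/messages, but that depends on the distribution) and getting an empty list back would support the conclusion that the process was not killed for lack of memory.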
Post by Petr » Fri, 08 Mar 2002 11:30:13



> > Bogus stack limit or frame pointer, fp=0xbfabf8c0, stack_bottom=0xbfc7fcb8, thread_stack=65536, aborting backtrace.
> > Trying to get some variables.
> > Some pointers may be invalid and cause the dump to abort...
> > thd->query at (nil)  is invalid pointer
> > thd->thread_id=20479119
> Which says nothing alas - nothing about user or kernel space. If the system
> had run out of memory and killed it you'd have seen "killed" and an OOM
> entry logged

    It definitely did not run out of memory. We monitor that pretty
    closely, and the memory usage was pretty constant for the 90+ hours
    prior to the crash.

--
Share and Enjoy.

1. Rik Van Riel Patch - rmap-12h - Memory Issues - VM

Original message (07/11/2002)

I installed patch rmap-12h on a vanilla 2.4.18 kernel, and we will see how
things run in about 4 days. That's how long it takes for the cache to fill
up (4 GB of memory) and for available memory to reach almost nil. This will
be a perfect box to test out your patch. We have 4+ million images that we
rsync every day. Well, we don't rsync that many a day, but rsync loads the
file list into memory. So hopefully kswapd and rsync won't wreak havoc on
the CPU load when rsync tries to suck my memory dry.

With the rmap-12h patch to 2.4.18 (07/22/2002)

Rik,

After 11 days of uptime your patch is working GREAT. The cache has filled
up again, and top reports only 70 MB of free memory, but I assume that this
is just normal behavior. When rsync runs, memory is freed from buffers,
giving rsync the memory it needs to run seamlessly. I also notice that the
kswapd time top reports is far lower than on the 2.4.18 vanilla or 2.4.8
vanilla kernels (two other boxes with the same configuration). Why is this?
If you have time, maybe you could explain to me in layman's terms what your
patch actually does. Your patch actually saved us from having to buy newer
and faster equipment, which probably wouldn't have helped.

Thanks Again
Rick Parada
Systems Administrator
BizRate.com
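
On the question of whether a small "free" figure with a full cache is normal: a rough way to look at it is to count buffers and page cache as memory the kernel can usually hand back under pressure. The sketch below parses /proc/meminfo-style text and adds those fields to MemFree. It is an approximation under that assumption (not all cached pages are reclaimable), and the function names are made up for illustration.

```python
def parse_meminfo(text):
    """Parse 'Key:   12345 kB' lines into a dict of integers (kB).
    Lines in any other shape (such as table-style header rows) are skipped."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and len(fields) == 2 and fields[1] == "kB" and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def reclaimable_estimate_kb(info):
    """Assumption: free + buffers + cached approximates what is available."""
    return info["MemFree"] + info.get("Buffers", 0) + info.get("Cached", 0)


def read_estimate(path="/proc/meminfo"):
    """Convenience wrapper for a live Linux system."""
    with open(path) as handle:
        return reclaimable_estimate_kb(parse_meminfo(handle.read()))
```

If the estimate stays large while MemFree is small, the box is most likely just using otherwise idle RAM as cache.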


2. Builing ati drivers

3. LINUX VM (2.4.14) vs FreeBSD VM in low memory machines

4. Telnet question

5. copy_one_pmd-probable memory corruption.

6. Linux and the OS/2 Boot Manager

7. "increased VM size+Main-memory" better than "Main-memory+Hard-disk" ??

8. route add question

9. copy_to_page:probable memory coruption

10. ISSUE: vm bug? in 2.4.10

11. vm issues on sap app server

12. What version of the kernel fixes these VM issues?

13. 2.4.>=13 VM/kswapd/shmem/Oracle issue (was Re: Google's mm problems)