rapidscaleclusters.com

Resolved: kernel panic when creating point-in-time MySQL database snapshot

At some point in the CentOS / RHEL kernel-2.6.18-164.x kernel tree, a bug was introduced that caused an LVM snapshot of a root volume to panic the kernel (see the LVM discussion list post here). As a result, anyone who snapshotted a database that was not on a separate logical volume immediately hung their server (while trying to back it up in the name of high availability - oh, the irony). Actually, it works once, then hangs the server every time after that.

However, this bug has been examined and resolved as of the kernel-2.6.18-178 tree. If this issue affects you, there are a few things you can do while waiting for the newer kernel version to appear in your favorite yum repository. The latest test kernel RPMs can usually be downloaded here, courtesy of John Linville of Red Hat:

http://people.redhat.com/linville/kernels/rhel5/

All of these should include the patch, plus anything else that has recently been fixed or added.
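To see whether a given machine falls in the range described above, you can compare the build number in its kernel release string against 164 and 178. The following is only a rough sketch: the function name `kernel_status` is made up for illustration, and treating every build from 164 up to (but not including) 178 as affected is an assumption based on the version numbers mentioned in this post, not an authoritative list of broken kernels.

```shell
#!/bin/sh
# Hypothetical helper (not part of any package): classify a kernel release
# string as "affected", "fixed", or "unknown" using the build numbers
# mentioned in this post (bug appears in 164.x, fix lands in 178).
kernel_status() {
    release="$1"
    case "$release" in
        2.6.18-*) ;;                       # only the RHEL / CentOS 5 tree is covered
        *) echo "unknown"; return 0 ;;
    esac
    # Strip the "2.6.18-" prefix, then keep only the leading digits (the build number).
    build="${release#2.6.18-}"
    build="${build%%[!0-9]*}"
    if [ -z "$build" ]; then
        echo "unknown"
    elif [ "$build" -ge 178 ]; then
        echo "fixed"
    elif [ "$build" -ge 164 ]; then
        echo "affected"                    # assumption: whole 164..177 range is suspect
    else
        echo "unknown"                     # older than the range this post describes
    fi
}

# Report on the kernel that is currently running.
kernel_status "$(uname -r)"
```

If this prints "affected" and your database lives on the root volume, it is probably safest to hold off on snapshots until one of the newer kernels is installed.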

Alternatively, we provide a working kernel and the corresponding kernel-devel package on our site as well:

http://www.linuxwebcluster.com/download/kernel-2.6.18-182.el5.i686.rpm

http://www.linuxwebcluster.com/download/kernel-devel-2.6.18-182.el5.i686.rpm

Just download and install on RHEL / CentOS 5.x with rpm. After installation, a quick reboot should load the new kernel, and your snapshots will work again just fine.
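The download-and-install step might look roughly like the sketch below. The `DRY_RUN` switch and the `run` and `install_patched_kernel` helpers are inventions of this example; by default the script only prints the commands so you can review them first. It assumes `wget` is available and that you run it as root when you actually install.

```shell
#!/bin/sh
# Sketch: fetch and install the patched kernel packages linked above.
# DRY_RUN is an assumption of this example: it defaults to 1, so the
# commands are printed rather than executed. Set DRY_RUN=0 to run them.
DRY_RUN="${DRY_RUN:-1}"
BASE_URL="http://www.linuxwebcluster.com/download"
PACKAGES="kernel-2.6.18-182.el5.i686.rpm kernel-devel-2.6.18-182.el5.i686.rpm"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

install_patched_kernel() {
    for pkg in $PACKAGES; do
        run wget "$BASE_URL/$pkg"
    done
    # Install (-i) rather than upgrade (-U) so the old kernel stays
    # available as a fallback boot entry.
    run rpm -ivh $PACKAGES
}

install_patched_kernel
```

After the reboot, `uname -r` should report the 2.6.18-182.el5 release if the new kernel was picked up as the default boot entry.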

Here's the original kernel oops and LVM-discuss post so Google searchers can hopefully find this information:

BUG: scheduling while atomic: java/0x00000001/2959
[<c061637f>] <3>BUG: scheduling while atomic: java/0x00000001/2867
[<c061637f>] schedule+0x43/0xa55
[<c042c40d>] lock_timer_base+0x15/0x2f
[<c042c46b>] try_to_del_timer_sync+0x44/0x4a
[<c0437dd2>] futex_wake+0x3c/0xa5
[<c0434d5f>] prepare_to_wait+0x24/0x46
[<c0461ea7>] do_wp_page+0x1b3/0x5bb
[<c0438b01>] do_futex+0x239/0xb5e
[<c0434c13>] autoremove_wake_function+0x0/0x2d
[<c0463876>] __handle_mm_fault+0x9a9/0xa15
[<c041e727>] default_wake_function+0x0/0xc
[<c046548d>] unmap_region+0xe1/0xf0
[<c061954f>] do_page_fault+0x233/0x4e1
[<c061931c>] do_page_fault+0x0/0x4e1
[<c0405a89>] error_code+0x39/0x40
=======================
schedule+0x43/0xa55
[<c042c40d>] <0>------------[ cut here ]------------
kernel BUG at arch/i386/mm/highmem.c:43!
invalid opcode: 0000 [#1]
SMP
last sysfs file: /devices/pci0000:00/0000:00:00.0/irq
Modules linked in: autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc
ip6t_REJECTdCPU: 3 ip6table_filter ip6_tables x_tables ipv6 xfrm_nalgo cry
EIP: 0060:[<c041cb08>] Not tainted VLI
EFLAGS: 00010206 (2.6.18-164.2.1.el5 #1)
EIP is at kmap_atomic+0x5c/0x7f
eax: c0012d6c ebx: fff5b000 ecx: c1fb8760 edx: 00000180
esi: f7be8580 edi: f7fa7000 ebp: 00000004 esp: f5c54f0c
ds: 007b es: 007b ss: 0068
Process mpath_wait (pid: 3273, ti=f5c54000 task=f5c50000 task.ti=f5c54000)ne
Stack: c073a4e0 c0462f7f f7b0eb30 f7b40780 f5c54f3c 0029c3f0 f63b5ef0 f7be8580
f7b40780 f7fa7000 00008802 c0472d75 f7b0eb30 f7c299c0 00001000 00001000
00001000 00000101 00000001 00000000 00000000 f5c5007b 0000007b ffffffff
Call Trace:
[<c0462f7f>] __handle_mm_fault+0xb2/0xa15
[<c0472d75>] do_filp_open+0x2b/0x31
[<c061954f>] do_page_fault+0x233/0x4e1
[<c061931c>] do_page_fault+0x0/0x4e1
[<c0405a89>] error_code+0x39/0x40
=======================
Code: 00 89 e0 25 00 f0 ff ff 6b 50 10 1b 8d 14 13 bb 00 f0 ff ff 8d 42 44 c1 e
EIP: [<c041cb08>] kmap_atomic+0x5c/0x7f SS:ESP 0068:f5c54f0c
<0>Kernel panic - not syncing: Fatal exception

0c 29 c3 a1 54 12 79 c0 c1 e2 02 29 d0 83 38 00 74 08 <0f> 0b 2b

 


Copyright 2010    RapidScale Clusters, LLC