server load, litespeed, scsi, nameserver fail: all messed up

Rawkon asked:

First of all: I have only very basic knowledge of Linux servers.

Here are our dedicated server's details:

Intel(R) Xeon(R) CPU X3450 @ 2.67GHz
Total processors: 8

OS: CentOS 32-bit release 5.8 (Final)

MySQL version: 5.1.66

PHP version: 5

                     total       used       free     shared    buffers     cached
Mem:              16413196   16253176     160020          0      79892   12549592
-/+ buffers/cache:            3623692   12789504
Swap:             16386292          0   16386292

Filesystem            Size  Used Avail
/dev/sda1             211G   72G  129G

For the last 2 days we have had a problem with LiteSpeed. Customers call us in the morning to tell us that our website is down. When I check the process list, LiteSpeed is running like crazy: its CPU usage goes to 30-40% and PHP processes can't run, which means downtime for our website. So I force a reboot of the server; it comes back after 1-2 minutes, and PHP and LiteSpeed work fine.

After digging deeper in WHM, I noticed the following things…

First, I always get this e-mail after I reboot the server:

FAILED: nameserver on ns1.xxx.com (77.223.xxx.xx)
nameserver failed @ Fri Dec 28 11:45:59 2012. A restart was attempted automagically.

RECOVERED: nameserver on ns1.xxx.com (77.223.xxx.xx)
nameserver check was successful after restart.

Second, the I/O statistics don't look right to me:

Sec     Blocks    Read/sec    Blocks    Written/Sec
sda     140.51    16327.73    1071.14    55574210   

Third, I get this error in the daily logwatch report:

 WARNING:  Kernel Errors Present
    scsi 10:0:0:0: scsi: Device offlined - not ready after error recovery ...:  1 Time(s)
    scsi 11:0:0:0: scsi: Device offlined - not ready after error recovery ...:  1 Time(s)
    scsi 12:0:0:0: scsi: Device offlined - not ready after error recovery ...:  1 Time(s)
    scsi 13:0:0:0: scsi: Device offlined - not ready after error recovery ...:  1 Time(s)
    scsi 14:0:0:0: scsi: Device offlined - not ready after error recovery ...:  1 Time(s)
    scsi 15:0:0:0: scsi: Device offlined - not ready after error recovery ...:  1 Time(s)
    scsi 16:0:0:0: scsi: Device offlined - not ready after error recovery ...:  1 Time(s)
    scsi 17:0:0:0: scsi: Device offlined - not ready after error recovery ...:  1 Time(s)
    scsi 18:0:0:0: scsi: Device offlined - not ready after error recovery ...:  1 Time(s)
    scsi 19:0:0:0: scsi: Device offlined - not ready after error recovery ...:  1 Time(s)

(and it goes on like this)

Finally, the current top output:

top - 12:46:40 up  1:02,  1 user,  load average: 1.26, 1.53, 1.69
Tasks: 165 total,   1 running, 164 sleeping,   0 stopped,   0 zombie
Cpu(s):  1.7%us,  0.2%sy,  0.0%ni, 93.6%id,  3.8%wa,  0.0%hi,  0.5%si,  0.0%st
Mem:  16413196k total, 16297256k used,   115940k free,    24196k buffers
Swap: 16386292k total,      132k used, 16386160k free, 12577056k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3280 mysql     15   0 10.0g 3.2g 4612 S 13.3 20.2  50:02.38 mysqld
 7318 ouruser   15   0  144m  12m 7088 S  1.0  0.1   0:00.03 lsphp5
 5137 nobody     0 -19 27628 8000  488 S  0.3  0.0   0:07.53 litespeed
 7306 ouruser   15   0  145m  13m 7120 S  0.3  0.1   0:00.29 lsphp5
    1 root      15   0 10368  700  584 S  0.0  0.0   0:01.08 init
    2 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 migration/0
    3 root      34  19     0    0    0 S  0.0  0.0   0:00.00 ksoftirqd/0
    4 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 watchdog/0
    5 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 migration/1
    6 root      34  19     0    0    0 S  0.0  0.0   0:00.00 ksoftirqd/1
    7 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 watchdog/1
    8 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 migration/2
    9 root      34  19     0    0    0 S  0.0  0.0   0:00.00 ksoftirqd/2
   10 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 watchdog/2
   11 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 migration/3
   12 root      34  19     0    0    0 S  0.0  0.0   0:00.00 ksoftirqd/3
   13 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 watchdog/3
   14 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 migration/4
   15 root      34  19     0    0    0 S  0.0  0.0   0:00.00 ksoftirqd/4
   16 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 watchdog/4
   17 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 migration/5
   18 root      34  19     0    0    0 S  0.0  0.0   0:00.00 ksoftirqd/5
   19 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 watchdog/5
   20 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 migration/6
   21 root      34  19     0    0    0 S  0.0  0.0   0:00.00 ksoftirqd/6
   22 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 watchdog/6
   23 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 migration/7
   24 root      34  19     0    0    0 S  0.0  0.0   0:00.00 ksoftirqd/7
   25 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 watchdog/7
   26 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 events/0
   27 root      10  -5     0    0    0 S  0.0  0.0   0:00.02 events/1
   28 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 events/2
   29 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 events/3
   30 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 events/4
   31 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 events/5
   32 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 events/6
   33 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 events/7
   34 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 khelper
  179 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 kthread
  190 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 kblockd/0
  191 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 kblockd/1
  192 root      10  -5     0    0    0 S  0.0  0.0   0:00.01 kblockd/2
  193 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 kblockd/3
  194 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 kblockd/4
  195 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 kblockd/5
  196 root      10  -5     0    0    0 S  0.0  0.0   0:00.02 kblockd/6
  197 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 kblockd/7
  198 root      10  -5     0    0    0 S  0.0  0.0   0:00.96 kacpid

And the “dmesg” output:

input: PC Speaker as /class/input/input0
EDAC MC: Ver: 2.0.1 Apr 12 2011
shpchp: Standard Hot Plug PCI Controller Driver version: 0.4
EDAC MC0: Giving out device to i7core_edac.c i7 core #0: DEV 0000:ff:03.0
EDAC i7core: Driver loaded.
e1000e: Intel(R) PRO/1000 Network Driver - 1.2.7-k2
e1000e: Copyright (c) 1999 - 2010 Intel Corporation.
GSI 20 sharing vector 0x42 and IRQ 20
ACPI: PCI Interrupt 0000:00:19.0[A] -> GSI 17 (level, low) -> IRQ 66
PCI: Setting latency timer of device 0000:00:19.0 to 64
sd 0:0:0:0: Attached scsi generic sg0 type 0
sd 1:0:0:0: Attached scsi generic sg1 type 0
scsi 4:0:0:0: Attached scsi generic sg2 type 5
eth0: (PCI Express:2.5GB/s:Width x1) 00:22:4d:37:6e:fa
eth0: Intel(R) PRO/1000 Network Connection
eth0: MAC: 9, PHY: 9, PBA No: 313130-032
GSI 21 sharing vector 0x52 and IRQ 21
ACPI: PCI Interrupt 0000:06:00.0[A] -> GSI 18 (level, low) -> IRQ 82
PCI: Setting latency timer of device 0000:06:00.0 to 64
e1000e 0000:06:00.0: Disabling ASPM L0s
eth1: (PCI Express:2.5GB/s:Width x1) 00:22:4d:37:6e:f9
eth1: Intel(R) PRO/1000 Network Connection
eth1: MAC: 3, PHY: 8, PBA No: 313132-030
ACPI: PCI Interrupt 0000:00:1f.3[D] -> GSI 20 (level, low) -> IRQ 233
sr0: scsi3-mmc drive: 24x/24x writer dvd-ram cd/rw xa/form2 cdda tray
Uniform CD-ROM driver Revision: 3.20
sr 4:0:0:0: Attached scsi CD-ROM sr0
floppy0: no floppy controllers found
work still pending
lp: driver loaded but no devices found
ACPI: AC Adapter [PSD] (on-line)
ACPI: Power Button (FF) [PWRF]
ACPI: Power Button (CM) [PWRB]
ACPI: Mapper loaded
dell-wmi: No known WMI GUID found
 power_meter0: Found ACPI power meter.
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
device-mapper: multipath: version 1.0.6 loaded
EXT3 FS on sda1, internal journal
Adding 16386292k swap on /dev/sda2.  Priority:-1 extents:1 across:16386292k
IA-32 Microcode Update Driver: v1.14a <tigran@veritas.com>
loop: loaded (max 8 devices)
kjournald starting.  Commit interval 5 seconds
EXT3-fs warning: checktime reached, running e2fsck is recommended
EXT3 FS on loop0, internal journal
ext3_orphan_cleanup: deleting unreferenced inode 96
ext3_orphan_cleanup: deleting unreferenced inode 22
ext3_orphan_cleanup: deleting unreferenced inode 17
ext3_orphan_cleanup: deleting unreferenced inode 16
ext3_orphan_cleanup: deleting unreferenced inode 15
ext3_orphan_cleanup: deleting unreferenced inode 14
ext3_orphan_cleanup: deleting unreferenced inode 13
EXT3-fs: loop0: 7 orphan inodes deleted
EXT3-fs: recovery complete.
EXT3-fs: mounted filesystem with ordered data mode.
ip_tables: (C) 2000-2006 Netfilter Core Team
Netfilter messages via NETLINK v0.30.
ip_conntrack version 2.4 (8192 buckets, 65536 max) - 304 bytes per conntrack
e1000e: eth0 NIC Link is Up 100 Mbps Full Duplex, Flow Control: None
eth0: 10/100 speed: disabling TSO
NET: Registered protocol family 10
lo: Disabled Privacy Extensions
IPv6 over IPv4 tunneling driver
eth0: no IPv6 routers present
usb 2-1.5: new full speed USB device using ehci_hcd and address 3
usb 2-1.5: configuration #1 chosen from 1 choice
input: Fujitsu iRMC USB Device as /class/input/input1
input: USB HID v1.11 Keyboard [Fujitsu iRMC USB Device] on usb-0000:00:1d.0-1.5
input: Fujitsu iRMC USB Device as /class/input/input2
input: USB HID v1.11 Mouse [Fujitsu iRMC USB Device] on usb-0000:00:1d.0-1.5
scsi6 : SCSI emulation for USB Mass Storage devices
usb-storage: device found at 3
usb-storage: waiting for device to settle before scanning
usb 2-1.6: new high speed USB device using ehci_hcd and address 4
usb 2-1.6: configuration #1 chosen from 1 choice
scsi7 : SCSI emulation for USB Mass Storage devices
usb-storage: device found at 4
usb-storage: waiting for device to settle before scanning
 power_meter0: Capping in progress.
 power_meter0: Found ACPI power meter.
usb 2-1.5: reset full speed USB device using ehci_hcd and address 3
usb 2-1.5: device firmware changed
scsi 6:0:0:0: scsi: Device offlined - not ready after error recovery
usb 2-1.5: USB disconnect, address 3
usb-storage: device scan complete
usb 2-1.5: new full speed USB device using ehci_hcd and address 5
usb 2-1.5: configuration #1 chosen from 1 choice
input: Fujitsu iRMC USB Device as /class/input/input3
input: USB HID v1.11 Keyboard [Fujitsu iRMC USB Device] on usb-0000:00:1d.0-1.5
input: Fujitsu iRMC USB Device as /class/input/input4
input: USB HID v1.11 Mouse [Fujitsu iRMC USB Device] on usb-0000:00:1d.0-1.5
usb 2-1.6: reset high speed USB device using ehci_hcd and address 4
usb 2-1.6: device firmware changed
scsi 7:0:0:0: scsi: Device offlined - not ready after error recovery
usb 2-1.6: USB disconnect, address 4
usb-storage: device scan complete
usb 2-1.6: new high speed USB device using ehci_hcd and address 6
usb 2-1.6: config 1 has no interfaces?
usb 2-1.6: configuration #1 chosen from 1 choice
 power_meter0: Found ACPI power meter.
usb 2-1.5: USB disconnect, address 5
usb 2-1.5: new full speed USB device using ehci_hcd and address 7
usb 2-1.5: configuration #1 chosen from 1 choice
input: Fujitsu iRMC USB Device as /class/input/input5
input: USB HID v1.11 Keyboard [Fujitsu iRMC USB Device] on usb-0000:00:1d.0-1.5
input: Fujitsu iRMC USB Device as /class/input/input6
input: USB HID v1.11 Mouse [Fujitsu iRMC USB Device] on usb-0000:00:1d.0-1.5
scsi8 : SCSI emulation for USB Mass Storage devices
usb-storage: device found at 7
usb-storage: waiting for device to settle before scanning
usb 2-1.6: USB disconnect, address 6
usb 2-1.6: new high speed USB device using ehci_hcd and address 8
usb 2-1.6: configuration #1 chosen from 1 choice
scsi9 : SCSI emulation for USB Mass Storage devices
usb-storage: device found at 8
usb-storage: waiting for device to settle before scanning
 power_meter0: Capping in progress.
 power_meter0: Found ACPI power meter.
usb 2-1.5: reset full speed USB device using ehci_hcd and address 7
usb 2-1.5: device firmware changed
scsi 8:0:0:0: scsi: Device offlined - not ready after error recovery
usb 2-1.5: USB disconnect, address 7
usb-storage: device scan complete
usb 2-1.5: new full speed USB device using ehci_hcd and address 9
usb 2-1.5: configuration #1 chosen from 1 choice
input: Fujitsu iRMC USB Device as /class/input/input7
input: USB HID v1.11 Keyboard [Fujitsu iRMC USB Device] on usb-0000:00:1d.0-1.5
input: Fujitsu iRMC USB Device as /class/input/input8
input: USB HID v1.11 Mouse [Fujitsu iRMC USB Device] on usb-0000:00:1d.0-1.5
usb 2-1.6: reset high speed USB device using ehci_hcd and address 8
usb 2-1.6: device firmware changed
scsi 9:0:0:0: scsi: Device offlined - not ready after error recovery
usb 2-1.6: USB disconnect, address 8
usb-storage: device scan complete
usb 2-1.6: new high speed USB device using ehci_hcd and address 10
usb 2-1.6: config 1 has no interfaces?
usb 2-1.6: configuration #1 chosen from 1 choice
 power_meter0: Found ACPI power meter.
usb 2-1.5: USB disconnect, address 9
usb 2-1.5: new full speed USB device using ehci_hcd and address 11
usb 2-1.5: configuration #1 chosen from 1 choice
input: Fujitsu iRMC USB Device as /class/input/input9
input: USB HID v1.11 Keyboard [Fujitsu iRMC USB Device] on usb-0000:00:1d.0-1.5
input: Fujitsu iRMC USB Device as /class/input/input10
input: USB HID v1.11 Mouse [Fujitsu iRMC USB Device] on usb-0000:00:1d.0-1.5
scsi10 : SCSI emulation for USB Mass Storage devices
usb-storage: device found at 11
usb-storage: waiting for device to settle before scanning
usb 2-1.6: USB disconnect, address 10
usb 2-1.6: new high speed USB device using ehci_hcd and address 12
usb 2-1.6: configuration #1 chosen from 1 choice
scsi11 : SCSI emulation for USB Mass Storage devices
usb-storage: device found at 12
usb-storage: waiting for device to settle before scanning
 power_meter0: Capping in progress.
 power_meter0: Found ACPI power meter.
usb 2-1.5: reset full speed USB device using ehci_hcd and address 11
usb 2-1.5: device firmware changed
scsi 10:0:0:0: scsi: Device offlined - not ready after error recovery
usb 2-1.5: USB disconnect, address 11
usb-storage: device scan complete
usb 2-1.5: new full speed USB device using ehci_hcd and address 13
usb 2-1.5: configuration #1 chosen from 1 choice
input: Fujitsu iRMC USB Device as /class/input/input11
input: USB HID v1.11 Keyboard [Fujitsu iRMC USB Device] on usb-0000:00:1d.0-1.5
input: Fujitsu iRMC USB Device as /class/input/input12
input: USB HID v1.11 Mouse [Fujitsu iRMC USB Device] on usb-0000:00:1d.0-1.5
usb 2-1.6: reset high speed USB device using ehci_hcd and address 12
usb 2-1.6: device firmware changed
scsi 11:0:0:0: scsi: Device offlined - not ready after error recovery
usb 2-1.6: USB disconnect, address 12
usb-storage: device scan complete
usb 2-1.6: new high speed USB device using ehci_hcd and address 14
usb 2-1.6: config 1 has no interfaces?
usb 2-1.6: configuration #1 chosen from 1 choice
 power_meter0: Found ACPI power meter.
usb 2-1.5: USB disconnect, address 13
usb 2-1.5: new full speed USB device using ehci_hcd and address 15
usb 2-1.5: configuration #1 chosen from 1 choice
input: Fujitsu iRMC USB Device as /class/input/input13
input: USB HID v1.11 Keyboard [Fujitsu iRMC USB Device] on usb-0000:00:1d.0-1.5
input: Fujitsu iRMC USB Device as /class/input/input14
input: USB HID v1.11 Mouse [Fujitsu iRMC USB Device] on usb-0000:00:1d.0-1.5
scsi12 : SCSI emulation for USB Mass Storage devices
usb-storage: device found at 15
usb-storage: waiting for device to settle before scanning
usb 2-1.6: USB disconnect, address 14
usb 2-1.6: new high speed USB device using ehci_hcd and address 16
usb 2-1.6: configuration #1 chosen from 1 choice
scsi13 : SCSI emulation for USB Mass Storage devices
usb-storage: device found at 16
usb-storage: waiting for device to settle before scanning
 power_meter0: Capping in progress.
 power_meter0: Found ACPI power meter.
usb 2-1.5: reset full speed USB device using ehci_hcd and address 15
usb 2-1.5: device firmware changed
scsi 12:0:0:0: scsi: Device offlined - not ready after error recovery
usb 2-1.5: USB disconnect, address 15
usb-storage: device scan complete
usb 2-1.5: new full speed USB device using ehci_hcd and address 17
usb 2-1.5: configuration #1 chosen from 1 choice
input: Fujitsu iRMC USB Device as /class/input/input15
input: USB HID v1.11 Keyboard [Fujitsu iRMC USB Device] on usb-0000:00:1d.0-1.5
input: Fujitsu iRMC USB Device as /class/input/input16
input: USB HID v1.11 Mouse [Fujitsu iRMC USB Device] on usb-0000:00:1d.0-1.5
usb 2-1.6: reset high speed USB device using ehci_hcd and address 16
usb 2-1.6: device firmware changed
scsi 13:0:0:0: scsi: Device offlined - not ready after error recovery
usb 2-1.6: USB disconnect, address 16
usb-storage: device scan complete
usb 2-1.6: new high speed USB device using ehci_hcd and address 18
usb 2-1.6: config 1 has no interfaces?
usb 2-1.6: configuration #1 chosen from 1 choice
 power_meter0: Found ACPI power meter.
usb 2-1.5: USB disconnect, address 17
usb 2-1.5: new full speed USB device using ehci_hcd and address 19
usb 2-1.5: configuration #1 chosen from 1 choice
input: Fujitsu iRMC USB Device as /class/input/input17
input: USB HID v1.11 Keyboard [Fujitsu iRMC USB Device] on usb-0000:00:1d.0-1.5
input: Fujitsu iRMC USB Device as /class/input/input18
input: USB HID v1.11 Mouse [Fujitsu iRMC USB Device] on usb-0000:00:1d.0-1.5
scsi14 : SCSI emulation for USB Mass Storage devices
usb-storage: device found at 19
usb-storage: waiting for device to settle before scanning
usb 2-1.6: USB disconnect, address 18
usb 2-1.6: new high speed USB device using ehci_hcd and address 20
usb 2-1.6: configuration #1 chosen from 1 choice
scsi15 : SCSI emulation for USB Mass Storage devices
usb-storage: device found at 20
usb-storage: waiting for device to settle before scanning
 power_meter0: Capping in progress.
 power_meter0: Found ACPI power meter.
usb 2-1.5: reset full speed USB device using ehci_hcd and address 19
usb 2-1.5: device firmware changed
scsi 14:0:0:0: scsi: Device offlined - not ready after error recovery
usb 2-1.5: USB disconnect, address 19
usb-storage: device scan complete
usb 2-1.5: new full speed USB device using ehci_hcd and address 21
usb 2-1.5: configuration #1 chosen from 1 choice
input: Fujitsu iRMC USB Device as /class/input/input19
input: USB HID v1.11 Keyboard [Fujitsu iRMC USB Device] on usb-0000:00:1d.0-1.5
input: Fujitsu iRMC USB Device as /class/input/input20
input: USB HID v1.11 Mouse [Fujitsu iRMC USB Device] on usb-0000:00:1d.0-1.5
usb 2-1.6: reset high speed USB device using ehci_hcd and address 20
usb 2-1.6: device firmware changed
scsi 15:0:0:0: scsi: Device offlined - not ready after error recovery
usb 2-1.6: USB disconnect, address 20
usb-storage: device scan complete
usb 2-1.6: new high speed USB device using ehci_hcd and address 22
usb 2-1.6: config 1 has no interfaces?
usb 2-1.6: configuration #1 chosen from 1 choice
 power_meter0: Found ACPI power meter.

My answer:


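It looks like the server’s iRMC (the piece of hardware that supplies out-of-band keyboard, video, mouse, virtual CD-ROM, IPMI management, etc.) is malfunctioning. Your dmesg shows its virtual USB keyboard, mouse and mass storage devices disconnecting and reconnecting over and over, and each cycle registers another emulated SCSI host that is then offlined. Those are the errors logwatch is reporting, and they do not involve your actual disk, sda. Replace the iRMC.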
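
If you want to check that reading against the log yourself, here is a rough Python sketch. It matches every "Device offlined" message to the line that registered that SCSI host, and counts USB disconnects per port. The function name summarize_dmesg and the regular expressions are my own, written against the message formats in the dmesg output in the question. Other kernels may word these lines differently, so treat it as a starting point rather than a finished tool.

```python
import re
from collections import Counter

# Optional "[   12.345678] " timestamp prefix that newer kernels add.
TIMESTAMP_RE = re.compile(r"^\[\s*[\d.]+\]\s*")
# e.g. "scsi6 : SCSI emulation for USB Mass Storage devices"
HOST_RE = re.compile(r"^scsi(\d+) : (.+)$")
# e.g. "scsi 6:0:0:0: scsi: Device offlined - not ready after error recovery"
OFFLINE_RE = re.compile(r"^scsi (\d+):\d+:\d+:\d+: scsi: Device offlined")
# e.g. "usb 2-1.5: USB disconnect, address 3"
DISCONNECT_RE = re.compile(r"^usb (\S+): USB disconnect")


def summarize_dmesg(text):
    """Summarize SCSI offline events and USB disconnects in dmesg text.

    Returns (offlined, disconnects):
      offlined    maps each offlined SCSI host number to the description
                  printed when that host was registered, or "unknown".
      disconnects counts "USB disconnect" events per USB port.
    """
    host_desc = {}
    offlined = {}
    disconnects = Counter()
    for raw in text.splitlines():
        line = TIMESTAMP_RE.sub("", raw.strip())
        m = HOST_RE.match(line)
        if m:
            host_desc[int(m.group(1))] = m.group(2)
            continue
        m = OFFLINE_RE.match(line)
        if m:
            host = int(m.group(1))
            offlined[host] = host_desc.get(host, "unknown")
            continue
        m = DISCONNECT_RE.match(line)
        if m:
            disconnects[m.group(1)] += 1
    return offlined, disconnects
```

Pass it the text of dmesg, for example from subprocess.run(["dmesg"], capture_output=True, text=True).stdout, and print the two results. If every offlined host is described as "SCSI emulation for USB Mass Storage devices" and the disconnects all fall on the ports where the Fujitsu iRMC USB devices appear, the errors come from the management controller's virtual devices and not from a physical drive.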


View the full question and answer on Server Fault.

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.