cause of IO wait

Lisa Kachold lisakachold at obnosis.com
Mon Jun 27 21:27:41 MST 2011


Hi Hans:

On Mon, Jun 27, 2011 at 5:07 PM, der.hans <PLUGd at lufthans.com> wrote:
> moin moin,
>
> I've got a machine experiencing a lot of IO wait.
>
> We had power at a datacenter go down last week. Since then IO wait has
> been over 35%. At first we thought it was due to 3ware RAID verify taking
> place due to the crash. That took a few days, then the weekly verify
> started. We stopped that and IO wait stayed high. 8 disks in a RAID 10.
>
> Load avg is also very high, presumably due to the IO wait.
>
> smartctl short tests didn't turn up any issues.
>
> We're not swapping at all.
>
> Disk read and write are fairly low.
>
> Network traffic is down as is the total number of processes and the number
> of running processes. No evidence of network errors on the box or at the
> switch.
>
> Not much going on in the logs. We've stopped several reporting processes
> in order to reduce disk access.
>
> On the positive side, entropy has been staying high :).
>
> IO wait is not explicitly disk? It could be network, serial, USB, etc.?
>
> How do I determine what resource is causing the IO wait? Is there a way to
> track to a specific process?
>
> vmstat, iostat, top and lots of other tools have been great at showing
> that there's overall IO wait ( I've been able to show that almost all
> processors have high wait, one was only at 5% ), but I haven't yet
> determined what and how.
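
On tracking it to a specific process: tasks that are stuck waiting on IO
sit in uninterruptible sleep (state "D" in ps and top), and those are the
tasks that get counted toward iowait and load average. Here is a rough
sketch, assuming a Linux /proc filesystem, that lists them. The function
names are made up for illustration, and a single snapshot can miss short
waits, so run it several times and look for the same PIDs showing up.

```python
import os

def parse_stat_line(stat_text):
    """Return (pid, command, state) from the contents of /proc/<pid>/stat."""
    # The command is wrapped in parentheses and may itself contain spaces
    # or parentheses, so split on the LAST closing parenthesis.
    left, _, right = stat_text.rpartition(")")
    pid_text, _, comm = left.partition("(")
    return int(pid_text), comm, right.split()[0]

def blocked_processes(proc_root="/proc"):
    """List (pid, command) for every process currently in state D."""
    blocked = []
    if not os.path.isdir(proc_root):
        return blocked  # no /proc here (not Linux), nothing to report
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, comm, state = parse_stat_line(handle.read())
        except (OSError, ValueError, IndexError):
            continue  # the process exited while we were looking
        if state == "D":
            blocked.append((pid, comm))
    return blocked

if __name__ == "__main__":
    for pid, comm in blocked_processes():
        print(pid, comm)
```

If the same few commands keep appearing, those are the ones to look at
first. If the list is mostly kernel threads, that points more toward the
driver or controller than toward any one application.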

What version is your 3ware firmware?  That matters a great deal here.

> The server is running CentOS in case that matters.

Please see this bug report about a known RHEL kernel bug affecting
3ware products:
https://bugzilla.redhat.com/show_bug.cgi?id=121434
It also covers troubleshooting commands to confirm the problem, some
kernel /proc tuning, and the resolutions that worked for some people.

I don't see your kernel or distro version listed.  CentOS on a 2.4
kernel?  CentOS 5.6?

The bug report has many suggestions that will give you a place to start.

For instance, try reducing the queue depth of the 3ware driver:

can_queue from 254 to 30
cmd_per_lun from 254 to 4
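
Before changing anything, it helps to record what the box is using now.
A minimal sketch, assuming a 2.6-or-later kernel with sysfs mounted at
/sys, where the SCSI layer exposes the host-level limits as can_queue and
cmd_per_lun and the per-disk limit as queue_depth. The helper names are
hypothetical, and on a 2.4 kernel these files will not exist.

```python
import glob
import os

def read_attr(path):
    """Return the stripped contents of a sysfs attribute, or None if unreadable."""
    try:
        with open(path) as handle:
            return handle.read().strip()
    except OSError:
        return None

def scsi_queue_settings(sys_root="/sys"):
    """Collect host-level and per-device queue limits from sysfs."""
    settings = {"hosts": {}, "devices": {}}
    host_glob = os.path.join(sys_root, "class", "scsi_host", "host*")
    for host in sorted(glob.glob(host_glob)):
        settings["hosts"][os.path.basename(host)] = {
            "can_queue": read_attr(os.path.join(host, "can_queue")),
            "cmd_per_lun": read_attr(os.path.join(host, "cmd_per_lun")),
        }
    # Each RAID unit exported by the controller normally shows up as sdX.
    for dev in sorted(glob.glob(os.path.join(sys_root, "block", "sd*"))):
        settings["devices"][os.path.basename(dev)] = read_attr(
            os.path.join(dev, "device", "queue_depth"))
    return settings

if __name__ == "__main__":
    current = scsi_queue_settings()
    for host, limits in current["hosts"].items():
        print(host, limits)
    for dev, depth in current["devices"].items():
        print(dev, "queue_depth =", depth)
```

As far as I know the host-level values are read-only in sysfs and come
from the driver itself, so changing them means rebuilding or reconfiguring
the driver. Whether the per-device queue_depth can be written at runtime
depends on the driver version, so treat that as something to test rather
than assume.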

There is a good deal of material in that bug report that will give you
ideas for high-performance kernel tuning and troubleshooting.

But first, I would search on your firmware version and your kernel
version/distro to collect all the known issues before you upgrade.  You
certainly can't expect current performance without the kernel sources.

> ciao,
>
> der.hans
> --
> #  http://www.LuftHans.com/        http://www.LuftHans.com/Classes/
> #  Hope has two beautiful daughters: Anger and Courage. Anger at the way
> #  things are, and Courage to struggle to create things as they should be.
> #  -- St. Augustine



-- 
(602) 791-8002  Android
(623) 239-3392 Skype
(623) 688-3392 Google Voice

HomeSmartInternational.com

