Re: OT: Dell disks

Author: Eric Shubert
Date:  
To: plug-discuss
Subject: Re: OT: Dell disks
On 06/19/2012 06:28 AM, Lisa Kachold wrote:
> Hi Mark,
>
> On Mon, Jun 18, 2012 at 10:05 PM, Mark Jarvis <m.jarvis@cox.net> wrote:
>
>
>     I'm considering buying a Dell desktop (Inspiron 620), but a few
>     years ago I was warned off them because Dell did something different
>     to their disks so that you had to buy replacement/additional disks
>     only from Dell. Any chance that it's still true?

>
> Unless you have a hardware RAID card, and you are buying a desktop, you
> should not have enterprise grade drives, but check with Dell Support for
> the model you are interested in.
> You are referring to TLER/ERC/CCTL:
>
> Hard drive manufacturers are drawing a distinction between "desktop"
> grade and "enterprise" grade drives. The "desktop" grade drives can take
> a long time (~2 minutes) to respond when they find an error, which
> causes most RAID systems to label them as failed and drop them from the
> array. The solution provided by the manufacturers is for us to purchase
> the "enterprise" grade drives, at twice the cost, which report errors
> promptly enough so that this isn't a problem. This "enterprise" feature
> is called TLER, ERC, and CCTL.
>
> *The Problem:*
>
> There are three problems with this situation:
>
> The first is that it flies in the face of the word *Inexpensive* in the
> acronym *Redundant Arrays of /Inexpensive/ Disks (RAID)*
> <http://www-2.cs.cmu.edu/%7Egarth/RAIDpaper/Patterson88.pdf>.
>
> The second is that when a drive starts to fail, you want to know about
> it, as Miles Nordin wrote in a long thread
> <http://opensolaris.org/jive/thread.jspa?threadID=119639&tstart=0>:
>
> *Possible Solutions:*
>
> For a while, Western Digital released a program (WDTLER.EXE) that made
> it possible to enable TLER on desktop grade drives. This no longer works.
>
> *Linux:*
>
> This message <http://marc.info/?l=linux-raid&m=128640221813394&w=2>
> implies that it's impossible to tell a drive to cancel its bad read
> operation:
>
> You can set the ERC values of your drives. Then they'll stop processing
> their internal error recovery procedure after the timeout and continue
> to react. Without ERC-timeout, the drive tries to correct the error on
> its own (not reacting on any requests), mdraid assumes an error after a
> while and tries to rewrite the "missing" sector (assembled from the
> other disks). But the drive will still not react to the write request
> as it is still doing its internal recovery procedure. Now mdraid
> assumes the disk to be bad and kicks it.
>
> There's nothing you can do about this vicious circle except either
> enabling ERC or using RAID-edition disks (which have ERC enabled by default).
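
A minimal sketch of the "set the ERC values" step described above, assuming the drive supports SCT ERC and that smartmontools is installed. It only builds the `smartctl -l scterc,READ,WRITE` command line (smartctl takes the limits in tenths of a second) and does not run it. The helper names, the `/dev/sdX` placeholder, and the 30-second kernel command timeout used as a default are assumptions for illustration, not something stated in the thread.

```python
import shlex


def scterc_command(device, read_seconds=7.0, write_seconds=7.0):
    """Build (but do not run) a smartctl command that sets SCT ERC limits."""
    # smartctl expects the limits in units of 100 ms (deciseconds).
    read_ds = round(read_seconds * 10)
    write_ds = round(write_seconds * 10)
    return ["smartctl", "-l", "scterc,{},{}".format(read_ds, write_ds), device]


def drive_answers_before_os_timeout(erc_seconds, os_timeout_seconds=30.0):
    """True when the drive gives up on recovery before the OS timeout fires.

    os_timeout_seconds is an assumed value; check the real setting on
    your own system before relying on it.
    """
    return erc_seconds < os_timeout_seconds


if __name__ == "__main__":
    # /dev/sdX is a placeholder device name, not a real path.
    print(shlex.join(scterc_command("/dev/sdX")))
```

Whether a given desktop drive accepts the command at all depends on its firmware, and on many drives the setting does not survive a power cycle, so treat this as a starting point only.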
>
> Evidence that using ATA ERC commands doesn't always work:
> Both Linux and FreeBSD can use normal desktop drives without TLER, and
> in fact you *would not even want TLER* in such a case, since *TLER can
> be dangerous* in some circumstances. Read on.
>
>
> *What is TLER/CCTL/ERC?*
> TLER (Time-Limited Error Recovery)
> CCTL (Command Completion Time Limit)
> ERC (Error Recovery Control)
>
> These basically mean the same thing: limit the number of seconds the
> harddrive spends on trying to recover a weak or bad sector. TLER and the
> other variants are typically configured to 7 seconds, meaning that if
> the drive has not managed to recover that sector within 7 seconds, it
> will give up and forfeit recovery, and return an I/O error to the host
> instead.
>
> The behavior without TLER is that up to 120 seconds (20-60 is more
> frequent) may pass before a disk gives up recovery. This behavior wreaks
> havoc on all hardware RAID and Windows-based software/onboard/driver
> RAIDs. The RAID controller is typically configured to consider disks that
> don't respond within 10 seconds as completely failed, which is bizarre to
> say the least! This smells like the vendors have some sort of deal
> causing you to buy HDDs at twice the price just for a simple firmware
> fix. LOL!! Don't get yourself ripped off; read on!
>
>
> *When do I need TLER?*
> You need TLER-capable disks when using any Hardware RAID or any
> Windows-based software RAID; bummer if you're on Windows platform! But
> this also means Hardware RAID on any OS (FreeBSD/Linux) would also need
> TLER disks; even when configured to run as 'JBOD' array. There may be
> controllers with different firmware that allow you to set the timeout
> limit for I/O; but I've not yet heard about specific products, except
> some LSI 1068E in IR mode; but reputable vendors like Areca (FW1.43)
> certainly require TLER-enabled disks or they will drop-out like candy
> whenever you encounter a bad/weak sector that needs longer recovery than
> 10 seconds.
>
> Basically, if you use a RAID platform that DEMANDS the disks to respond
> within 10 seconds, and will KICK OUT disks that do not respond in time,
> then you need TLER.
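
The rule above can be written down as a toy decision model. This is only a sketch of the reasoning in the post, not how any real controller firmware works: the function name and return strings are made up for illustration, and the 10-second and 7-second figures are the ones quoted in the text.

```python
def raid_outcome(recovery_needed_s, controller_timeout_s=10.0, erc_limit_s=None):
    """Classify what happens when one sector needs recovery_needed_s to read.

    erc_limit_s=None models a desktop drive with no TLER/ERC limit.
    """
    # Work out when the drive answers the host, and whether it succeeded.
    if erc_limit_s is not None and recovery_needed_s > erc_limit_s:
        # TLER/ERC makes the drive give up and report an error at the limit.
        response_time_s, read_ok = erc_limit_s, False
    else:
        # The drive keeps trying until the sector is finally read.
        response_time_s, read_ok = recovery_needed_s, True

    # A controller that demands an answer in time kicks silent disks.
    if response_time_s > controller_timeout_s:
        return "disk dropped from array"
    return "sector read" if read_ok else "I/O error reported, disk stays in array"


if __name__ == "__main__":
    print(raid_outcome(40.0))                   # desktop drive, slow recovery
    print(raid_outcome(40.0, erc_limit_s=7.0))  # same sector with a 7 s limit
```

The model deliberately ignores what the controller does with the reported I/O error (normally it rebuilds the sector from the other disks), which is the part that goes wrong once redundancy is lost.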
>
> *When don't I need TLER?*
> When using FreeBSD/Linux software RAID on an HBA controller, which is a
> RAID-less controller. Areca HW RAID running in JBOD mode is still a RAID
> controller; it controls whether the disks are detached, not the OS. With
> a true HBA like LSI 1068E (Intel SASUC8i) your OS would have control
> over whether to detach the disk or not; and Linux/BSD won't, at least
> not for a simple bad sector. Not sure about Apple OSX actually, but
> since it's based on FreeBSD I could speculate that it would have the
> same behavior as FreeBSD; perhaps tuned differently.
>
> *Why don't you want TLER even if your disks are capable?*
>
> If you don't need TLER, then you don't want TLER! Why? Well because
> *TLER is dangerous!* Nonsense? Consider this:
>
> 1. You have a nice RAID5 array on Hardware RAID; being a valuable
> customer, you spent the premium price on TLER-capable disks.
> 2. Now one of your disks dies; oh bummer! But hey, I have RAID5; I'm
> protected, RIGHT?
> 3. So I buy a new disk, and replace the failed one! So easy!
> 4. A bad sector turns up on one of the remaining member disks, and it
> causes TLER to forfeit; now I get an I/O error while rebuilding my
> degraded array, the rebuild stops, and I lose access to my data!
>
> The danger in TLER is that once you have lost your redundancy, if a weak
> sector occurs that COULD be recovered, TLER will force the drive to STOP
> TRYING after 7 seconds. If the drive didn't fix it by then, and you have
> no redundancy left, then TLER is a harmful property instead of a useful one.
>
> TLER works best when you have a lot of redundancy and can swap disks
> easily, and want disks that show any sign of weakness - if even just a
> fart - to be kicked out and replaced ASAP, without causing hiccups which
> are unacceptable to a heavy-duty online money transaction server, for
> example. So TLER can be useful, but for consumers this is more like an
> interesting way for vendors to make some more money from you poor souls!
>
>
> *What is Bit-Error Rate and how does it relate to TLER?*
>
> The Uncorrectable Bit-Error Rate (BER) has been steady at 10^-14 (one
> unreadable bit per 10^14 bits read), but capacities are growing while
> the BER stays the same. That means that modern high-capacity hard drives
> are now more likely to be affected by amnesia; they sometimes really
> cannot read a sector. This could be physical damage to the sector
> itself, or just a weak charge, meaning no physical damage to that sector
> but just unreadable.
>
> So 2TB 512-byte sector disks have a relatively high chance of hitting an
> unreadable sector. This makes them even more susceptible to dropping out
> of conventional Windows/Hardware RAIDs, and is why the TLER feature has
> become more important. But I consider it to be rather a curse than a
> blessing.
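
As a rough worked example of the arithmetic behind this, the sketch below estimates the chance of hitting at least one unreadable bit while reading a given amount of data. It assumes bit errors are independent and that the quoted 10^-14 figure is the actual rate (vendors usually state it as an upper bound), so the numbers are pessimistic illustrations rather than measurements. The function name is made up for this example.

```python
import math


def p_unreadable(bytes_read, ber=1e-14):
    """Probability of at least one unrecoverable bit error in bytes_read.

    Computes 1 - (1 - ber) ** bits using log1p/expm1 so the result stays
    accurate for very small ber and very large bit counts.
    """
    bits = bytes_read * 8
    return -math.expm1(bits * math.log1p(-ber))


if __name__ == "__main__":
    two_tb = 2 * 10**12  # one 2TB disk, decimal terabytes
    print("one full read of a 2TB disk: %.3f" % p_unreadable(two_tb))
    # Rebuilding a 4-disk RAID5 means reading the 3 surviving disks in full.
    print("reading three 2TB disks:     %.3f" % p_unreadable(3 * two_tb))
```

Under those assumptions a full read of one 2TB disk comes out at roughly 15%, and reading the three surviving disks of a four-disk RAID5 during a rebuild at roughly 38%, which is why a single forfeited sector during a rebuild is not a far-fetched scenario.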
>
> *So, explain again please: Why don't I need TLER on Linux/BSD?*
>
> Simple: the OS does not detach a disk that times out, but resets the
> interface and re-tries the I/O. Also when using ZFS, it will write to a
> bad sector, causing that bad sector to be instantly
> fixed/healed/corrected since writing to a bad sector makes the disk
> perform a sector swap right away. In the SMART data, the "Current
> Pending Sector" (active bad sector) would then become "Reallocated
> Sector Count" (passive bad sector which no longer causes harm and cannot
> be seen or used by the host Operating System anymore).
>
> *That includes ZFS?*
> Yes. ZFS is, of course, the most reliable and advanced filesystem you
> can use to store your files, right now. It's free, it's available, it's
> hot. So use it whenever you can.
>
> --


Thanks Lisa. That's the best writeup I've read about this.

I'll continue to steer clear of HW raid, as well as raid-5. :)

--
-Eric 'shubes'

---------------------------------------------------
PLUG-discuss mailing list -
To subscribe, unsubscribe, or to change your mail settings:
http://lists.PLUG.phoenix.az.us/mailman/listinfo/plug-discuss