On 06/19/2012 06:28 AM, Lisa Kachold wrote:
> Hi Mark,
>
> On Mon, Jun 18, 2012 at 10:05 PM, Mark Jarvis wrote:
>
>> I'm considering buying a Dell desktop (Inspiron 620), but a few years ago I was warned off them because Dell did something different to their disks so that you had to buy replacement/additional disks only from Dell. Any chance that it's still true?
>
> Unless you have a hardware RAID card, you should not need enterprise-grade drives in a desktop, but check with Dell Support for the model you are interested in.
>
> You are referring to TLER/ERC/CCTL:
>
> Hard drive manufacturers are drawing a distinction between "desktop" grade and "enterprise" grade drives. The "desktop" grade drives can take a long time (~2 minutes) to respond when they find an error, which causes most RAID systems to label them as failed and drop them from the array. The solution provided by the manufacturers is for us to purchase the "enterprise" grade drives, at twice the cost, which report errors promptly enough that this isn't a problem. This "enterprise" feature is called TLER, ERC, or CCTL.
>
> *The Problem:*
>
> There are three problems with this situation:
>
> The first is that it flies in the face of the word *Inexpensive* in the acronym *Redundant Arrays of /Inexpensive/ Disks (RAID)*.
>
> The second is that when a drive starts to fail, you want to know about it, as Miles Nordin wrote in a long thread.
>
> *Possible Solutions:*
>
> For a while, Western Digital released a program (WDTLER.EXE) that made it possible to enable TLER on desktop grade drives. This no longer works.
>
> *Linux:*
>
> This message implies that it's impossible to tell a drive to cancel its bad read operation:
>
> You can set the ERC values of your drives. Then they'll stop their internal error recovery procedure after the timeout and keep responding.
> Without an ERC timeout, the drive tries to correct the error on its own (not responding to any requests). mdraid assumes an error after a while and tries to rewrite the "missing" sector (reassembled from the other disks). But the drive still does not respond to the write request because it is still running its internal recovery procedure. Now mdraid assumes the disk is bad and kicks it.
>
> There's nothing you can do about this vicious circle except either enabling ERC or using RAID-edition disks (which have ERC enabled by default).
>
> Evidence that using ATA ERC commands doesn't always work:
>
> Both Linux and FreeBSD can use normal desktop drives without TLER, and in fact you *would not even want TLER* in such a case, since *TLER can be dangerous* in some circumstances. Read on.
>
> *What is TLER/CCTL/ERC?*
>
> TLER (Time-Limited Error Recovery)
> CCTL (Command Completion Time Limit)
> ERC (Error Recovery Control)
>
> These basically mean the same thing: limit the number of seconds the hard drive spends trying to recover a weak or bad sector. TLER and the other variants are typically configured to 7 seconds, meaning that if the drive has not managed to recover that sector within 7 seconds, it gives up on recovery and returns an I/O error to the host instead.
>
> Without TLER, up to 120 seconds (20-60 is more common) may pass before a disk gives up recovery. This behavior wreaks havoc on all hardware RAID and Windows-based software/onboard/driver RAIDs. The RAID controller is typically configured to consider disks that don't respond within 10 seconds as completely failed, which is bizarre to say the least! This smells like the vendors have some sort of deal that makes you buy HDDs at twice the price just for a simple firmware fix. LOL!! Don't get ripped off; read on!
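
To make the timing argument above concrete, here is a minimal Python sketch of the three numbers in play: how long the drive needs to recover a sector, the ERC/TLER limit (7 seconds in the text), and the controller's patience (10 seconds in the text). The function name `read_outcome` and its parameters are hypothetical and only for illustration; this is a simplified model of the behavior described, not how any particular controller or drive firmware is implemented.

```python
from typing import Optional


def read_outcome(recovery_needed_s: float,
                 erc_limit_s: Optional[float],
                 controller_timeout_s: float = 10.0) -> str:
    """Classify one read of a weak sector (simplified, hypothetical model).

    recovery_needed_s    -- seconds the drive would need to recover the sector
    erc_limit_s          -- ERC/TLER limit in seconds, or None if the feature is off
    controller_timeout_s -- how long the RAID layer waits before dropping the disk
    """
    if erc_limit_s is not None and recovery_needed_s > erc_limit_s:
        # With ERC/TLER the drive gives up at the limit and reports an I/O error.
        drive_busy_s = erc_limit_s
        drive_result = "io_error"
    else:
        # Otherwise the drive keeps trying until it recovers the sector.
        drive_busy_s = recovery_needed_s
        drive_result = "recovered"

    # A strict controller drops any disk that stays silent past its timeout.
    if drive_busy_s > controller_timeout_s:
        return "disk_dropped"
    return drive_result


# Desktop drive (no TLER) behind a 10-second controller: kicked out of the array.
print(read_outcome(30, None))
# Same sector with a 7-second TLER limit: the drive stays, but the read fails.
print(read_outcome(30, 7.0))
```

Passing `float("inf")` as `controller_timeout_s` is one way to approximate a setup where nothing drops the disk while it is still recovering, which is the case the rest of the message argues for.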
>
> *When do I need TLER?*
>
> You need TLER-capable disks when using any hardware RAID or any Windows-based software RAID; bummer if you're on the Windows platform! This also means hardware RAID on any OS (FreeBSD/Linux) needs TLER disks, even when configured to run as a 'JBOD' array. There may be controllers with different firmware that let you set the I/O timeout limit, but I have not yet heard of specific products, except some LSI 1068E in IR mode. Reputable vendors like Areca (FW1.43) certainly require TLER-enabled disks, or the disks will drop out like candy whenever you encounter a bad/weak sector that needs longer than 10 seconds to recover.
>
> Basically, if you use a RAID platform that DEMANDS the disks respond within 10 seconds, and will KICK OUT disks that do not respond in time, then you need TLER.
>
> *When don't I need TLER?*
>
> When using FreeBSD/Linux software RAID on an HBA controller, which is a RAID-less controller. Areca HW RAID running in JBOD mode is still a RAID controller; it, not the OS, controls whether the disks are detached. With a true HBA like the LSI 1068E (Intel SASUC8i), your OS controls whether to detach the disk, and Linux/BSD won't, at least not for a simple bad sector. Not sure about Apple OS X actually, but since it's based on FreeBSD I could speculate that it would have the same behavior as FreeBSD, perhaps tuned differently.
>
> *Why don't you want TLER even if your disks are capable?*
>
> If you don't need TLER, then you don't want TLER! Why? Well, because *TLER is dangerous!* Nonsense? Consider this:
>
> 1. You have a nice RAID5 array on hardware RAID; being a valuable customer, you spent the premium price on TLER-capable disks.
> 2. Now one of your disks dies; oh bummer! But hey, I have RAID5; I'm protected, RIGHT?
> 3. So I buy a new disk and replace the failed one! So easy!
> 4.
>    A bad sector turned up on one of the remaining member disks and caused TLER to forfeit; now I get an I/O error while rebuilding my degraded array, the rebuild stops, and I lose access to my data!
>
> The danger of TLER is that once you have lost your redundancy, if a weak sector turns up that COULD be recovered, TLER forces the drive to STOP TRYING after 7 seconds. If the drive didn't fix it by then, and you have lost your redundancy, then TLER is a harmful property instead of a useful one.
>
> TLER works best when you have a lot of redundancy and can swap disks easily, and want disks that show any sign of weakness - if even just a fart - to be kicked out and replaced ASAP, without causing hiccups that are unacceptable on, for example, a heavy-duty online money transaction server. So TLER can be useful, but for consumers it is more like an interesting way for vendors to make some more money from you poor souls!
>
> *What is Bit-Error Rate and how does it relate to TLER?*
>
> The Uncorrectable Bit-Error Rate (BER) has been steady at 10^-14, but capacities keep growing while the BER stays the same. That means modern high-capacity hard drives are now more likely to be affected by amnesia; they sometimes really cannot read a sector. This could be physical damage to the sector itself, or just a weak charge, meaning the sector has no physical damage but is unreadable.
>
> So 2TB disks with 512-byte sectors have a relatively high chance of hitting an unreadable sector. This makes them even more susceptible to dropping out of conventional Windows/hardware RAIDs, and is why the TLER feature has become more important. But I consider it more a curse than a blessing.
>
> *So, explain again please: Why don't I need TLER on Linux/BSD?*
>
> Simple: the OS does not detach a disk that times out, but resets the interface and retries the I/O.
> Also, when you use ZFS, it writes to the bad sector, which causes that sector to be instantly fixed/healed/corrected, since writing to a bad sector makes the disk perform a sector swap right away. In the SMART data, the "Current Pending Sector" (active bad sector) would then become "Reallocated Sector Count" (passive bad sector which no longer causes harm and can no longer be seen or used by the host operating system).
>
> *That includes ZFS?*
>
> Yes. ZFS is, of course, the most reliable and advanced filesystem you can use to store your files, right now. It's free, it's available, it's hot. So use it whenever you can.
>
> --

Thanks Lisa. That's the best writeup I've read about this.
I'll continue to steer clear of HW raid, as well as raid-5. :)

--
-Eric 'shubes'

---------------------------------------------------
PLUG-discuss mailing list - PLUG-discuss@lists.plug.phoenix.az.us
To subscribe, unsubscribe, or to change your mail settings:
http://lists.PLUG.phoenix.az.us/mailman/listinfo/plug-discuss