Samsung SSDs - Am at the end of life?

Carruth, Rusty Rusty.Carruth at smartm.com
Mon Jan 30 09:54:04 MST 2017


Beware - first you must ensure that the tool which is decoding the S.M.A.R.T. attributes 'knows' the encoding that the drive manufacturer is using.

First, a disclaimer: I work for a company that makes SSDs.  (Smart High Reliability Solutions)  Nothing I say here is endorsed by the company.  On the other hand, nothing I say here is a secret.

So, that being said: Let me digress, and partially answer one of the initial questions:

The S.M.A.R.T. attributes are both a standard and a non-standard.  (HUH? Just hold on, I'll explain)

They are a standard in that there is a standard for how to package up the attributes for sending to the host.  (see ATA8, to be precise).

However WHAT each attribute MEANS, and how the various values inside the 'standard packet' are decoded - well, that is vendor-specific.
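To make the 'standard packet' idea concrete, here is a minimal sketch of unpacking one attribute entry. It assumes the commonly seen 12-byte layout (ID, flags, current value, worst value, six raw bytes, one reserved byte); the function name and sample bytes are made up for illustration, and a given drive may not follow this layout exactly.

```python
import struct

# Assumed layout of one attribute entry (12 bytes, little-endian):
# 1 byte ID, 2 bytes flags, 1 byte current value, 1 byte worst value,
# 6 bytes raw data, 1 reserved byte.
ENTRY = struct.Struct("<BHBB6sB")

def parse_entry(entry: bytes) -> dict:
    """Split one 12-byte entry into its fields (hypothetical helper)."""
    attr_id, flags, value, worst, raw, _reserved = ENTRY.unpack(entry)
    return {
        "id": attr_id,
        "flags": flags,
        "value": value,   # normalized value
        "worst": worst,   # worst normalized value seen so far
        "raw": int.from_bytes(raw, "little"),  # meaning is vendor-specific
    }

# Made-up sample entry: ID 12, flags 0x0032, value 99, worst 99, raw 10000.
sample = bytes([12, 0x32, 0x00, 99, 99, 0x10, 0x27, 0, 0, 0, 0, 0])
parsed = parse_entry(sample)
print(parsed)
```

Note that the code can split the fields apart without knowing anything about what the ID or the raw number means. That part is exactly what the packet does not tell you.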

In fact, the STRINGS people use to identify an attribute are vendor-specific names for the numeric attribute ID which is returned by the drive.  That nifty string 'POWER_ON_TIME' or whatever does NOT come from the drive; it comes from a program guessing what the vendor means by that numeric attribute ID.  In the case of smartctl, people will either guess what the mapping is, or they'll read the drive manufacturer's documentation, or maybe even the drive manufacturer will create the mapping and send it to the smartctl developers (we did that for one of our drives that wasn't in the smartctl database).
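A rough sketch of how a tool might attach names to attribute IDs: a default table plus per-model overrides. The tables, the model string 'ExampleVendor SSD' and the function name are all hypothetical. This is not how smartctl's database is actually structured, only an illustration that the name lives in the program, not in the drive.

```python
# Hypothetical default ID-to-name table (the tool's best guess).
DEFAULT_NAMES = {
    5: "Reallocated_Sector_Ct",
    12: "Power_Cycle_Count",
    194: "Temperature_Celsius",
}

# Hypothetical per-model overrides, e.g. supplied by a vendor.
VENDOR_OVERRIDES = {
    "ExampleVendor SSD": {241: "Total_LBAs_Written"},
}

def attribute_name(attr_id, model=None):
    """Return the display name for an ID; the drive only ever sends the ID."""
    overrides = VENDOR_OVERRIDES.get(model, {})
    if attr_id in overrides:
        return overrides[attr_id]
    return DEFAULT_NAMES.get(attr_id, "Unknown_Attribute")

print(attribute_name(241, "ExampleVendor SSD"))
print(attribute_name(241))  # model not in the table: no name for this ID
```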

Now, once you know the 'name' of the smart attribute, you STILL don't know how the defined fields in that attribute are encoded!

So, for example, almost every drive vendor will use attribute 9 (IIRC) to be 'power on time'.  But is the value minutes, seconds, milliseconds, hours plus some kind of minute count, or what?  There are at least 3 (and I think more, but right offhand I don't remember how many) 'standard' ways to encode power on time.  (Where here 'standard' means 'industry standard', which means if enough vendors use 9 to mean 'power on time' (EVEN if the encoding of the time value is different!), then 9 is a 'de-facto industry standard', which you might infer really doesn't mean a whole lot when it comes to decoding the values the drive returns.)
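To see how much the encoding matters, here is a small sketch that interprets the same raw power-on-time number under a few possible units. The encoding labels and the sample raw value are invented for illustration; they are not taken from any particular drive or tool.

```python
# Divisors that turn a raw count into hours, per assumed unit.
UNITS_PER_HOUR = {
    "hours": 1,
    "minutes": 60,
    "half_minutes": 120,
    "seconds": 3600,
}

def power_on_hours(raw, encoding):
    """Convert a raw power-on-time value to hours under an assumed encoding."""
    if encoding not in UNITS_PER_HOUR:
        raise ValueError(f"unknown encoding: {encoding}")
    return raw / UNITS_PER_HOUR[encoding]

raw = 36000  # made-up raw value
for encoding in UNITS_PER_HOUR:
    print(f"{encoding:>12}: {power_on_hours(raw, encoding):8.1f} hours")
```

With this made-up value, the 'hours' reading says the drive has been on for over four years, while the 'seconds' reading says ten hours. Same bytes, very different story.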

Ok, all that being said, that byte that someone mentioned (percent life left?) SHOULD be a valid indication of remaining life, assuming that it really reflects what it's supposed to.  (But read the article pointed out by Matthew for some caveats.)

Also, on the issue of the article pointed to by Matthew, BEWARE!  Halfway through the article I noticed that the entire test was with sequential writes (or that's what he seemed to be saying).

Normal systems don't do mostly sequential writes.  They usually do either 512-byte random writes, or 4k-byte random writes.  

And to get those 'small' blocks written by the host into the MUCH larger blocks (actually 'pages') that the flash has can require anywhere from 2 to 10 (or more!) writes to the flash for EACH write from the host!

This is called 'write amplification'.  So do NOT expect your SSD on your PC to survive anywhere near the lifetime that the article gives for sequential writes...
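As a back-of-the-envelope sketch of what write amplification does to endurance: if the flash can absorb a fixed amount of writes, the amount the HOST can write is that budget divided by the amplification factor. The 1000 TB budget below is a made-up number, not a figure from the article or from any drive, and the function names are hypothetical.

```python
def write_amplification(flash_bytes_written, host_bytes_written):
    """Ratio of bytes written to flash per byte written by the host."""
    return flash_bytes_written / host_bytes_written

def host_write_budget(flash_write_budget, waf):
    """How much the host can write before the flash budget is used up."""
    return flash_write_budget / waf

flash_budget_tb = 1000  # hypothetical flash endurance, in TB
for waf in (1, 2, 10):
    print(f"WAF {waf:>2}: host can write about "
          f"{host_write_budget(flash_budget_tb, waf):.0f} TB")
```

A factor near 1 is roughly the sequential-write case; the 2 to 10 range is the small-random-write case described above.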

In any case, 'back up early, back up often' - regardless of your media - SSD, rotator, tape, core, sand ;-)

As an aside, everyone who has EVER used smartctl (even indirectly) should be thankful to the smartctl developers (and all those (probably) un-named people who figured out the mapping from S.M.A.R.T. attribute NUMBER to a fully-decoded human-readable (for some value of readable) output)!


Now, decoding your smartctl dump (NOTE!  I am not a smartctl decoder.  I don't even play one on the internet!):

First, that line in the output that says that this drive is not in the smartctl database means that the PROGRAM is GUESSING what the attribute IDs map to, and how to decode each of the values inside each attribute.

Pause a moment and think about that ;-)

Anyway, having just said (in effect) that there can be no absolute certainty that the numbers are decoded correctly, we'll pretend that they can and rush in where angels know better...

The total LBAs written is 15,931,263,658, which at 512 bytes per LBA is about 8 terabytes (roughly 8.16 TB, or 7.4 TiB).  According to the article below, that shouldn't be a cause for worry.  (But again, they didn't use 4k random writes, as far as I could tell, so they could be off by as much as 10x or more.)
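The conversion from LBAs to terabytes can be checked with a few lines. This assumes 512-byte LBAs, which the smartctl output quoted in this thread does not confirm; a drive reporting in different units would give a different answer.

```python
LBA_SIZE = 512  # bytes per LBA (assumption)

def lbas_to_bytes(lbas, lba_size=LBA_SIZE):
    """Total bytes represented by a count of logical blocks."""
    return lbas * lba_size

total_lbas = 15_931_263_658
total_bytes = lbas_to_bytes(total_lbas)

print(f"{total_bytes} bytes")
print(f"{total_bytes / 10**12:.2f} TB  (decimal, 10^12 bytes)")
print(f"{total_bytes / 2**40:.2f} TiB (binary, 2^40 bytes)")
```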

The uncorrectable-error and reallocated-sector values also look good.

I've got to get to work.  If nobody answers your question about the 'pre-fail' and such I'll try to say something within a week...

Rusty

-----Original Message-----
From: PLUG-discuss [mailto:plug-discuss-bounces at lists.phxlinux.org] On Behalf Of Matthew Crews
Sent: Sunday, January 29, 2017 1:23 PM
To: Main PLUG discussion list
Subject: RE: Samsung SSDs - Am at the end of life?

You want to ignore the “Raw Value” column outright, and instead look at the “Value” column. That is showing you still have a relatively healthy SSD.

Keep in mind that some form of reallocation and wear leveling is normal, and to be expected. When this value gets very low, you can start worrying about a replacement.

Hope that helps.

Also, this reminds me of an extreme SSD endurance test that Techreport did a couple years ago. Enjoy the read:
http://techreport.com/review/27909/the-ssd-endurance-experiment-theyre-all-dead

Matt

