I still want to try dm-raid RAID1 plus an NVMe drive with lvmcache.

PS: SSD-based lvmcache has been running well for me.
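For anyone curious, the rough shape of an lvmcache setup is below. The device and VG/LV names are placeholders, not my actual layout; adjust for your stack:

```shell
# Assumes an existing VG "vg0" holding a slow LV "slowdata", plus an
# NVMe partition to use as cache (hypothetical device names).
pvcreate /dev/nvme0n1p1
vgextend vg0 /dev/nvme0n1p1

# Carve a cache pool out of the fast PV, then attach it to the slow LV.
lvcreate --type cache-pool -L 100G -n fastcache vg0 /dev/nvme0n1p1
lvconvert --type cache --cachepool vg0/fastcache vg0/slowdata
```

Detaching later with `lvconvert --uncache vg0/slowdata` flushes the cache and leaves the origin LV intact.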


On Oct 27, 2016 5:19 PM, "Michael Butash" <mike@butash.net> wrote:

Thanks for the input, comments inline:


On 10/27/2016 02:52 PM, Joseph Sinclair wrote:
I haven't built anything on these directly, but I've encountered them on servers a little bit, and there are some specifics related to the kernel and UEFI that may help clarify the interactions for your configuration.

The way the linux F/S stack (at least currently) interacts with nvme devices is pretty good, but you do have to be a bit careful how you set it up in some cases.
To understand this, there's a diagram that's slightly out of date, but close enough at https://www.thomas-krenn.com/de/wikiDE/images/5/50/Linux-storage-stack-diagram_v4.0.svg

Let's look at your stack in this diagram, and see where things fit.

1) RAID1
  If you're using mdraid and dm-raid for the RAID layer (and not a SCSI raid driver), then you're in good shape.
  These operate in the "stackable" layer above BIOs (the block I/O struct in the kernel), which is good, as the primary NVMe enhancements (blk-mq) operate in the block layer below this.
Yep, mdraid/dm-raid is how I do all my raid1.
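For reference, the create step looks the same over NVMe as over SATA; only the device names change (the ones below are hypothetical):

```shell
# RAID1 across two NVMe partitions (hypothetical device names):
mdadm --create /dev/md0 --level=1 --raid-devices=2 \
    /dev/nvme0n1p2 /dev/nvme1n1p2
cat /proc/mdstat        # watch the initial resync progress
```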
2) LUKS
  If you're using dm-crypt as the backend (which should be the case), then again you're in good shape.
  dm-crypt is also in the "stackable" layer, so it also benefits from the blk-mq enhancements.
I've always done strict block alignments with mdraid+luks+lvm, but hopefully that's less necessary with 4k devices these days...
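The alignment knob I'd fiddled with here, for the record, is --align-payload; treat this as a sketch (the device name is hypothetical, and the value is in 512-byte sectors):

```shell
# luksFormat with an explicit payload alignment (8192 x 512B = 4 MiB),
# layered on top of the md device from the RAID step:
cryptsetup luksFormat --align-payload=8192 /dev/md0
cryptsetup open /dev/md0 cryptroot
```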
3) LVM
  This always operates in the "stackable" layer, so no worries here.
I've usually fiddled with the PV block sizing here too, per recommendation - wondering how relevant or necessary that still is these days.
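The PV-level tuning I mean is something like the following (names are placeholders; pvs lets you verify the alignment actually took):

```shell
# Explicit PV data alignment on top of the dm-crypt mapping:
pvcreate --dataalignment 4m /dev/mapper/cryptroot
vgcreate vg0 /dev/mapper/cryptroot
pvs -o pv_name,pe_start      # confirm where the first PE actually starts
```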
4) boot drive
  This is where things start to get troublesome.  Some distributions don't currently support the NVMe/UEFI combination well (UEFI support itself is still a bit weak in many ways).
Never had a problem with legacy here; the only real annoyance came from kernel update-initramfs bugs that removed my ability to unlock my hard drives with keyboard input...
  Most installation guides and benchmark tests use legacy mode for compatibility, but you do give up some performance when doing so, and not all motherboards handle this configuration well.
  It's possible, in most cases, to install an NVMe drive as a UEFI boot drive (don't forget the EFI partition), but it's probably going to require some manual tweaking, and a bit of googling for *correct* instructions relevant to your chosen distribution.
I got this working with ubuntu at one point on a cranky asus that didn't have legacy mode as an option at all, so I'm pretty sure I can hack it into working again, but I'd rather avoid EFI altogether if the NVMe drives play ball.
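For reference, the manual pieces look roughly like this once the ESP partition exists; the mount point and --bootloader-id are distro-dependent assumptions, so verify against your distribution's docs:

```shell
# Format and mount the EFI system partition, then install grub for UEFI
# (hypothetical device name; --bootloader-id varies by distro):
mkfs.vfat -F32 /dev/nvme0n1p1
mount /dev/nvme0n1p1 /boot/efi
grub-install --target=x86_64-efi --efi-directory=/boot/efi \
    --bootloader-id=ubuntu
update-grub
```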
Possible Gotchas:
  1) try to avoid mapping the device through the SCSI layer.  NVMe is all about performance, and NVMe devices work best when they're mapped through the nvme driver, rather than the SCSI stack and compatibility driver.
     This shouldn't happen in most recent distributions, but legacy mode in UEFI or other issues might confuse the kernel probes.
     If you do map through the SCSI stack, you usually end up with performance much closer to SATA3 than NVMe.  You may also encounter some weird corner cases that affect stability.
Good to note.  I saw references to the nvme* devices you mention, hoping those are just bootable to the bios when using an adapter (no nvme on the mobo, need pcie adapters).
  2) If you do get the UEFI boot working, make sure there's a recovery boot device handy, for when UEFI gets confused.
     This seems to happen far more frequently than it should, particularly with legacy compatibility mode enabled.
     It seems that some UEFI M/B implementations still have a long way to go in reliability for less common setups.
Ugh, thanks for the note.  I don't ever boot windoze natively, so hopefully it doesn't get too confused.

I did have the issue of not being able to mdraid the /efi partition; all I could really do was rsync the partitions.  Wondering what one *should* do for redundancy with EFI data other than that...?
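One trick I've seen mentioned (haven't tried it myself, so treat as a sketch) is mirroring the ESP with mdadm metadata 1.0, which puts the superblock at the end of the device so the firmware still sees a plain FAT filesystem:

```shell
# Mirror the ESP with the superblock at the end (metadata 1.0), so UEFI
# can read either member as plain FAT.  Device names are hypothetical.
# Caveat: firmware that writes to the ESP directly can desync the mirror.
mdadm --create /dev/md/esp --level=1 --raid-devices=2 --metadata=1.0 \
    /dev/nvme0n1p1 /dev/nvme1n1p1
mkfs.vfat -F32 /dev/md/esp
```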
  3) Avoid using any kind of EFI raid support if possible.  None of these, that I've seen, appears to be well implemented, even on server-focused boards.
     Having UEFI get totally tangled when it's splitting blocks between devices and its own code lives on those devices can brick a system rather thoroughly.
I see some windoze threads and posts about using the bios raid vs. RST EFI raid.  Either smells like fakeraid voodoo, but wondering how bad it is under linux to use either.  Need to dig more here.
  4) Keep in mind that the devices will be named /dev/nvme#n# and partitions /dev/nvme#n#p#.
     Some utilities still assume SCSI-style naming (a bare /dev/nvme or similar), and those assumptions won't work correctly with these names.
Great, more regressions where developers forget there's anything out there but /dev/sd*.  Sort of like the old days of /dev/hd* vs /dev/sd*.
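At least the naming scheme is easy enough to pick apart in a script; a quick sanity check of how a name like nvme0n1p2 decomposes:

```shell
# NVMe block devices are named /dev/nvme<controller>n<namespace>,
# with partitions appended as p<partition>.  Decompose one:
dev="nvme0n1p2"
rest="${dev#nvme}"          # strip the prefix        -> 0n1p2
ctrl="${rest%%n*}"          # controller number       -> 0
nspart="${rest#*n}"         # namespace + partition   -> 1p2
ns="${nspart%%p*}"          # namespace number        -> 1
part="${nspart##*p}"        # partition number        -> 2
echo "controller=$ctrl namespace=$ns partition=$part"
```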
Hopefully that's at least somewhat helpful.  I hope you'll let us all know how it goes if you do end up going this route.
Yes, helpful and a good sanity check.  Thanks for the thoughtful post here.  I'm probably going to hail-mary and try them, and really hope I don't make a mistake here.  Worst case, normal sata disks are cheap enough I'll get a few drives to put /boot on, and everything else on the nvme's.
==Joseph++

On 10/27/2016 09:31 AM, Michael Butash wrote:
Curious if anyone has taken the plunge to play with NVMe-based SSDs under linux here?  Particularly around raid.

Not finding a lot pertaining to them that is positive toward linux on the tubes, and I'm looking to reproduce my usual raid1+luks+lvm atop them, so feedback on doing so would be appreciated if anyone has comment.

I'm building a new desktop, and considering using them in place of regular SSDs if they can boot; the system is built more as a server / VM farm for my lab, so the IOPS would be appreciated when it's also my desktop.  I see references to EFI-based raid, which makes me cringe, but it seems mdraid can handle fakeraid metadata to some extent.

Thanks!

-mb



---------------------------------------------------
PLUG-discuss mailing list - PLUG-discuss@lists.phxlinux.org
To subscribe, unsubscribe, or to change your mail settings:
http://lists.phxlinux.org/mailman/listinfo/plug-discuss
