Adding capacity to an existing ext3-on-luks-on-LVM-on-raid1 setup


Problem statement

Despite the previous ZFS fanboying, I’m a long-term fan of ext3, LUKS, LVM, and mdraid [1].

I’ve been running ext3-on-luks-on-lvm-on-mdraid for as long as I can remember. It’s a stable and convenient setup.

But eventually, one does run out of disk space (despite best efforts).

This post discusses what it takes to enlarge the existing setup, with no data loss [2], and only a brief downtime [3].

Background

My desktop setup runs RAID1 over SSDs. Not that I don’t have backups (I do), but the sheer inconvenience of restoring after silent corruption has always had me hedge the risk somewhat by mirroring as well.

Furthermore, in this day and age, full disk encryption is (or at least should be) standard.

Hence the setup was (2x SSD (sda, sdb)):

original layout
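
Roughly, sketching that original layout from the description in this post (ignoring /boot and the small swap LV, which the post glosses over):

/dev/sda2 --+
            +-- /dev/md1 (RAID1) -- LVM PV -- VG "vg" -- LV "root" -- LUKS -- ext3
/dev/sdb2 --+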

Desired end-state

I decided to double the space, adding two more SSDs to the mix. As luck would have it, I had two Samsung SSD 850 PROs lying around, of the same capacity as the old Samsung SSD 830 PROs.

Since the original SSDs are getting rather long in the tooth (but still going strong!), I decided to pull one of them, replace it with a new one, and then build the second mirror also with a mix [4].

Thusly:

adjusted layout
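
Again as a rough sketch (device names as used in the commands below); both arrays end up as PVs in the same volume group, and the root LV spans them:

/dev/sda2 --+
            +-- /dev/md1 (RAID1) --+
/dev/sdc2 --+                      |
                                   +-- VG "vg" -- LV "root" -- LUKS -- ext3 (~2x the space)
/dev/sdb2 --+                      |
            +-- /dev/md2 (RAID1) --+
/dev/sdd2 --+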

Solution

It is straightforward, but I’m sure I won’t remember it 2 weeks from now.

Also, I’m glossing over some unimportant details (like the fact that I have a /boot partition on the RAID, or that the naming is a bit different).

Re-create the array

First, let’s take care of the lowest level – mdraid (software raid) [5].

# Copy the partition table from sda to sdc (careful!)
sfdisk -d /dev/sda | sfdisk /dev/sdc
# Fail one of the old disks
mdadm -f /dev/md1 /dev/sdb2
# Remove it from the array
mdadm -r /dev/md1 /dev/sdb2
# Add the new replica (disk)
mdadm --add /dev/md1 /dev/sdc2

# IMPORTANT:
# Wait for the resync to finish!! If anything goes wrong with the new disk, you still have the just-failed /dev/sdb2
watch -n 1 'cat /proc/mdstat'

# Zero superblock (killing the mdraid metadata)
mdadm --zero-superblock /dev/sdb2
# Copy the partition table from sdb to sdd (careful!)
sfdisk -d /dev/sdb | sfdisk /dev/sdd
# Create new raid
mdadm --create --verbose /dev/md2 --level=1 --raid-devices=2 /dev/sdb2 /dev/sdd2
# Again, wait for sync
watch -n 1 'cat /proc/mdstat'
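
What to look for: once the resync finishes, /proc/mdstat reports both members of each mirror as active, shown as [UU] (a degraded mirror shows [_U] or [U_]). A quick one-liner check, the grep pattern being merely an illustration:

# Both md1 and md2 should show [UU] on their status lines once fully synced
grep -A 2 '^md' /proc/mdstat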

There’s one thing I was unable to figure out – how to regenerate /etc/mdadm/mdadm.conf. I ended up adding the new ARRAY entry by hand. The UUID can be queried with mdadm --detail /dev/md2.
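
For reference, an ARRAY entry generally takes the shape shown below (the UUID here is a placeholder, not a real one). mdadm --detail --scan also prints ready-made ARRAY lines for all assembled arrays, which can serve as a starting point, though it may not match the formatting of an existing mdadm.conf:

# Query the new array's UUID
mdadm --detail /dev/md2 | grep -i uuid
# Print ARRAY lines for all currently assembled arrays
mdadm --detail --scan
# A hand-written entry in /etc/mdadm/mdadm.conf looks roughly like this (placeholder UUID):
# ARRAY /dev/md2 metadata=1.2 UUID=xxxxxxxx:xxxxxxxx:xxxxxxxx:xxxxxxxx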

Fix the LVM layers

Next up, the LVM layers need to be tweaked. There are commands to inspect each layer (and you should do that first – they’re listed just below for reference); I’ll only show the result at the end [6].
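
The inspection commands in question; none of these modify anything:

# Terse summaries of physical volumes, volume groups, and logical volumes
pvs; vgs; lvs
# Verbose versions of the same
pvdisplay; vgdisplay; lvdisplay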

# Add new physical volume
pvcreate /dev/md2
# Extend the `vg` group with the new volume
vgextend vg /dev/md2
# Figure out the newly available space (take the number of free physical extents)
vgdisplay | grep 'Free.*PE'
# Resize the `root` volume, adding to it all the newly available space (+61016 is the Free PE count from above)
lvextend /dev/vg/root -l +61016
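
For what it’s worth, the extent-counting step can be skipped by letting lvextend grab all remaining free space itself – same end result, assuming every free extent really should go to root:

# Alternative: extend root by all of the VG's remaining free space in one go
lvextend -l +100%FREE /dev/vg/root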

This brings the LVM to its final shape:

$ id
uid=0(root) gid=0(root) groups=0(root)
$ pvs
  PV         VG       Fmt  Attr PSize   PFree
  /dev/md1   vg       lvm2 a--  238.23g    0 
  /dev/md2   vg       lvm2 a--  238.34g    0 
$ vgs
  VG       #PV #LV #SN Attr   VSize    VFree
  vg         2   2   0 wz--n- <476.58g    0 
$ lvs
  LV     VG       Attr       LSize    [...]
  root   vg       -wi-ao---- <474.72g
  swap   vg       -wi-a-----   <1.86g

Finish the job

After LVM is adjusted, it’s time to finish the job.

Turns out, LUKS doesn’t need any resizing of its own – the LUKS header doesn’t record a volume size, so the mapping simply covers whatever the underlying LV offers once it’s (re)opened. So it’s a simple matter of:

# Grow the ext3 filesystem to fill the enlarged device
resize2fs /dev/mapper/vg-root_crypt
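
One caveat, in case you skip the reboots mentioned in footnote 3: an already-open dm-crypt mapping keeps its old size until told otherwise, so a fully online resize needs one extra step before resize2fs. A sketch, reusing the mapping name from the command above:

# Tell the active dm-crypt mapping to grow to the new size of the underlying LV
cryptsetup resize vg-root_crypt
# Then grow the filesystem as before
resize2fs /dev/mapper/vg-root_crypt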

Closing words

Honestly, even after having done this numerous times in the past… I still can’t shake the feeling that mdraid, LVM, LUKS, and ext3 are simply great.

I mean, yes, ZFS has some desirable properties (and you should consider switching), but even this “old school” stack is rock solid, understandable, and a joy to use.

  1. Maybe in the future I should write down a few horror stories about hardware RAID. E.g. that time in the early 2000s when we had a nice 48-hour global outage while trying to rebuild a RAID5 holding some (gasp) critical, unbacked-up data, only to have the controller fail and trash the second disk due to a firmware bug.

  2. Well, duh? Anyone can do it with data loss. :-D

  3. Because I’m paranoid and I tested rebooting my machine a few times throughout the process. Simply because I hate surprises.

  4. Generally, mixing two different series, or drives from two different manufacturers, should decrease the likelihood of correlated failures – especially with dated drives.

  5. If you, gentle reader, intend to copy and paste any of these commands without modification, you are in for a rough ride. Don’t do that, mmm’kay? The sfdisk one especially is loads of fun if you get the drives wrong.

  6. Because frankly, I don’t have the “pre” snapshot. And I’m not great at fabrication.