Occasional blog posts from a random systems engineer

Goodbye SSDs, long live ZFS: Homelab storage failure

· Read in about 5 min · (872 Words)

Homelab storage

Over the past 20 years, my homelab has gone through a huge variety of iterations and setups, roughly in this order:

  • Random desktop machine with a hard drive
  • First actual servers with local hard drives
  • Home-built NAS with iSCSI for VMs (ESX, then later oVirt)
  • Random array of laptops with iPXE and NFS root drives (running out of budget!)
  • Move to a real datacenter with iSCSI to NEXSAN SATABOY!
  • Back home to iSCSI on a Synology NAS

Then, the latest upgrade about six years ago:

  • 11 SSDs – 8 in software RAID for VMs, 2 for boot, and 1 for a build VM

I kept the Synology for larger VMs, but for smaller workloads, this 4TB SSD array was blazingly fast.

Uh oh

But then, of all days, on Jan 2nd I noticed an SSD had failed. If you’re used to hard drive failures, this might not seem alarming. But if you’ve worked with SSDs (particularly consumer SSDs), you’ll know this spells bad news.

Since the RAID distributes writes evenly across all drives (the main killer for SSDs), it was only a matter of time before more failed. Luckily, I had purchased two spares when I originally built the array, so I replaced the failed drive and all seemed fine… until a week later, when a second drive failed. No spares left.
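Swapping a failed member out of a Linux software RAID array looks roughly like this (the array and device names below are illustrative, not my actual layout):

```shell
# Mark the failed disk as faulty and remove it from the array
# (/dev/md0, /dev/sdX and /dev/sdY are placeholder names)
mdadm /dev/md0 --fail /dev/sdX
mdadm /dev/md0 --remove /dev/sdX

# Add the spare; the array rebuilds onto it automatically
mdadm /dev/md0 --add /dev/sdY

# Watch the rebuild progress
cat /proc/mdstat
```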

Moving forward

Of course, this happened in 2026—the era where SATA SSDs are nearly extinct. I literally could not find any on Amazon, and the few higher-capacity options were very expensive.

Replacing the array with 2TB SSDs would cost ~£500 (the original array had been £400). Not ideal—especially with only three drives, which meant:

  • Less redundancy
  • Higher cost per failure

And realistically, these would still be consumer drives — not designed for this workload (though surviving six years wasn’t bad).

Spinning rust

I considered the performance trade-off if I returned to spinning disks. Comparing consumer SSDs against high-end 2.5" 15K SAS drives:

                     Crucial MX500 SSD   Dell 15K 2.5" SAS
IOPS                 ~95K                ~150–210
Throughput (write)   ~510 MB/s           ~115 MB/s
Throughput (read)    ~560 MB/s           ~115 MB/s
Latency              <1 ms               ~2–3 ms

No surprises here—SSDs clearly outperform.
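Figures like these can be reproduced with fio. A sketch for 4K random reads at queue depth 32 (the device path is a placeholder, and this reads the raw device directly, so point it somewhere non-production):

```shell
# Random 4K reads against a raw device for 30 seconds, queue depth 32
# WARNING: --filename is a placeholder; use a scratch device
fio --name=randread --filename=/dev/sdX --direct=1 \
    --rw=randread --bs=4k --ioengine=libaio --iodepth=32 \
    --runtime=30 --time_based --group_reporting
```

Swap `--rw=randread` for `--rw=write --bs=1M` to approximate the sequential throughput rows instead.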

Reflecting on running 50+ VMs on spinning disk, I remember bursts of high load averages (>300!) and times when read/write barely reached 1 MB/s. I’d gotten used to SSD speed, and it was hard to go back.

But, at the same time, 10 x 1.2TB SAS drives for £120 is beyond cheap.

Getting the drives working

This should have been trivial, but after inserting a single drive to test, it wasn’t detected by the OS.

After some investigation (the server isn’t easily accessible, and I didn’t want downtime while the SSDs were active), I realized the drive backplane was SSD-only. After buying a replacement backplane and cable (£30), I logged into the iLO (network unplugged) and found the drive listed as a foreign RAID member, requiring initialization :cry:

Cache without cash

I considered maximizing caching.

Read cache was easy—all remaining healthy SSDs could be used. 2 x 500GB for read caching was a no-brainer.

Write cache, however, raised questions around failure, corruption, and data loss. I set out what mattered and what didn’t:

  • Loss of filesystem – disastrous (recoverable from backups, but very inconvenient)
  • Outage due to drive failure – fine, as long as restart is possible
  • Non-corrupting data loss – acceptable, as long as the failure was consistent and the data could be recovered to a known-safe state

With this approach, two nearly-new SSDs could handle write caching safely.

ZFS

Previously, my stack was:

SSDs -> md -> LVM PV -> VG -> LV (per VM) -> Exposed to qemu as VM disk

I’d used per-LV caching in LVM, but it was brittle and tedious at scale.
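For context, attaching a cache to a single LV with lvmcache took something like this, repeated for every VM’s volume (sizes and names here are illustrative):

```shell
# Create a cache pool on the SSD PV, then attach it to one LV,
# then repeat both steps for every VM's LV (names are illustrative)
lvcreate --type cache-pool -L 20G -n ${LV}_cache ssd-vg /dev/ssd1
lvconvert --type cache --cachepool ssd-vg/${LV}_cache ssd-vg/$LV
```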

With ZFS:

  • Read cache: L2ARC – assign two SSDs, done
  • Write cache: SLOG – assign two SSDs, done
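Attaching both caches to an existing pool really is that simple. A sketch with placeholder device paths, mirroring the SLOG so a single SSD failure can’t lose in-flight sync writes:

```shell
# Read cache (L2ARC): stripe the two 500GB SSDs as cache devices
zpool add vm_pool cache /dev/disk/by-id/ssd-a /dev/disk/by-id/ssd-b

# Write cache (SLOG): mirror the two nearly-new SSDs so losing one
# doesn't lose the intent log
zpool add vm_pool log mirror /dev/disk/by-id/ssd-c /dev/disk/by-id/ssd-d

# Confirm the pool layout
zpool status vm_pool
```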

Data transfer

I reinstalled a spare server with 24TB of storage to act as a temporary backup, in case the SSDs died during the migration (I still had to buy the drives, etc.):

  • 1 LV for the PV of the SSD pool
  • 1 LV for RootFS backup

Copying block devices over the network:

dd if=/dev/md0 bs=1M | pv | ssh user@temp-r720 "dd of=/dev/backup_vg/tmp_backup bs=1M"

Verification:

ssh user@temp-r720 "cmp /dev/backup_vg/tmp_backup -" < /dev/md0

For each ZFS volume:

LV=VM-DISK-NAME
LV_BYTES=$(blockdev --getsize64 /dev/ssd-vg/$LV)

# Create a ZFS volume with the same size, with sync writes
# temporarily disabled to speed up the bulk copy
zfs create -V ${LV_BYTES}B vm_pool/$LV
zfs set sync=disabled vm_pool/$LV

dd if=/dev/ssd-vg/$LV of=/dev/zvol/vm_pool/$LV bs=1M oflag=direct

# Re-enable sync
zfs set sync=standard vm_pool/$LV
zpool sync vm_pool

# Compare with the backup (activating the copied PV on the backup
# host exposes the same ssd-vg LVs there)
sha256sum /dev/zvol/vm_pool/$LV
ssh user@temp-r720 "sha256sum /dev/ssd-vg/$LV"
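To migrate every volume rather than one at a time, the steps above can be wrapped in a loop. The `round_up` helper is my own addition, guarding against an LV size that isn’t a multiple of the zvol block size; with default 4 MiB LVM extents the sizes already align, so it normally changes nothing:

```shell
# Round a byte count up to the next multiple of the zvol block size
# (8 KiB by default), since `zfs create -V` requires volsize to be a
# multiple of volblocksize
round_up() {
  local bytes=$1 block=${2:-8192}
  echo $(( (bytes + block - 1) / block * block ))
}

# Migrate every LV in the ssd-vg volume group to a matching zvol
for LV in $(lvs --noheadings -o lv_name ssd-vg | tr -d ' '); do
  LV_BYTES=$(round_up "$(blockdev --getsize64 /dev/ssd-vg/$LV)")
  zfs create -V "${LV_BYTES}B" "vm_pool/$LV"
  zfs set sync=disabled "vm_pool/$LV"
  dd if="/dev/ssd-vg/$LV" of="/dev/zvol/vm_pool/$LV" bs=1M oflag=direct
  zfs set sync=standard "vm_pool/$LV"
done
```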

ZFS versions

The server was running Ubuntu 14.04 with an old ZFS version. To avoid old bugs, I upgraded to the latest ZFS via five successive rounds of do-release-upgrade; all worked flawlessly.

Performance

Some applications actually ran faster after migration. ZFS provided detailed metrics:

  • L2ARC read cache: 25% hit ratio, 72.7% compressed
  • SLOG write cache: 3.1 TiB, 111.2M transactions saved
  • ARC (RAM cache): 85GB used out of 90GB max, memory throttle count: 0

RAM caching explained the speed boost—I’d effectively moved from SSDs to caching in RAM.
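These numbers come straight from the standard ZFS tooling:

```shell
# ARC/L2ARC sizes, hit ratios and throttle counts
# (arc_summary ships with zfsutils-linux on Ubuntu)
arc_summary

# Per-vdev I/O, including the cache and log devices, refreshed every 5s
zpool iostat -v vm_pool 5
```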

Was it a success?

I’ve reduced reliance on consumer SSDs, allowing flexible replacement without full data transfer. Enterprise drives (available cheaply on eBay) provide a more resilient setup, and I can now hoard spares affordably.