ZFS in OS X Leopard
Having heard several completely mangled attempts to explain the benefits of ZFS (reliably believed to be a feature in the next version of OS X) in various places (including on This Week in Media a few weeks back), I feel I should probably take a crack at explaining what it is and, more to the point, why it matters to indie filmmakers. Because it does matter, a lot.
This post is background for an upcoming post on building a cheap RAID to hold all those hours of 4K footage you’ll be shooting in a few months.
There’s tons of information about ZFS around the web, but much of it is fairly technical, and as far as I’ve seen there’s almost nothing that explains in concrete terms what makes it different from what we’ve got now, and why it’s so important to IT production workflow.
Basically, ZFS is a much more flexible way of handling data storage than what traditional file systems provide. Traditional file systems work with discrete volumes, which may span one disk, or, with RAID, more than one disk.
With ZFS, instead of having inflexible volumes, you create “storage pools” across multiple physical disks. Pools are flexible; if you create a pool spanning four drives and need to add a fifth drive, you can do that without having to recreate any of the file systems in the pool. You can even remove a drive from a pool, assuming there’s enough space to store all the data without it. With a simple command, ZFS will rearrange your data so the drive you specify is no longer necessary, after which it can be removed.
Drives in the pool can be used for any combination of striping, mirroring, or RAID Z (which is sort of like RAID 5), plus there’s support for hot spares.
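If the parity idea behind RAID 5 (and, loosely, RAID Z) is fuzzy, here's a toy sketch in Python. To be clear, this is the generic single-parity XOR trick, not ZFS's actual on-disk layout, and the block contents and the `xor_blocks` helper are made up for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks, as if striped across three drives (toy data).
data = [b"clip", b"reel", b"take"]

# One extra parity block is all the redundancy we store.
parity = xor_blocks(data)

# Pretend the second drive died: rebuild its block from the survivors plus parity.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]
```

The point is that one drive's worth of parity lets you lose any single drive without losing data, which is why this kind of layout is so much cheaper than mirroring everything.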
With traditional RAID setups, though, if you want to add drives or rearrange things, the whole array has to be backed up and reformatted, or you have to use pricey volume management tools which could take many hours to move your data around, during which time your storage is unavailable. With ZFS, none of this is necessary.
Once a pool is created, file systems can be created and rearranged within that pool extremely easily; it’s nearly as easy as creating or rearranging directories presently is. As many file systems as you’d like to create can share a storage pool.
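To make the "many file systems, one pool" idea concrete, here's a rough Python model. It's only a mental picture under simplifying assumptions (no redundancy, made-up drive sizes), and the `Pool` class and its method names are hypothetical, not anything ZFS actually exposes.

```python
class Pool:
    """Toy storage pool: capacity is just the sum of its disks."""

    def __init__(self, disk_sizes_gb):
        self.disks = list(disk_sizes_gb)
        self.filesystems = {}  # name -> GB used

    @property
    def capacity_gb(self):
        return sum(self.disks)

    @property
    def free_gb(self):
        return self.capacity_gb - sum(self.filesystems.values())

    def add_disk(self, size_gb):
        # Growing the pool leaves existing file systems untouched.
        self.disks.append(size_gb)

    def create_fs(self, name):
        # A new file system reserves nothing up front; it draws on shared free space.
        self.filesystems[name] = 0

    def write(self, name, size_gb):
        if size_gb > self.free_gb:
            raise ValueError("pool is full")
        self.filesystems[name] += size_gb


pool = Pool([500, 500, 500, 500])
pool.create_fs("footage")
pool.create_fs("audio")
pool.write("footage", 1800)
pool.add_disk(500)        # the fifth drive
pool.write("audio", 600)  # fits only because the pool grew
print(pool.free_gb)       # 100
```

Notice that neither file system had to be sized in advance or recreated when the fifth drive went in. That's the difference from carving a disk into fixed volumes.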
ZFS also has built-in pervasive checksumming features so it’ll automatically detect if your data gets corrupted (and recover it, if you’ve set things up with some amount of redundancy). And because of its architecture, the data on disk is always in a consistent state, eliminating the need for file system repair utilities and the speed hit associated with journaling.
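Here's a small Python sketch of what "detect and recover" means with a two-way mirror. It's an illustration of the general idea, not ZFS's real code path: I'm using SHA-256 from the standard library for convenience, and `read_with_repair` is a hypothetical helper name.

```python
import hashlib


def checksum(block: bytes) -> str:
    return hashlib.sha256(block).hexdigest()


def read_with_repair(copies, expected):
    """Return the first copy matching the checksum and overwrite any bad copies with it."""
    good = next((c for c in copies if checksum(c) == expected), None)
    if good is None:
        raise IOError("all copies are corrupt")
    for i, c in enumerate(copies):
        if checksum(c) != expected:
            copies[i] = good  # heal the bad copy from the good one
    return good


block = b"frame 0001"
expected = checksum(block)       # kept apart from the data it describes
mirror = [b"frame 0O01", block]  # the first copy has silently rotted

assert read_with_repair(mirror, expected) == block
assert mirror[0] == block        # the bad copy was repaired on read
```

Without redundancy, the same check can still tell you a block is bad; it just has nothing to repair it from.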
Right now your options for using ZFS are buying ludicrously expensive Sun gear, or downloading OpenSolaris x86 and trying to piece together, on your own, a system that will work. (I spent some time trying to figure out what hardware would work for an OpenSolaris-based storage server… it’s not easy to find good information.)
Having support in OS X will make things a lot easier. Up next: discussion of how you can leverage ZFS in OS X to get most of the benefits of enterprise-class storage at a fraction of the price.
[...] How do you create a volume across multiple drives that makes use of distributed parity (see previous post) without shelling out big money for an enterprise storage system? Windows XP Pro has built-in software RAID 5 support, which will do nicely. Mac OS X Leopard will have ZFS, which will do really nicely, creating a RAID Z storage pool. [...]
“Pools are flexible; if you create a pool spanning four drives and need to add a fifth drive, you can do that without having to recreate any of the file systems in the pool.”
This isn’t quite so flexible; if you are using RAID-Z you have to add a whole stripe (e.g. 4 more disks) at a time.
“You can even remove a drive from a pool, assuming there’s enough space to store all the data without it. With a simple command, ZFS will rearrange your data so the drive you specify is no longer necessary, after which it can be removed.”
This feature (zpool remove) is planned but not implemented yet.
(Let’s hope WordPress doesn’t eat my formatting.)
[...] Of the announced features of Leopard, the only really significant one for our market is ZFS, for reasons I’ve discussed in the past. While it certainly would be great, though, it’s by no means necessary. [...]