How to store data #1: background
This post is a follow-up to a post from a couple of weeks ago, “How not to store data”, in which I warned against the all-too-common practice at small production facilities of stashing data on lots of external hard drives with no coherent management plan.
The single most important characteristic of a storage system is reliability. And the most important thing to realize about reliability is that it’s not something achieved entirely through technological means; it’s something that emerges from a combination of technology and well-planned, properly followed procedures. That’s why this post isn’t a list of products to buy (though that comes next).
Drives fail, backup media goes bad, things get accidentally deleted. In order to store data reliably, you have to have redundancy. And in order to have confidence in the reliability of your storage, you have to know you have redundancy. This means that on multi-person projects, you can’t just let everyone handle their own data in whatever way they want.
Why not just stick to the strategy of letting lots of external drives float around, but make a rule that everyone has to make sure there are at least two copies of everything? In my experience, such rules often go unheeded. And at any rate, if multiple people are interacting with the same data, everyone will assume someone else has made the second copy when they haven’t, leaving your data unprotected. Or they’ll assume no one has when someone already did, leaving you with extra, unnecessary copies of things.
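To make the "know you have redundancy" point concrete, here is a minimal sketch of the kind of audit a two-copies rule would actually require. It assumes each drive is mounted as a directory and treats files with identical contents as copies of one another, wherever they live and whatever they're named. The function names (`file_digest`, `audit_copies`) and the `min_copies` parameter are made up for illustration; this is not a tool I'm recommending, just a way to see what "checking" would involve.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def audit_copies(roots, min_copies=2):
    """Count, for each distinct file content, how many drives hold it.

    `roots` is a list of directories, one per drive (an assumption of
    this sketch). Two copies on the same drive count only once, since
    they do not protect against that drive failing.

    Returns (under, over): dicts mapping a content digest to
    {drive: [paths]} for content on fewer / more drives than min_copies.
    """
    locations = defaultdict(lambda: defaultdict(list))
    for root in roots:
        for path in sorted(Path(root).rglob("*")):
            if path.is_file():
                locations[file_digest(path)][str(root)].append(path)

    under = {d: dict(r) for d, r in locations.items() if len(r) < min_copies}
    over = {d: dict(r) for d, r in locations.items() if len(r) > min_copies}
    return under, over
```

You would call it with something like `audit_copies(["/Volumes/DriveA", "/Volumes/DriveB"])` (hypothetical mount points). Note that it only works when every drive is plugged in at the same time and somebody remembers to run it, which is exactly the problem with drives floating around.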
The best strategy for small production facilities, including everything down to one-man shops, is to scale down the sort of approach used in well-structured enterprise environments, not to try to scale up the “stash it on the external drive” approach that works so well with your 13-year-old cousin’s iMovie projects. Fortunately, it is now possible to do this at very little additional cost.
The key to this approach is centralization — both of the physical hardware and of responsibility. The former means building a storage array, rather than just buying individual drives. The latter means having a single person who’s responsible for the care and feeding of all the organization’s data.
Coming up shortly: what to buy, how to set it up and how to use it.