Inside Lightroom


The Fundamental Storage Strategy


It's been interesting reading people's views and suggestions about how to handle storage for a large photo library. The feedback in the comments to my last Inside Lightroom post has spanned an impressive range of solutions, from simple multiple-disk strategies all the way to Solaris-based servers and remote copies. I've looked at several of these solutions myself, and all of them get the job done with varying levels of complexity. Regardless of the details of the solution you choose or the technology you use, however, there's a basic strategy that you should follow: Keep at least three copies of your data.

Why three copies? The answer is that you want to make sure that you have a backup of your backup when your primary storage fails. Preferably, you want that third copy to be offsite just in case you have a catastrophic loss in your primary place of computing. Most of all, however, keeping lots of discrete copies of your data on different systems is the approach that people who manage tons of data, like Google, are now taking. Instead of building fancy systems with stacks of RAID arrays, the vanguard of the storage community has adopted cheap commodity hardware and is simply making more copies of its data.

An important thing to note in this discussion is that RAID is not a backup strategy. It only counts as one copy of your data. A RAID array should only be considered a drive that's faster than a single spindle and, in some cases, slightly more survivable, as long as you're not running RAID 0. The theory says that you can replace a failed drive in a RAID 1 or RAID 5 array and everything will magically come back to normal. My own experience with multiple RAID failures over the years, and that of several other people I've talked to on this issue, doesn't bear this theory out too well. Instead, it seems that a drive failure in a RAID array is a good sign that you have just a bit more time to get your data backed up one more time before things come totally apart.

The people in the storage industry know this. That's why you see lots of work going into advanced next-generation file systems like ZFS. ZFS takes a different structural approach to how it keeps data on the spindles and is a much better fundamental design than what's come before. To take advantage of ZFS right now, however, you have to set up a second Solaris system. I find it fascinating that several of the commenters to my last post have taken this approach. But I know that for many photographers with big image collections, setting up a second server, with the attendant system administration duties, is a bit out of scope. Setting up a second server also has other repercussions, not the least being additional energy use and the related environmental impact. And, finally, even if you have a fancy ZFS pool at your disposal, in my mind it still only counts as one copy. You'll still want to have external backups. After all, a lightning strike can fry your computer, or a storm could flatten your house.

The good news is that you don't have to get fancy with additional servers and RAID systems to take advantage of the same fundamental strategy of keeping multiple copies of your photographs. You can do it with one system and simply load up on internal and external drives. The principle of making multiple copies of your data is independent of the technology used. The hard part, of course, is that regardless of the technology you use, you still have to set things up and keep the flow of photographs from your primary disks to your backups going. The devil, as they say, is in the details.
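To make the "multiple copies on plain drives" idea concrete, here is a minimal Python sketch. It is only one possible approach under stated assumptions, not a description of my own setup: the function names `sha256_of` and `mirror` are invented for this illustration, and the backup roots are assumed to be ordinary mounted folders that sit outside the primary folder.

```python
import hashlib
import shutil
from pathlib import Path
from typing import Sequence


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def mirror(primary: Path, backups: Sequence[Path]) -> int:
    """Copy every file under `primary` to each backup root.

    Files whose backup copy already matches the primary are skipped.
    Each new copy is read back and compared against the primary's hash.
    Returns the number of file copies that were written.
    """
    copied = 0
    for source in sorted(p for p in primary.rglob("*") if p.is_file()):
        relative = source.relative_to(primary)
        source_hash = sha256_of(source)
        for root in backups:
            target = root / relative
            if target.exists() and sha256_of(target) == source_hash:
                continue  # this copy is already up to date
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(source, target)  # copy2 also preserves timestamps
            if sha256_of(target) != source_hash:
                raise IOError(f"verification failed for {target}")
            copied += 1
    return copied
```

Calling `mirror(Path("/Volumes/Photos"), [Path("/Volumes/BackupA"), Path("/Volumes/BackupB")])` with hypothetical volume names would leave three copies of each file: the primary plus two backups. The sketch never deletes anything from the backups, which is a deliberate choice here so that a file accidentally removed from the primary disk survives on the others.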

I'm about halfway through migrating to the new storage setup that I'm implementing as part of the New Year's Storage Resolution I posted last week. Next week, I'll post more details.





Comments (7)


miro said:

I practice a similar strategy:

workstation
- online data

on location backup (filesharing server)
- very important data RAID 1
- rest RAID 5

off-site in area backup
- portable HDD

off-site out of area backup
- fresh set of tapes/dvds

Dave said:

I'm starting to put a backup strategy in place, mainly because of the amount of hard disk space my photos are taking up.

I know I don't have the discipline to reliably lug external hard disks around for an off-site backup, or reliably make backups on DVD...

What I'm doing so far is:

1) primary storage on my PowerBook (current year's photos)
2) one backup on an old external hard disk
3) second backup on an old XP desktop with lots of disk space
4) automatic backup from the XP desktop to an on-line service for off-site backup (e.g. Mozy)

2) and 3) are carried out automatically once each day using a combination of AppleScript, Automator and rsync.

4) will be carried out automatically by the online service's software once I get that part set up.

Oliver said:

Another reason to consider ZFS is its automatic checksumming of all stored files. This protects you against silent "bitrot".

When storing hundreds of GB of photos I'm highly concerned about the integrity of my primary storage. If that is compromised the best backup strategy can't help.

Hopefully, ZFS will become write-enabled on Leopard soon.
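
On file systems without built-in checksumming, the bitrot concern raised here can be partly addressed with a checksum manifest. The following Python sketch is an assumption-laden illustration, not how ZFS works internally (ZFS checksums blocks inside the file system and can repair them from redundant copies; this only detects changes at the file level). The names `file_digest`, `build_manifest` and `find_damage` are made up for the example.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict:
    """Map each file's path (relative to root, POSIX style) to its digest."""
    return {
        path.relative_to(root).as_posix(): file_digest(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def find_damage(root: Path, manifest: dict) -> list:
    """Return the relative paths that are missing or no longer match."""
    damaged = []
    for relative, expected in manifest.items():
        path = root / relative
        if not path.is_file() or file_digest(path) != expected:
            damaged.append(relative)
    return damaged
```

A manifest built right after import can be stored alongside each backup and rechecked periodically. Note that this cannot tell a deliberate edit from corruption, so it suits original raw files that are never supposed to change.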

Elmar said:

In an article on my web site, I wrote up the principles of backup (in German: http://www.elmar-baumann.de/fotografie/tipps/praxis/auf-reisen-bilder-speichern.html):
* Store the same data multiple times (redundancy)
* Compare backed-up data with the original data
* Use high quality storage media
* Use media of different technologies (e.g. hard disk, DVD)
* Keep the backup media in good conditions (dark, cool, dry, ...) and in several places
* Back up again before the estimated lifetime of the backup media reaches its end
* Backups have to be automated to ensure integrity and accuracy (guarding against "laziness" and "forgetfulness")

My solution is:

1) Automated backup to an external hard disk every day. The hard disk is plugged into the PC only when the backup runs. I use rsync for this task.

2) Automated backup on two high quality DVDs (Kodak Gold ["100 Years"] and Verbatim DVD-RAM). One of the DVDs is kept outside my house. The automation is done by a script, which scans the hard disk for new and modified images and recommends a backup if their size fills one or more DVDs.
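
The script itself isn't shown in this comment, so the following Python sketch is only a guess at the general shape of such a scan. Everything in it is an assumption made for illustration: the function names, the list of image suffixes, the use of file modification times to find new and changed images, and the nominal 4.7 GB single-layer capacity (usable space on a real disc is somewhat lower).

```python
import math
from pathlib import Path

# Assumption: nominal capacity of a single-layer DVD, in bytes.
DVD_CAPACITY_BYTES = 4_700_000_000
# Assumption: which file suffixes count as images.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".tif", ".tiff", ".dng", ".nef", ".cr2"}


def pending_images(root: Path, last_backup_time: float) -> list:
    """Return image files under root modified after last_backup_time.

    last_backup_time is a timestamp in seconds since the epoch.
    """
    return [
        path
        for path in sorted(root.rglob("*"))
        if path.is_file()
        and path.suffix.lower() in IMAGE_SUFFIXES
        and path.stat().st_mtime > last_backup_time
    ]


def recommend_backup(files, capacity: int = DVD_CAPACITY_BYTES):
    """Return (should_burn, discs_needed) for the pending files.

    A burn is recommended once the pending data would fill at least one disc.
    """
    total = sum(path.stat().st_size for path in files)
    discs_needed = math.ceil(total / capacity)
    return total >= capacity, discs_needed
```

Waiting until a full disc's worth of images has accumulated avoids burning half-empty DVDs, at the cost of leaving the newest images on hard disk only in the meantime, which is why the daily hard disk backup in step 1 still matters.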

Fazal Majid said:

ZFS is available on Solaris, FreeBSD and OS X (if you are an ADC member and download the beta read-write support), but not on Linux.

There is another way to get the same benefits in a turnkey form, which is to buy a NetApp StoreVault storage appliance. They are quite pricey, however.

Automation is definitely key to the process of keeping multiple copies of your data around and backed up.

Oliver: Bit-rot is definitely a concern and something that does need to be addressed.

Fazal: Thanks for the catch. I removed Linux from the post. As for the beta read-write support on Mac OS X, I didn't list it because of the beta label. As for the StoreVaults, those look pretty sweet as well.


Branch said:

For "offsite" storage I've added a media safe to my office for backup #3. I store several external drives in the safe. While not as "safe" as a real offsite location, it's much more likely to be up to date.
