How to Store Digital Photos
Fifteen years ago, the notion of storing your photos on a computer didn’t exist, but since then the archiving of photos has become a major problem for professional photographers and consumers alike – particularly as the resolution of the cameras has increased significantly from the 1.3 megapixel first-generation devices to the 24MP devices of today. It is not unusual to have a single 70MB file.
There are many ways to store photos, ranging from the cheap to the expensive and from the limited to the robust. In this article, we’re only going to look at the ways of physically storing your digital photos. We won’t be talking here about asset management strategies (i.e. which images to store, workflows, metadata, software, etc.), but Peter Krogh’s article on Digital Asset Management does a great job with that.
Photographs can hold tremendous emotional and monetary value. Before the proliferation of digital cameras, photographers could store a physical negative or slides with the expectation that the medium would last decades, if not longer, under a controlled environment.
But with digital photography, images are only stored as data, and the integrity of that data is only as good as the archiving system. None of the typical storage media, from CD/DVDs to hard drives, have proven to be as resilient as film, and therefore, photographers need to create archiving systems that support multiple copies.
Part I: Physical Issues to Consider
Before we jump into actual product recommendations, here are a few issues to take into account before you plunk down your money:
– Initial investment
– Power requirements
If you own a car or a house, you know that the purchase price is only part of the total cost of ownership. Similarly, when you look at a long-term strategy for storing photos, the cost of the storage device is usually just the tip of the iceberg. Insofar as storage is concerned, you have to think about your electricity bills, as well as the service plans that are often bundled with the higher-end offerings. Total cost of ownership is a non-issue for a single hard drive, but it becomes a real factor if you have large storage arrays that need maintenance, power and potentially cooling to operate optimally.
For our purposes, we’ve created three pricing tiers to help you assess your options for your initial investment:
1. Cheap: <$500
2. Mid: $500 to $1000
3. High: > $1000
We only have so much desk and closet space, so expandability of any storage system is a huge concern. In other words, as you continue to build your photo archive, how able is your storage solution to sustain that growth? A $150 external hard drive doesn’t have any expandability options within the same enclosure, whereas high-end RAID systems can typically be expanded to TBs of storage. Of course, expandability comes at a price, because it requires particular hardware and software to manage lots and lots of data.
The other aspect is one of convenience. Higher-end systems can add storage that appears as a single volume on your computer, whereas lower-end systems will appear as multiple connected drives. Good cataloging software can alleviate the issue of finding an image across many devices, but you might find it unwieldy to have 20 icons for 20 drives on your desktop.
Google has thousands of computers on its network that power all its various products – from search to email and beyond. As the company grows, it has to build more data centers to hold all these computers. It might surprise you to find out that one of the biggest issues it contends with when planning new construction is power. Computers are power-hungry devices, and because computers are relatively small, you can stuff a lot of them into a tiny space.
The typical household outlet can support about 1800 watts of power. A Mac Pro tower power supply has a maximum rating of 1000 watts. Storage devices are less power-hungry, but it’s still a consideration for scaling out your hardware – not to mention your electricity bills.
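To make the power question concrete, here is a minimal Python sketch of the two checks described above: whether a set of devices fits within the roughly 1800 watts of a household outlet, and what a device costs to run for a year. The function names, the 250-watt storage device and the electricity price are illustrative assumptions, not measured figures.

```python
def fits_on_outlet(device_watts, outlet_limit_watts=1800):
    """Return True if the combined maximum draw stays within the outlet limit.

    The 1800 W default is the article's figure for a typical household outlet.
    """
    return sum(device_watts) <= outlet_limit_watts


def annual_energy_cost(average_watts, hours_per_day, price_per_kwh):
    """Estimate the yearly electricity cost of one device.

    price_per_kwh is whatever your utility charges; no default is assumed.
    """
    kwh_per_year = average_watts * hours_per_day * 365 / 1000
    return kwh_per_year * price_per_kwh


if __name__ == "__main__":
    # A 1000 W tower plus two hypothetical 250 W storage devices.
    print(fits_on_outlet([1000, 250, 250]))
    # A hypothetical 100 W device running around the clock at 0.15 per kWh.
    print(round(annual_energy_cost(100, 24, 0.15), 2))
```

Power supply ratings are maximums, so actual draw is usually lower; measuring a device with an inexpensive watt meter gives a better input for the cost estimate.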
Desktop devices are normally designed to be pretty quiet, but storage devices often cram a ton of mechanical hard drives into a small space. Since the efficiency of computer components is adversely affected by heat, it’s not surprising that storage devices often have multiple fans built into them. The fans are often temperature controlled by the computer, but when it’s hot, and the fans are running at full speed, it can be very noisy – unbearably so. Unless the ambient noise in your room is already very high, anything over ~25 decibels is probably going to irritate you pretty quickly unless it’s in a closet.
Speaking of fans and closets, while a closet will reduce the audible noise from a storage device, it won’t help dissipate the heat, of which these devices can generate a ton. You’ll need some sort of venting from your closet into ducting or between walls to effectively dissipate heat, or your components will start operating erratically and will shut off automatically to protect the circuitry.
If you’ve only ever used a single external hard drive plugged into the back of your computer, thinking about power, heat and noise might seem a bit esoteric. But this phenomenon is very real, and heat has to be considered as a part of your planning.
When you have one or two hard drives, plugging them into your computer is no big deal. But once you start getting into higher numbers of devices or higher capacity, you might want to start thinking about putting the storage device somewhere else to deal with space, heat and noise. Consumer-level hard drives usually plug into the USB or FireWire ports. Transfer speeds through these mechanisms are fast, but on the other hand, you do have to worry about whether these connector styles will be around in 5 years.
What do I mean? In the world of televisions, we used to hook up our VCRs with the white and red RCA cables. Then there was S-Video, component video, and now HDMI. And just when you think HDMI will be the standard, along comes DisplayPort.
On the other hand, Ethernet has been around for decades, and the form factor has not changed. By making better cables, the engineers have been able to eke out even more bandwidth without having to resort to expensive materials like optical cabling. For this simple reason, having a network-enabled storage device isn’t a bad way to ensure the longevity of your equipment.
|Type||Note||Max Transfer (MB/sec)||Which computers support it?|
|USB 1.0|| ||1.5||Some computers after 1996|
|USB 2.0|| ||60||Macs/PCs after 2001|
|USB 3.0||Not in use yet||625||None. Maybe 2010|
|FireWire 400|| ||49.13||Some Macs after 2002. Some higher-end PCs|
|FireWire 800|| ||98.25||Some Macs after 2002. PCs typically require an add-on card|
|SATA||Serial ATA. An internal connector type||300||Most computers since 2003|
|eSATA||External SATA designed for external devices||300||Very rare. Requires an add-on card. Likely to be supplanted by USB 3.0|
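To put the table’s numbers in perspective, the sketch below estimates the best-case time to move an archive over each connector. It uses the maximum rates from the table and assumes 1 GB = 1000 MB; the function name is illustrative, and real-world transfers are slower because the drive itself and protocol overhead limit throughput.

```python
# Maximum transfer rates in MB/sec, copied from the table above.
MAX_RATE_MB_PER_SEC = {
    "USB 1.0": 1.5,
    "USB 2.0": 60,
    "FireWire 400": 49.13,
    "FireWire 800": 98.25,
    "SATA": 300,
    "eSATA": 300,
}


def best_case_transfer_hours(size_gb, connector):
    """Hours needed to move size_gb at the connector's maximum rate."""
    size_mb = size_gb * 1000  # assumes decimal units, 1 GB = 1000 MB
    seconds = size_mb / MAX_RATE_MB_PER_SEC[connector]
    return seconds / 3600


if __name__ == "__main__":
    for connector in MAX_RATE_MB_PER_SEC:
        hours = best_case_transfer_hours(500, connector)
        print(f"{connector}: {hours:.1f} hours for 500 GB")
```

Even in the best case, a 500GB archive takes a little over two hours on USB 2.0, which is worth knowing before you plan a migration.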
Different connectors support different transfer speeds, but that isn’t the only factor. Drives come in several rotational speeds (RPM). Faster drives typically read and write faster, but that speed comes at a price: they usually cost more, run hotter, use more power and generate more noise. Consider the following 250GB drives and their prices:
Part 2: Capacity Planning
How much space do you really need? Here’s a quick calculator:
How many megapixels is your camera?
Do you shoot RAW?
How many images do you shoot per month?
Your total storage needs per year:
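The three questions above can be turned into a rough estimate with a few lines of Python. This is only a sketch: the file-size factors (about 1.5 MB per megapixel for RAW and 0.3 MB per megapixel for JPEG) are rough assumptions that vary by camera, bit depth and compression, so substitute the average file size you actually see on your memory cards.

```python
def yearly_storage_gb(megapixels, images_per_month, shoots_raw,
                      raw_mb_per_megapixel=1.5, jpeg_mb_per_megapixel=0.3):
    """Estimate one year of photo storage in GB.

    The two *_mb_per_megapixel defaults are rough assumptions, not
    manufacturer figures; override them with your own averages.
    """
    factor = raw_mb_per_megapixel if shoots_raw else jpeg_mb_per_megapixel
    mb_per_image = megapixels * factor
    return mb_per_image * images_per_month * 12 / 1000  # 1 GB = 1000 MB


if __name__ == "__main__":
    # A 24MP camera, 1000 RAW images per month.
    print(f"{yearly_storage_gb(24, 1000, True):.0f} GB per year")
```

Remember that this covers a single copy; every backup or mirror you keep multiplies the total.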
For each storage device that I purchase, I have a goal of being able to store at least 1 year’s worth of photos on it. The practical reason is that I don’t want to spend a lot of time each year configuring hard drives and transferring files. On the other hand, since the cost of storage has historically declined so dramatically, it’s probably not worth buying a storage unit that is designed to store 3 years of data.
What is RAID? What are its advantages for storing photos?
RAID (Redundant Array of Independent, or Inexpensive, Disks) is a technology for creating local, redundant copies of your data on linked hard drives. The advantage is simple: if you have a single hard drive and the drive breaks, you’ve lost the data. With RAID, data is mirrored, or “striped” together with parity information, across multiple hard drives, so the failure of a single drive doesn’t have to mean the loss of your data. (Striping on its own, known as RAID 0, improves speed but provides no redundancy.)
RAID isn’t bullet-proof, but storing data is about mitigating risk, not eliminating it completely. The more copies of a single piece of data, the less likely that a single event will cause the loss of that data. Creating a manageable, redundant repository of your digital photos is the heart of any archival strategy.
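For the curious, here is a simplified illustration of how parity-based RAID levels can rebuild the contents of a failed drive. It is a toy sketch, not how any particular product works: real controllers operate on large blocks, spread parity across drives, and handle many edge cases. The “drives” here are just short byte strings.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data "drives" holding one equal-sized block each.
data_blocks = [b"PHOTO-01", b"PHOTO-02", b"PHOTO-03"]

# The parity block is the XOR of all data blocks.
parity = xor_blocks(data_blocks)

# Simulate losing the second drive, then rebuild it from the survivors
# plus the parity block.
survivors = [data_blocks[0], data_blocks[2]]
rebuilt = xor_blocks(survivors + [parity])

print(rebuilt == data_blocks[1])
```

The same trick only works for one missing block at a time, which is why a second failure during a rebuild is so dangerous.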
Some people simply mirror data on multiple drives. This works well when your data requirements are relatively modest. For example, you can get an external 1TB drive for about $120. Two external hard drives with mirrored data are still cheaper than any RAID solution, but of course, you need to manually copy files, or find some software to manage the mirroring.
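If you go the manual route, a small script can take over the copying. The following is a minimal sketch of a one-way mirror that verifies each copy with a checksum; the function names are my own, and dedicated sync tools handle many more cases (deletions, renames, interrupted copies). It never deletes anything from the mirror.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def mirror(source_dir, mirror_dir):
    """Copy new or changed files to mirror_dir and return the copied paths."""
    source_dir, mirror_dir = Path(source_dir), Path(mirror_dir)
    copied = []
    for source_file in sorted(source_dir.rglob("*")):
        if not source_file.is_file():
            continue
        target = mirror_dir / source_file.relative_to(source_dir)
        if target.exists() and sha256_of(target) == sha256_of(source_file):
            continue  # already mirrored and identical
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source_file, target)  # copy2 also preserves timestamps
        if sha256_of(target) != sha256_of(source_file):
            raise IOError(f"Verification failed for {target}")
        copied.append(target)
    return copied
```

You would call it with your own locations, for example `mirror("/Volumes/Photos", "/Volumes/PhotosMirror")` (both paths are hypothetical). Hashing every file on each run is slow for large archives but catches silent corruption that a date or size comparison would miss.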
A word about CDs and DVDs.
I simply don’t consider CDs and DVDs to be archival, irrespective of manufacturer claims. Will a “Gold” DVD last for many years under optimal storage conditions? Probably. But I don’t have optimal storage conditions. I can’t effectively regulate the heat and humidity of my apartment, and although the capacity of these removable media has increased dramatically over the years, they are simply too slow and cumbersome for practical purposes. In my opinion, a photo archive needs to be readily accessible, and loading DVDs into a drive doesn’t meet that criterion.
Secondly, the rate of increase in storage density is dramatically faster in hard drive technology than it is in removable media. Hard drives are available with capacities as large as 2TB ($0.11/GB), whereas DVDs typically only hold 4.7GB ($0.16/GB).
That said, Peter Krogh recommends the 3-2-1 system for archiving (which is beyond the scope of this document). He advocates for 3 copies of any file on 2 different media types with at least 1 geographically redundant location. Under this system, CD/DVDs might act as a different media type.
Network-Attached Storage (NAS) is a technology that allows the storage device to sit directly on a local area network (LAN). The device is basically a self-contained computer with extended storage, typically running an operating system like Windows Server to allow many types of computers to connect to the shared storage.
For the photographer, NAS has the benefit of not being connected to a computer. So if your primary computer is a laptop, or you don’t want a big drive on your desk, you can store the NAS device anywhere there is Ethernet access. The downside is that Ethernet is slower than the drive that is directly connected to your computer. You might find that NAS is suitable for storage, but not as quick for making batch adjustments in a program like Aperture.
You might run across Storage Area Networks (SANs) in your research, but they aren’t widely deployed outside of large enterprises. In a nutshell, the technology provides a way for remote storage devices (tape, disk, optical jukeboxes) to appear as if they are attached locally to computers. For our purposes, they are completely out of scope.
Over the years, manufacturers have created a slew of different storage media in different shapes and sizes (aka “form factor”). You might remember such devices as SyQuest, MiniDisc, Zip drive, EZ 135, Bernoulli Box, Jaz drive, etc. Historically speaking, alternative form factors have not fared well. Using devices that weren’t created through a consortium and don’t have wide adoption is therefore a recipe for disaster.
A few pundits have claimed that we need to worry about RAW formats and even JPG itself, but in truth, it’s a lot easier for companies to reverse engineer an image format (i.e. software) than it is to maintain a unique mechanical device.
Part 3: Pick a Device
Now that you have your data requirements determined, here are a few choices:
Cheap (< 500GB/year)
The Western Digital MyBook is a popular external hard drive that comes in capacities up to 2TB. Inside the shiny black plastic enclosure is a single 3.5″ hard drive and power supply with a USB 2.0 connector. There’s nothing particularly fancy about this device, and I used one (until someone dropped it on the floor). If you’re looking for something with a little bit more capacity, Western Digital also produces a “ShareSpace” product that is network based and supports RAID 0/1/5.
Pro: Cheap, quiet
Con: No redundancy; drive failures are not uncommon.
Connectivity: USB 2.0
Mid (< 1TB/year)
Drobo is a self-contained device made by Data Robotics. It’s a cube-shaped device that holds up to four 3.5″ hard drives with a maximum capacity of 4TB, and unlike traditional RAID, you can use drives of different capacities. But as with any system that supports redundancy, you cannot claim the entire physical disk size. For example, if you’re mixing drive sizes, you give up the size of the largest drive in the array for redundancy. The Drobo is in its second generation, and has an active user community.
The Drobo plugs into a computer through the USB 2.0 or FireWire 800 ports (so transfers are pretty darn fast). They also created an add-on to make it network-enabled. Drobo also produces a “pro” model that can be expanded up to 16TB.
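The rule of thumb for mixed drive sizes (you give up the largest drive for redundancy) is easy to sketch in Python. This is an approximation based only on that rule; the actual usable space reported by a device will differ somewhat because of formatting and system overhead. The function names and the $900 price in the example are hypothetical.

```python
def usable_capacity_tb(drive_sizes_tb):
    """Rule of thumb from the text: total capacity minus the largest drive."""
    if len(drive_sizes_tb) < 2:
        raise ValueError("redundancy needs at least two drives")
    return sum(drive_sizes_tb) - max(drive_sizes_tb)


def cost_per_usable_gb(total_price, drive_sizes_tb):
    """Price of enclosure plus drives, divided by usable GB (1 TB = 1000 GB)."""
    return total_price / (usable_capacity_tb(drive_sizes_tb) * 1000)


if __name__ == "__main__":
    # Four 1TB drives leave about 3TB usable.
    print(usable_capacity_tb([1, 1, 1, 1]))
    # Mixed sizes: the 2TB drive is the one you "give up".
    print(usable_capacity_tb([2, 1, 1, 0.5]))
    # Hypothetical $900 total for the enclosure and four 1TB drives.
    print(f"${cost_per_usable_gb(900, [1, 1, 1, 1]):.2f}/GB")
```

Calculating cost against usable rather than raw capacity gives a fairer comparison between a redundant enclosure and a bare external drive.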
Pro: Compact design, affordable.
Con: Proprietary software.
Cost/GB: $0.30 (2nd generation Drobo with 4TB of disk, 3TB usable)
Connectivity: USB 2.0, Firewire 800, Ethernet (optional)
NETGEAR produces a series of ReadyNAS devices. NAS units are self-contained servers, and as such do not require a host computer to function. Higher-end devices feature more RAM and minor extras like more connection ports. NAS solutions are built for small workgroups, and might be overkill (at least from a cost perspective) for individuals. Like the Drobo, they support their own flavor of RAID, which allows you to mix drive sizes.
ReadyNAS also offers an optional online storage component called ReadyNAS Vault, which we can’t currently recommend because the price is far too high.
Pro: Self-contained, supports multiple connectivity protocols (e.g. FTP, NFS, HTTP, etc)
Con: More expensive
Cost/GB: $0.55 – $0.66 (ReadyNAS NV+ with 1TB & ReadyNAS 2100 with 4TB)
Connectivity: Gigabit Ethernet, USB 2.0
Network-Attached Storage: Iomega StorCenter & Snap Server
Iomega used to own the removable media space with the Zip drive, but those days are long gone, and they’ve since expanded their reach into departmental and enterprise storage options. The StorCenter is an expandable hardware RAID enclosure in the category of Network-Attached Storage (NAS). It uses Windows Server software to power the administration, but that’s fairly invisible from the end user perspective, and rather than plugging into a server, it simply resides on your network via an Ethernet cable.
Snap Server is a similar unit to the Iomega StorCenter. Snap Server was acquired by enterprise storage vendor Overland Storage a few years ago. The UNIX-based system is perhaps a tad less prone to crashes than its Windows counterpart.
The “pizza box” form factor is designed for data center racks, and consequently, the fans in the unit are extremely noisy. You can’t expect to put one of these guys out in the open of your office.
Pro: Highly Expandable, hardware-based RAID, management utilities, large feature set, fast throughput
Con: Expensive, large footprint, noisy
Cost: $1.78/GB (Snap Server 620 with 4TB)
Connectivity: Dual Gigabit Ethernet, optional SCSI
The advantage of using a “brand name” like Iomega is usually in the support options – whether that means speaking to a live person or having frequent software updates. But many organizations use “no name” enclosures, a dependable hardware RAID controller, and a slew of hard disks. These RAID-based solutions are self-contained, and don’t need to be plugged into another computer to operate.
There are many vendors that will build a storage device for you, so in that regard, it’s not truly a “do-it-yourself” project. But you’re not going to find slick software to manage the device, so it will take a little system administration know-how to ensure that the device works properly. If you’re the kind of person who changes the oil in your own car, then DIY might be an option for you. But if you’re like me, then you’re probably happy to pay a slight premium for the convenience.
Pro: Cost of ~$1000/1TB for a highly expandable RAID solution.
Con: Requires a knowledge of systems administration
Part 4: Online Storage
Online storage of photos through services like PhotoShelter has numerous pros and cons, but it ought to be considered as a part of your overall archiving strategy.
First, online storage gives you geographical redundancy. Having two drives in your home is one way to create redundancy, but a single geographic event like a fire could destroy both drives. And believe me, lots of photographers lost irreplaceable work in Hurricane Katrina and in various California wildfires.
Online storage is a “managed” solution – i.e. you don’t have to worry about hard drives going bad or filling up. You just have to make sure you pay your bill. We have experienced and smart system administrators who watch all your data, so you can spend more time taking pictures.
Having your images online gives you access to them anywhere you have an Internet connection, without the need to worry about your home network security. Of course, PhotoShelter offers many more features like public searching and licensing of your images, so while storage is the starting point, the potential to create revenue is a much more compelling reason to get things online.
The downside is typically the speed of uploading. Even though our cable modems and DSL lines have gotten progressively faster for downloading, uploading is usually still stuck at turn-of-the-century speeds. This makes uploading large quantities of images a real challenge. However, many photographers are finding that there are lots of advantages of uploading select work online and reaping the benefits of the sales associated with their best images. Additionally, some online storage solutions (PhotoShelter included) allow you to send in a hard drive to be uploaded directly to the company’s servers – this is a huge time-saver, particularly in the initial migration of a large archive to a new service.
Part 5: Dealing with Failure
Hard drives are mechanical devices that will eventually fail. Modern hard drives have an MTBF (mean time between failures) of between 500,000 hours (for consumer-level drives) and 1.5 million hours (for pro-level drives). If you do the math, this equates to roughly 57 to 171 years of continuous operation.
But a study by Carnegie Mellon University found that the manufacturers’ stated MTBFs were exaggerated as much as 15 times. So 4 years might be a better average lifetime. But the actual lifetime varies significantly by manufacturer, model, and lot. And unfortunately, when you buy a multi-drive enclosure, you usually get drives from the same lot. Drive failures in the first 6-12 months are extremely common, and drives that pass this initial phase tend to have a lower failure rate – indicating that many initial failures are due to design or manufacturing defects.
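The arithmetic behind these figures is simple enough to check. The sketch below converts an MTBF rating to years of continuous operation and then applies the 15x exaggeration factor mentioned above. It takes the article’s reading of MTBF as a rough lifetime at face value; the function names are my own.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours of continuous operation


def mtbf_years(mtbf_hours):
    """Convert an MTBF rating in hours to years of continuous running."""
    return mtbf_hours / HOURS_PER_YEAR


def derated_years(mtbf_hours, exaggeration_factor):
    """Scale the rated figure down by how much it is believed to be overstated."""
    return mtbf_years(mtbf_hours) / exaggeration_factor


if __name__ == "__main__":
    for rating in (500_000, 1_500_000):
        print(f"{rating:,} hours: rated {mtbf_years(rating):.0f} years, "
              f"derated {derated_years(rating, 15):.1f} years")
```

For the 500,000-hour consumer rating, this works out to about 57 years on paper and just under 4 years after derating, which is where the 4-year figure comes from.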
So the best strategy is to 1.) make sure you periodically use the manufacturer’s diagnostic software to monitor the health of the drives, and 2.) regularly replace your drives every 3-5 years (hopefully, this will coincide with your need to upgrade capacity).
What Should I Get?
If you’re an amateur or sole proprietor photographer, a high-capacity external drive will probably work best for you. Consider spending $300/year on storage.
If you shoot in excess of 1TB a year, you might consider a multi-drive enclosure. Consider spending $1000/year on storage.
If you have an entire business with staff, then you might consider a higher-end RAID system with network accessibility. Consider spending $2000/year on storage.
The good news is that the cost of storage continues to drop dramatically. In 1994, I purchased a 40GB hard drive for $500. Today, you can get a 1.5TB drive for $125. So create a good system for archiving your photos, and never lose an image!
Hi, I really have to disagree on the RAID part! It does not provide redundant copies of your data, but redundant copies of pieces of data that, placed together, make up your data. ONLY RAID 1 GIVES A TRUE BACKUP! RAID 5 is just a fault-tolerant system to prevent downtime of a storage service, not a backup solution. Take a RAID 5, or a Drobo: have a hardware failure of the RAID card or the Drobo case and NONE OF YOUR DRIVES WILL BE READABLE! You will have to buy a spare RAID card or a spare Drobo to be able to read your files. Pros should stick with RAID 1 only: 3 disks in a RAID 1 mirror, with 2 disks always on site and one disk off site. I’ve lost numerous files after RAID hardware failure, and a RAID array takes forever to rebuild, leaving the system vulnerable during that time. (Sorry for bad English, it’s not my native language.)
We use a ReadyNAS (www.netgear.com) at home backed up by an ioSafe Solo (www.iosafe.com) fireproof waterproof external drive. With this setup, we get: RAID (drive failure), Backups (RAID failure, accidental deletions, etc.), Fire (ioSafe) and Flood (ioSafe) plus the ioSafe bolts to the ground for theft!!! It’s not perfect but it’s something.
With regards to the heat problem, I have found that Western Digital’s green line of hard drives is great even if you don’t care about the environment. Other hard drives can get almost too hot to touch but these drives stay relatively cool even under heavy load. They also have lower power consumption which helps too.
Has anyone purchased or worked with the storage systems at MacGurus.com? They seem like great drives with great service. The proof of the pudding is in the tasting. Has anyone tasted these products?
I do not know much about RAID, but what I feel is it is a backup copy. Now does that mean that if I lose data on one drive due to virus infection, I have another backup drive… which has a perfect copy….. but that too is virus infected and not retrievable?!
@whatsupdoc: There is nothing inherently wrong with RAID. Your problem is you were using RAID incorrectly. No level of RAID is a backup solution – including RAID 1 (or mirroring). If you have lost files due to RAID array failure, that’s not the fault of RAID but due to your not performing proper backups! RAID is fault tolerance for hardware failure, not protection from data loss. If you accidentally delete a file on a RAID 1 (mirror), it will be gone on both hard drives. Heck, I’m setting up a software RAID 0 (striping with no parity) for pure performance. Yes, if I lose a drive I will lose the entire array – but it’s a risk I can afford to take as I have multiple, continuous backups of my data. I am no longer a fan of vanilla RAID 5 – with hard drive capacities these days, there is little reason to move beyond mirroring for primary storage. For pure capacity, even with larger hard drives, RAID has its benefits, but I will only mess with self-configuring RAID like that offered by companies like Drobo with their BeyondRAID and a few other RAID vendors that have extended functionality in their array management. The ability to dynamically, non-destructively resize an array is huge. It’s more than a simple convenience with the volumes of data we are managing today; it’s a necessity. Backing up an array (a couple of times if you are prudent) before breaking and rebuilding it to make it larger is crazy in this day and age. Since this article was written, the Drobo Pro has been released, and for large amounts of fault-tolerant storage for an individual user I would take it over any of the NAS solutions mentioned above. For multiple users, I would move up to the Drobo Elite. I never was a big fan of NAS, and it makes even less sense these days.
I do use a regular FireWire Drobo to back up my internal drives (via Time Machine), but I can see myself getting to the point where even the new Drobo Pro with five slots won’t be enough to back up my data and, more importantly, give me a meaningful history of files. For the money and the functionality it provides, the Drobo Pro is currently unbeaten. The Drobo Elite does things (like multiple machine access, data protection the equivalent of RAID 6, thin provisioning, non-destructive expansion or re-configuration of your array, etc.) that formerly were only possible on enterprise-grade SANs that started in the $25K range – the value, power and flexibility are just astonishing. Even if you have one of the NAS units mentioned in the article *you still need to back up your data separately* – maybe even more than one backup copy! It depends on how important it is to you. For me, many of my pictures are irreplaceable – I will never get to re-create those moments in time. I use Time Machine for immediate local backup, I use backblaze.com for off-site backup with some basic history, and I make clones of important data and store the hard drives off site for longer-term storage and more history than backblaze.com offers. The nice thing about services like Backblaze is that they work continuously and automatically (much like Time Machine). I know from experience that if backing up isn’t easy to do (i.e. automated), I’m not going to do it! Yes, it took a couple of weeks to do the initial upload of 1.8 TB of my data – but now that it’s over, it has no problem keeping up. If I dump a large number of photos from a shoot (100GB or so), it may take a day or two to catch up – but I have Time Machine, and if I really want an immediate second copy, I will clone to a hard drive and take it off site. Eventually Backblaze catches up.
Moral of the story – don’t rely on just one backup, or technology like RAID (or even burning to DVD – optical media isn’t nearly as stable as many assume), to save your data.
I have 2 Drobos myself, and had to replace the fans because they were causing too much noise – something to consider when you are thinking of purchasing one. But now my photos are secure and the Drobos operate smoothly and silently.
I very much agree with Peter Krogh’s 3-2-1 approach, as this provides a ‘spread spectrum’, increasing the likelihood that data will survive over the long term. Of all of the storage methods mentioned, I believe there is one that is essentially overlooked. Going back to the beginning of the article… “None of the typical storage media, from CD/DVDs to hard drives, have proven to be as resilient as film”. I believe the term ‘film’ in this sense is slightly inaccurate. I believe it may be more accurate to state ‘as resilient as photographs’, meaning not the film negative itself, but the developed image on photographic paper. Developed photos are known to have already survived for more than 100 years. If “none of the typical…media” are as resilient, then why should photographers want to move away from a photographic print format completely, and focus exclusively on the ‘untested world’ of electronic storage, with its significant challenges? I understand the necessity to preserve data in its digital format, but offer that Peter’s theorem for preserving data, which is exceptional, might be extended to be: 3-3-1 “Three instances of a datum, on three media formats, one of which is printed media and not electronic, with one instance offsite”. A printed photograph, of course, adds complexity and cost to an archival system, but the benefit is that this format is very familiar to everyone, and the most universally supported at any point into the future, as the only mechanism you need to read it is your eyes. Perhaps an approach of printing only the ‘keepers’ would help to minimize administrative effort and reduce cost. It might also be simpler to create an additional storage folder in the filesystem/folder hierarchy, perhaps called ‘print’. Each time downloads are done from a capture device (camera), and photos are examined, sorted, labeled and processed, one additional workflow step could be to create an extra copy of ‘printable’ photos in the ‘print’ folder.
Periodically, the contents of the ‘print’ folder are printed, and then purged to accept the next round of printables. (Similar to emptying the memory card of your camera). When I say ‘periodically’, this can be as often as you need, be it weekly, monthly, quarterly, or annually. I believe a 3-month interval would be efficient