De-Duplication Is the New Word in Backup Technology
By Jeff Gross
Critical data is growing at an exponential rate, and tapes
are no longer the only option for backup. Data de-duplication technology (also called
data reduction or commonality factoring) allows users to store more information
on fewer physical disks than has been possible in the past, making the cost of
disk backup competitive with tape.
“Although the technology is fairly new, de-duplication is becoming widespread,”
says Stephanie Balaouras, an analyst at Forrester Research. “Right now, disk
space is three to four times as expensive as tape, but de-duplication can
reduce data that needs to be backed up by a ratio of 20 to one. The big question
is whether this technology is what puts the last nail in the coffin of tape
backup.”
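The arithmetic behind that comparison can be sketched in a few lines. This is only a rough illustration built on the figures quoted above; the function name is hypothetical, and it assumes prices are compared per unit of raw capacity and that tape stores the same data with no reduction at all.

```python
def relative_disk_cost(disk_to_tape_price_ratio, dedup_ratio):
    """Cost of protecting data on de-duplicated disk, relative to tape (tape = 1.0).

    Assumption: tape holds the data at 1:1, so only disk benefits from reduction.
    """
    return disk_to_tape_price_ratio / dedup_ratio


# Disk at three to four times the price of tape, reduced 20 to one.
low = relative_disk_cost(3, 20)
high = relative_disk_cost(4, 20)
print(f"Disk costs roughly {low:.2f} to {high:.2f} times as much as tape")
```

Under those assumptions disk comes out cheaper per unit of protected data. Real ratios depend heavily on the data and on how much compression the tape side already gets.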
As the name suggests, the goal of de-duplication is to eliminate redundant data
from backups. The technology replaces duplicate copies with much smaller
pointers to a shared record. This can take place at the level of either whole
records or smaller unique data segments.
For example, if someone e-mails a 10-megabyte Excel file to 10 people on a
network and each of them stores it, that translates into 100MB of backup disk
space without de-duplication. With whole-record de-duplication, one copy would
be stored along with 10 reference pointers. If, however, one of the users
changes the name of the file or alters the contents in even the slightest way,
that entire altered copy will be backed up as a new record. Using sub-record
level de-duplication, only the changed segments of the altered file would be
saved, with pointers to the original for everything else.
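The difference between the two levels can be sketched as follows. This is a minimal illustration, not how any particular product works: it assumes records are identified by a SHA-256 hash of their contents only (file names are ignored), that sub-record segments are fixed-size, and every name in it (CHUNK_SIZE, whole_record_store, sub_record_store, demo) is hypothetical. The example is scaled down to ten small copies, one of which has a single byte changed.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical fixed segment size, in bytes


def digest(data: bytes) -> str:
    """Content fingerprint used as the key of the shared record."""
    return hashlib.sha256(data).hexdigest()


def whole_record_store(files):
    """Keep one copy of each distinct file; every file becomes a pointer."""
    store, pointers = {}, []
    for data in files:
        key = digest(data)
        store.setdefault(key, data)   # stored only the first time it is seen
        pointers.append(key)
    return store, pointers


def sub_record_store(files, chunk_size=CHUNK_SIZE):
    """Keep one copy of each distinct segment; every file becomes a pointer list."""
    store, pointers = {}, []
    for data in files:
        keys = []
        for start in range(0, len(data), chunk_size):
            chunk = data[start:start + chunk_size]
            key = digest(chunk)
            store.setdefault(key, chunk)
            keys.append(key)
        pointers.append(keys)
    return store, pointers


def stored_bytes(store):
    """Bytes of unique data held, ignoring the (small) pointer overhead."""
    return sum(len(value) for value in store.values())


def demo():
    # A file made of ten distinct 4,096-byte segments (40,960 bytes).
    original = b"".join(bytes([i]) * CHUNK_SIZE for i in range(10))
    edited = original[:-1] + b"\xff"      # one user changes a single byte
    files = [original] * 9 + [edited]     # ten stored copies in total

    raw = sum(len(f) for f in files)
    whole = stored_bytes(whole_record_store(files)[0])
    sub = stored_bytes(sub_record_store(files)[0])
    return raw, whole, sub


raw, whole, sub = demo()
print(raw, whole, sub)  # 409600 81920 45056
```

Whole-record de-duplication stores the original plus the full edited copy, while the sub-record version stores the original plus one changed segment. Fixed-size segments are the simplest choice for a sketch; an insertion that shifts the rest of the file would defeat them, which is one reason real systems may segment data differently.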
Both de-duplication methods are usually used in conjunction with traditional
compression algorithms, the standard backup tactic for reducing the space
consumed on the backup disk.
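One way the two techniques might be combined is to de-duplicate first and then compress each unique segment before it is written. The sketch below assumes SHA-256 fingerprints and zlib compression from the Python standard library; the function names and sample data are hypothetical, and actual products may order or implement these steps differently.

```python
import hashlib
import zlib


def dedupe_then_compress(chunks):
    """Store each unique chunk once, compressed; return (store, recipe)."""
    store = {}    # fingerprint -> compressed bytes
    recipe = []   # ordered fingerprints needed to rebuild the stream
    for chunk in chunks:
        key = hashlib.sha256(chunk).hexdigest()
        if key not in store:
            store[key] = zlib.compress(chunk)
        recipe.append(key)
    return store, recipe


def restore(store, recipe):
    """Rebuild the original byte stream from the recipe of pointers."""
    return b"".join(zlib.decompress(store[key]) for key in recipe)


chunks = [b"quarterly report " * 100, b"budget line " * 100, b"quarterly report " * 100]
store, recipe = dedupe_then_compress(chunks)
assert restore(store, recipe) == b"".join(chunks)
```

De-duplication removes the repeated chunk entirely, and compression then shrinks what is left, so the two savings multiply rather than compete.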
The trend is toward sub-record level de-duplication. A wide range of systems
that provide de-duplication is already available, including hardware such as
Quantum’s DXi or Cybernetics’ iSCSI SAN and software such as Veritas NetBackup
PureDisk. Along with dramatically reducing backup storage space consumption,
these technologies cut restore time and eliminate the need to wade through
incremental backup tapes. Most systems allow users to restore back to a specific
date and time, and some make decentralized backups possible.
Proceed With Care
Balaouras warns that while data de-duplication is fast becoming a standard
feature in backup systems, the technology is new enough, and there are enough
variations among applications, that buyers should proceed with care. A key
distinction is whether the data reduction takes place at the source (the backup
server) or the target (a virtual tape library or disk appliance). Source-based
processing uses much less bandwidth and provides for either local or global
backup, but it often requires users to replace their current backup systems or
run one system for central office backup and another for remote locations.
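The bandwidth saving from source-based processing can be illustrated with a small sketch. It assumes the source fingerprints each chunk, asks the target which fingerprints it lacks, and sends only those chunks. The BackupTarget class and source_side_backup function are hypothetical stand-ins, not the interface of any real product, and the small cost of sending the fingerprints themselves is ignored.

```python
import hashlib


class BackupTarget:
    """Hypothetical stand-in for the disk appliance that receives chunks."""

    def __init__(self):
        self.chunks = {}

    def missing(self, keys):
        """Return the fingerprints the target has not stored yet."""
        return {key for key in keys if key not in self.chunks}

    def put(self, key, chunk):
        self.chunks[key] = chunk


def source_side_backup(chunks, target):
    """Send only chunks the target lacks; return the payload bytes sent."""
    keyed = [(hashlib.sha256(chunk).hexdigest(), chunk) for chunk in chunks]
    needed = target.missing(key for key, _ in keyed)
    sent = 0
    for key, chunk in keyed:
        if key in needed:
            target.put(key, chunk)
            needed.discard(key)   # do not resend a duplicate within this batch
            sent += len(chunk)
    return sent


target = BackupTarget()
monday = [b"a" * 1000, b"b" * 1000]
tuesday = monday + [b"c" * 1000]
print(source_side_backup(monday, target))   # 2000: everything is new
print(source_side_backup(tuesday, target))  # 1000: only the new chunk travels
```

With target-based reduction, by contrast, all 3,000 bytes of the second backup would cross the network before any duplicates were discarded.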
Whether de-duplication occurs while the data is being backed up or afterward
is also a serious concern. Data reduction is very CPU-intensive, so performing
it during the backup can slow the backup down. Performing it after an initial
backup has been completed, however, requires more disk space and means that the
data reduction must be completed before the next scheduled backup.
Scalability and data integrity issues, raised by the number of times most
systems process the data with de-duplication and checking algorithms, are also
things users should investigate before they buy, says
Balaouras. But de-duplication is here to stay, and it’s accelerating movement
toward disk backup, especially among SMBs without large investments in legacy
tape systems.
“Tape will be around for a while — for one thing it’s got a better power and
cooling profile than disk, and that’s important in today’s data center,” says
Balaouras. “But data de-duplication is a reality — it will take some time to
sort out the approaches, but it definitely changes the comparison with tape.”
If you are interested in improving your backup setup, please call J.G.
Networking at (267) 496-0350.