Deduplication detects data repetition during and after backup and stores identical data only once
Description
(!) Deduplication is a paid option. See Acronis Backup & Recovery 10 Editions. A separate license needs to be purchased to install deduplication.
Overview
Deduplication is the process of minimizing the storage space that data occupies by detecting data repetition and storing identical data only once.
For example, if a managed vault where deduplication is enabled contains two copies of the same file - whether in the same archive or in different archives - the file is stored only once, and a link to that file is stored instead of the second file.
Deduplication is performed on disk blocks (block-level deduplication) and on files (file-level deduplication), for disk-level and file-level backups respectively.
Deduplication ratio
The deduplication ratio compares the size that archives would occupy in a non-deduplicating vault with the size they occupy in a deduplicating vault. The higher the deduplication ratio, the more storage space deduplication saves.
For example, if you back up two files (1 GB each) with identical content, the total size of the resulting backups in a non-deduplicating vault will be 2 GB. In a deduplicating vault, the total size will be 1 GB. Hence, the ratio in this example is 2:1.
If you back up two files (1 GB each) with completely dissimilar content, the total size of the resulting backups in a deduplicating vault will be 2 GB. Hence, the ratio in this example is 1:1.
A reasonable expectation for a typical environment is a ratio between 1.2:1 and 1.6:1.
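The arithmetic behind the two examples above can be sketched in a few lines of Python. This is only an illustration: the function name deduplication_ratio and its parameters are hypothetical and are not part of the product.

```python
def deduplication_ratio(non_dedup_bytes: int, dedup_bytes: int) -> float:
    """Size in a non-deduplicating vault divided by size in a deduplicating vault."""
    if dedup_bytes <= 0:
        raise ValueError("dedup_bytes must be positive")
    return non_dedup_bytes / dedup_bytes


GB = 1024 ** 3

# Two 1 GB files with identical content: 2 GB without deduplication, 1 GB with it.
identical = deduplication_ratio(2 * GB, 1 * GB)

# Two 1 GB files with completely dissimilar content: 2 GB either way.
dissimilar = deduplication_ratio(2 * GB, 2 * GB)

print(f"{identical:g}:1, {dissimilar:g}:1")  # prints "2:1, 1:1"
```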
How deduplication works
Deduplication at source
When backing up to a deduplicating vault, Acronis Backup & Recovery 10 reads the items being backed up - disk blocks for disk backup or files for file backup - and calculates a hash value of each item.
Before sending the item to the vault, the agent queries the deduplication database to determine whether the item's hash value is the same as that of an already stored item.
If so, the agent sends only the item's hash value; otherwise, it sends the item itself.
Items that cannot be deduplicated (see Deduplication restrictions below) are transferred to the vault without calculating their hash value.
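The decision the agent makes for each item can be sketched as follows. This is a simplified illustration, not the product's implementation: the names item_hash and plan_transfer are hypothetical, the deduplication database is modeled as a plain set of hash values, and SHA-256 is an assumption because the article does not name the hash function.

```python
import hashlib


def item_hash(item: bytes) -> str:
    # Assumption: SHA-256. The article does not say which hash function is used.
    return hashlib.sha256(item).hexdigest()


def plan_transfer(items, known_hashes, can_deduplicate=lambda item: True):
    """Yield ("hash", digest) or ("item", data) for each item to be backed up.

    known_hashes stands in for the deduplication database.
    can_deduplicate stands in for the checks listed under Deduplication restrictions.
    """
    for item in items:
        if not can_deduplicate(item):
            # Sent as-is; no hash value is calculated.
            yield ("item", item)
            continue
        digest = item_hash(item)
        if digest in known_hashes:
            # An identical item is already stored: send only the hash value.
            yield ("hash", digest)
        else:
            yield ("item", item)


known = {item_hash(b"block A")}
for kind, payload in plan_transfer([b"block A", b"block B"], known):
    print(kind, len(payload))
```

In this sketch the set of known hashes is not updated while the backup runs, so two identical new items are both sent in full and are left for deduplication at target.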
Deduplication at target
After a backup to a deduplicating vault is completed, Storage Node deduplicates data in the vault as follows:
Storage Node moves the items (disk blocks or files) from the archives to a special folder within the vault, storing duplicate items there only once. This folder is called the deduplication data store. Items that cannot be deduplicated remain in the archives.
In the archives, Storage Node replaces the moved items with the corresponding references to them.
As a result, the vault contains a number of unique, deduplicated items, with each item having one or more references to it from the vault's archives.
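The two steps above can be modeled with ordinary Python containers. This is a minimal sketch under stated assumptions: archives are lists of byte strings, the deduplication data store is a dictionary keyed by hash value, a reference is a ("ref", hash) tuple, and SHA-256 is used as the hash. None of these choices come from the product.

```python
import hashlib


def deduplicate_vault(archives, data_store, can_deduplicate=lambda item: True):
    """Move items into data_store and leave references in the archives.

    archives: dict mapping an archive name to a list of items (bytes).
    data_store: dict mapping a hash value to the single stored copy of an item.
    """
    for items in archives.values():
        for index, item in enumerate(items):
            if not isinstance(item, bytes) or not can_deduplicate(item):
                continue  # already a reference, or must remain in the archive
            digest = hashlib.sha256(item).hexdigest()  # hash choice is an assumption
            data_store.setdefault(digest, item)        # duplicates are stored once
            items[index] = ("ref", digest)             # replace the item with a reference
    return data_store


archives = {"archive1": [b"x", b"y"], "archive2": [b"x"]}
store = deduplicate_vault(archives, {})
print(len(store))  # prints 2: b"x" is stored once and referenced from both archives
```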
Compacting task
After one or more backups or archives have been deleted from the vault - either manually or during cleanup - the vault may contain items which are no longer referred to from any archive. Such items are deleted by the compacting task, which is a scheduled task performed by the storage node.
By default, the compacting task runs every Sunday night at 03:00. You can change the schedule by clicking Reschedule compacting in Storage Node. You can also run the task manually in Storage Node.
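What the compacting task does can be pictured as removing every stored item that no archive references any more. The sketch below is an illustration only: the function name compact is hypothetical, the data store is a dictionary keyed by hash value, and references are ("ref", hash) tuples, all of which are assumptions.

```python
def compact(data_store, archives):
    """Delete items that no archive references; return the removed hash values."""
    referenced = {
        entry[1]
        for items in archives.values()
        for entry in items
        if isinstance(entry, tuple) and entry[0] == "ref"
    }
    orphaned = [digest for digest in data_store if digest not in referenced]
    for digest in orphaned:
        del data_store[digest]
    return orphaned


store = {"h1": b"x", "h2": b"y"}
archives = {"archive1": [("ref", "h1")]}  # the archive that referred to h2 was deleted
print(compact(store, archives))  # prints ['h2']
```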
Deduplication restrictions
Block-level deduplication restrictions
During a disk backup to an archive in a deduplicating vault, deduplication of a volume's disk blocks is not performed in the following cases:
- If the volume is a compressed volume;
- If the volume's allocation unit size (cluster size or block size) is not divisible by 4 KB;
For example, the allocation unit size on most NTFS and ext3 volumes is 4 KB and so allows for block-level deduplication.
- If the archive is password-protected.
(!) If you want to protect the data in the archive while still allowing it to be deduplicated, leave the archive non-password-protected and encrypt the deduplicating vault itself with a password, which you can do when creating the vault.
Disk blocks that were not deduplicated are stored in the archive as they would be in a non-deduplicating vault.
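The three block-level conditions above can be expressed as a single check. This is a sketch for illustration; the function block_dedup_allowed and its parameters are hypothetical names, and 4 KB is taken to mean 4096 bytes.

```python
def block_dedup_allowed(volume_compressed: bool,
                        allocation_unit_bytes: int,
                        archive_password_protected: bool) -> bool:
    """Return True if a volume's disk blocks can be deduplicated."""
    if volume_compressed:
        return False
    if allocation_unit_bytes % 4096 != 0:  # not divisible by 4 KB
        return False
    if archive_password_protected:
        return False
    return True


print(block_dedup_allowed(False, 4096, False))  # typical NTFS or ext3 volume: True
print(block_dedup_allowed(False, 512, False))   # 512-byte allocation unit: False
```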
File-level deduplication restrictions
During a file backup to an archive in a deduplicating vault, deduplication of a file is not performed in the following cases:
- If the file is encrypted and the "In archives, store encrypted files in decrypted state" check box is unchecked (it is unchecked by default);
- If the file is less than 4 KB in size;
- If the archive is password-protected.
Files that were not deduplicated are stored in the archive as they would be in a non-deduplicating vault.
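The file-level conditions can be sketched the same way. The function file_dedup_allowed and its parameter names are hypothetical; store_decrypted stands for the state of the check box mentioned above, and 4 KB is taken to mean 4096 bytes.

```python
def file_dedup_allowed(size_bytes: int,
                       encrypted: bool,
                       store_decrypted: bool,
                       archive_password_protected: bool) -> bool:
    """Return True if a file can be deduplicated during a file backup."""
    if encrypted and not store_decrypted:
        return False
    if size_bytes < 4096:  # smaller than 4 KB
        return False
    if archive_password_protected:
        return False
    return True


print(file_dedup_allowed(10_000, False, False, False))  # ordinary file: True
print(file_dedup_allowed(1_000, False, False, False))   # smaller than 4 KB: False
```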
Deduplication and NTFS data streams
In the NTFS file system, a file may have one or more additional sets of data associated with it (often called alternate data streams).
When such a file is backed up, so are all its alternate data streams. However, these streams are never deduplicated - even when the file itself is.
(!) Deduplication does not work for backup archives stored on FTP locations. Deduplication is available only for archives stored in a managed vault, and a managed vault cannot be created on FTP. See Acronis Backup & Recovery 10 Vaults.
More information
Deduplication has little effect on incremental backups because:
- The deduplicated items that have not changed are not included in the incremental backup;
- The deduplicated items that have changed are not identical anymore and therefore will not be deduplicated.
See also Acronis Backup & Recovery 10: Deduplication Best Practices.