How to set up deduplication in the most efficient way
What is deduplication?
Deduplication is the process of minimizing storage space that is taken by data by detecting repetition and storing identical data only once. Deduplication may also reduce network traffic. For example, during a backup, if a file or block is found to be a duplicate of a file/block in the storage location, its content is not transferred over the network.
Acronis software deduplicates backups saved to a managed vault if you enable deduplication during the vault creation. A vault where deduplication is enabled is called a deduplicating vault. The Deduplication add-on to the agent must be installed on any machine that backs up to such a vault. Without the add-on, backing up to the vault is not possible.
See more information in the Web Help.
How much space will I save?
The deduplication ratio shows how much smaller the archives in a deduplicating vault are, compared with the space they would occupy without deduplication. The savings depend on a number of conditions, such as the number of machines in the environment and the amount of unique data. A reasonable expectation for a typical environment with 20-30 machines is a ratio between 10% and 3% (assuming the average amount of unique data is approximately 10% of the total, and full backups are used instead of incrementals/differentials).
How is the deduplication ratio calculated?
The deduplication ratio is obtained by dividing the unique data size by the backed-up data size and multiplying by 100%. Here, the unique data size is the amount of unique data stored in the managed vault. The backed-up data size is the total size of all data being backed up. If some data is backed up several times (for example, with full backups), its size is counted several times. Vault compression is not taken into account in this parameter. The deduplication ratio is provided in the Vaults view and the Vaults report.
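As a quick worked illustration of this formula (all figures below are hypothetical): if 200 GB has been backed up in total and only 50 GB of unique data is stored in the vault, the ratio is 25%.

```shell
# Hypothetical figures for illustration only
backed_up_gb=200   # total size of all data backed up (full backups counted each time)
unique_gb=50       # unique data actually stored in the vault
# Deduplication ratio = unique data size / backed-up data size * 100%
ratio=$(( unique_gb * 100 / backed_up_gb ))
echo "Deduplication ratio: ${ratio}%"
```

The lower the ratio, the more space deduplication is saving.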
In what cases is deduplication most effective?
Deduplication is most effective when there is redundant data across systems. For example, in a typical office environment, most workstations will have similar operating systems, such as Windows 7 or Windows XP, and in the datacenter most servers will have Windows 2008 or Windows 2012. Similarly, many machines have the same applications, such as Microsoft Office, or Adobe Acrobat. In these cases deduplication ratios are often high.
It is also effective with incremental backups, provided that the changes to the data on different machines are similar, or that data is simply moved from one machine to another.
How to improve deduplication performance?
Ensure that your Storage Node hardware meets the recommendations provided below.
What do I need to have deduplication in my environment?
You need a Storage Node server that conforms to the hardware recommendations specified below. On the Storage Node, you need a managed vault with deduplication turned on. The deduplication feature is included in all Acronis Backup Advanced licenses. You need an Acronis Backup Advanced license for each machine that you are going to back up with deduplication.
How is deduplication licensed?
Deduplication is licensed per machine being backed up. It is available only in the advanced editions of Acronis Backup and is included in the license. No license is needed for a Storage Node, so you may have as many Storage Nodes as you need.
Are there any deduplication restrictions?
With image-level backups, deduplication is not performed on volumes whose allocation unit (cluster) size is not divisible by 4 kilobytes. For both image-level and file-level backups, deduplication is not performed for encrypted archives, but it is still possible to specify vault-level encryption.
How can I be notified about a shortage of free space in the vault?
If you use the Management Server, you can configure email notifications about the vault state. Refer to the following article about alerts: Alert notifications.
Setting up a machine for Acronis Storage Node
- Prepare a dedicated machine for the Acronis Storage Node hosting a deduplicating vault.
- 64-bit operating system.
- Server Operating System is preferred.
- 16 GB of RAM per 1 TB of unique data. Resource usage has been significantly improved in Update 6 (build 43909/43713, depending on the localization): starting with that update, 3 GB of RAM per 1 TB of unique data is required.
This is a recommendation for a worst-case scenario. It is not necessary to follow it if you do not experience deduplication performance problems. However, if deduplication is too slow, adding more RAM to the Storage Node may significantly raise the deduplication speed. In general, the more RAM you have, the larger the deduplication database can grow while maintaining the same deduplication speed.
- Multi-core processor with a clock rate of at least 2.5 GHz.
We recommend a processor with at least four cores and a clock rate of at least 2.5 GHz.
- Only one deduplicating vault on each storage node
One vault per Acronis Storage Node (ASN) is the best practice: performance is better because fewer simultaneous processes (such as indexing or compacting) are running, and deduplication is better because data cannot be deduplicated across different vaults. If there are several vaults, the available RAM is divided among them in proportion to their number.
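The RAM recommendation and the proportional split above can be combined into a quick sizing sketch (all figures are hypothetical; the 3 GB per 1 TB figure is the Update 6 value from the RAM bullet above):

```shell
unique_data_tb=4   # hypothetical amount of unique data in the vault
vaults=2           # hypothetical number of deduplicating vaults on this node
# Update 6 and later: 3 GB of RAM per 1 TB of unique data
ram_gb=$(( unique_data_tb * 3 ))
echo "Recommended RAM for ${unique_data_tb} TB of unique data: ${ram_gb} GB"
# With several vaults, the available RAM is split proportionally between them
echo "RAM effectively available per vault: $(( ram_gb / vaults )) GB"
```

This also illustrates why a single vault per node is preferable: with two vaults, each vault's deduplication database gets only half the memory.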
Setting up a centralized managed vault with deduplication
- Vault data and vault database folders should reside on different physical devices to avoid performance degradation.
- The database must reside on a fixed drive. Do not place the deduplication database on external removable drives.
- Having one deduplicating vault per Acronis Storage Node is the best practice.
- The deduplication database should reside neither on the C:\ volume nor on the same disk as the operating system. The reason is that the operating system performs a lot of hard disk reads/writes, which significantly slows down deduplication.
- If vault data is stored on a NAS, make the network connection as fast as possible. Gigabit Ethernet is recommended.
- If vault data is stored on locally attached HDDs, use the fastest controllers and high RPM drives. I/O is the main bottleneck for deduplication speed.
- Make sure there is plenty of free space on the deduplicating vault storage. You can estimate the recommended free space being equal to 110% of occupied space. For example, if the vault data occupies 10 GB, you should have 11 GB of free space.
- The volume to store the deduplication database should have at least 10 GB of free space. When backing up a large number of machines, the required free space may exceed 10 GB.
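The 110% free-space rule above can be sketched as a simple calculation (the occupied size is a hypothetical input):

```shell
occupied_gb=10   # space currently occupied by vault data (hypothetical)
# Recommended free space: 110% of the occupied space
recommended_free_gb=$(( occupied_gb * 110 / 100 ))
echo "Recommended free space: ${recommended_free_gb} GB"
```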
Selecting a disk for a deduplicating vault
- Minimal disk access time is important. We recommend a fast IDE drive (7200 RPM or higher), a SCSI drive, or an enterprise-grade solid-state drive (SSD).
- For data loss prevention, we recommend RAID 10, 5, or 6 with a battery backup unit (BBU). RAID 0 is not recommended because it is not fault-tolerant. RAID 1 is not recommended because of its relatively low speed.
- There is no preference between local disks and a SAN. The device that hosts the vault must have plenty of free space for indexing and compacting operations.
Recommendations for network bandwidth
1 Gbit networks are recommended. However, if you have a smaller environment and back up only a handful of machines in parallel, 100 Mbit networks will suffice.
Recommendations for a client machine
Deduplication does not impose any additional requirements on the machine where the Acronis Agent is installed: an 800 MHz processor and 512 MB of RAM are enough.
Disabling SSL on a machine may increase backup speed when it is the only machine backing up to the ASN. If multiple machines back up simultaneously, disabling SSL has no effect. Currently, SSL encryption is turned on by default.
Estimating the number of client machines per ASN
One ASN can process a limited number of client machines. To estimate this limit, take into account how much data is going to be backed up, the overall backup-to-ASN speed, and the indexing speed.
The backup-to-ASN speed depends on the ASN machine configuration, which is hard to estimate in advance. Here are example test results for a 50/50 ratio of unique to duplicate data, which can be used for rough estimates (test server configuration: 2x Intel Xeon E5420 2.5 GHz, 8 GB RAM, RAID 0 on six 7200 RPM SATA disks for both storage and database):
- Backup data size: 100 GB
- Backup time: 40 Min
- Indexing time: 140 Min
Backup and indexing run largely independently, so the indexing time can be used to estimate how much data a single ASN can process. In the example configuration, the ASN can index approximately 1000 GB of backup data per day.
The amount of data to be backed up from one machine can be taken from internal company statistics. A general assumption is that each client machine has about 70 GB of data to back up, with a 2% daily change rate. For initial full backups, the ASN will be able to process 1000/70 ≈ 14 machines a day (provided the network bandwidth is not a bottleneck, see above). For incremental backups, the number of machines is much larger. So the number of machines one ASN can serve depends mainly on the time available for the initial full backups.
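The estimate above can be reproduced as a small calculation, using the indexing throughput from the example configuration and the assumed 70 GB per machine:

```shell
indexed_gb_per_day=1000   # indexing throughput from the example test server
data_per_machine_gb=70    # assumed average data per client machine
# How many initial full backups one ASN can index per day
machines_per_day=$(( indexed_gb_per_day / data_per_machine_gb ))
echo "Machines that can receive an initial full backup per day: ${machines_per_day}"
```

Substitute your own company's per-machine data size to get a figure for your environment.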
Number of connections
The default settings for ASN connections and backup queue length are:
- 10 machines for simultaneous backup
- 50 machines waiting in the queue
These numbers can safely be changed to 40/50, thanks to major enhancements in how the ASN handles parallel backups. This can reduce the average backup time.
Validating vaults
Validation checks the consistency of backup files and ensures that they have not been corrupted by third-party factors, such as security software scans or physical damage to the hard drive, thus verifying that data can be recovered from a backup. We recommend validating vaults periodically.
Backing up vaults
Why should I back up a vault?
Vault backup is needed in case the Storage Node server becomes corrupted. In that case, you can install the Storage Node on new hardware and recover the vault data.
How to back up a vault?
Before backing up the vault, stop the appropriate Storage Node service. It is better to choose a time when no backups targeting the vault are running; otherwise, they will fail as soon as the Storage Node service is stopped. Vault data is stored in two folders: the vault data path and the vault database path. These folders can be located on two different volumes; both volumes should be backed up with an image-level backup. Make sure that TIB files are not excluded from the backup. After backing up the vault, start the Storage Node service again. The service stop and start commands can be issued from the backup plan's pre- and post-commands.
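As a sketch, on Windows the service stop and start can be placed into the backup plan's pre-command and post-command fields. The service name "Acronis Storage Node" below is an assumption; verify the exact name in services.msc before using it.

```
rem Pre command: stop the Storage Node service before the vault backup starts
rem (the service name is an assumption; verify it in services.msc)
net stop "Acronis Storage Node"

rem Post command: start the service again after the backup completes
net start "Acronis Storage Node"
```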
How to recover a vault?
If the original vault data is lost, perform an image-level restore from the backups you have and attach the vault to the Storage Node. No service restart is needed.
How to attach a recovered vault to the Storage Node?
You can attach a vault data folder together with the corresponding vault database folder to a Storage Node. You can also attach the vault data folder alone, in which case the vault database will be recreated, but this is a time-consuming operation. Read the following online help article regarding this question: Attaching a Managed Vault.
Are there ways to copy the Vault data without stopping the Storage Node service?
Yes, by using the replication mechanism. Specify a second location in the backup plans that use this Storage Node. For example, if a backup plan backs up to “Storage Node 1”, edit it to specify “Storage Node 2” as the second location. As a result, new backups will be replicated to “Storage Node 2”. Refer to the following article regarding replication: Replication and retention of backups.
Another way is to export the vault archives to a specified folder and import them back when needed. Export and import of a whole vault can be done with the Acronis Backup Command Line component. Vault export is slower than backing up the vault disks, but it does not require stopping the Storage Node service, so it can be done at any time. Here are some examples:
Export all archives from the vault to a local folder:
acrocmd export archive --loc=bsp://asn1/vault1 --credentials=user1,password1 --target=c:\Archives
Export all archives from a local folder back to the vault:
acrocmd export archive --loc=c:\Archives --target=bsp://asn1/vault1 --credentials=user1,password1
Refer to the following article about export archive command: Export archive.