68284: Acronis Cyber Infrastructure: Append throttling on ABGW installations with public cloud backend

Last update: 25-12-2022

This article is intended for system administrators operating Acronis Cyber Infrastructure with an Acronis Backup Gateway cluster backed by an object storage backend (Amazon S3, Azure, Google, Swift, or other).
This article describes new behavior introduced in version 4.0 of Acronis Cyber Infrastructure.

Symptoms

After updating to version 4.0 or any later version of Acronis Cyber Infrastructure, or after deploying a new cluster, you may notice unexpected growth of the "append throttle" plot on the WebCP dashboard (Storage Services -> Backup Storage page) or on the Acronis Backup Gateway dashboard in Grafana.

Cause

"Append throttling" is a measure that slows down write operations to Acronis Backup Gateway when there is not enough storage space left. This protects the storage from filling up quickly and the backups from failing with a "no disk space" error.

This mechanism is especially important for Acronis Backup Gateway installations working with object storage, because it balances the rate at which data arrives in the local staging storage against the rate at which data is offloaded to the object storage. Pushing data to object storage is often slower than writing to local storage because of WAN transfer delays, limits imposed by public clouds, or other reasons. As a result, if the local staging space of the cluster is not large enough to hold at least 24 hours' worth of backups, backups might fail with a "no disk space" error.
Append throttling is intended to prevent this.

Previously, the append throttling logic took into account only the total amount of free disk space left on the storage cluster. However, this approach proved insufficient for very small storage clusters of around 100-500 GB. On such clusters, an "out of space" condition was still possible under heavy backup load.
To better support storage clusters with little available space, append throttling is now activated on a per-file basis and uses the object storage push backlog of a given file rather than the total amount of free space on the storage. Because of this, throttling for certain files might start when there is still enough free space left on the cluster.

Once the part of a file that has not yet been pushed to the cloud reaches a certain threshold, throttling starts for this file. Once it reaches twice this value, client writes are stopped completely until enough data has been pushed to the object storage.

The default threshold value is 300 000 000 bytes (about 300 megabytes). 
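The two-stage behavior described above can be illustrated with a small shell sketch. This is only a model of the rule as stated in this article, not the gateway's actual code: the function name throttle_state is hypothetical, and treating a backlog exactly equal to the threshold as "reached" is an assumption.

```shell
#!/bin/sh
# Illustrative model only; not the Backup Gateway implementation.
# Usage: throttle_state <not_pushed_bytes> [threshold_bytes]
throttle_state() {
    not_pushed=$1
    threshold=${2:-300000000}   # default threshold quoted in this article

    if [ "$not_pushed" -ge $((threshold * 2)) ]; then
        echo "stopped"      # backlog reached twice the threshold: writes stop
    elif [ "$not_pushed" -ge "$threshold" ]; then
        echo "throttled"    # backlog reached the threshold: writes slow down
    else
        echo "normal"       # backlog below the threshold: no throttling
    fi
}

throttle_state 100000000    # prints "normal"
throttle_state 450000000    # prints "throttled"
throttle_state 600000000    # prints "stopped"
```

With the default value, a single file would therefore start being throttled at 300 000 000 bytes of unpushed data and have its writes paused at 600 000 000 bytes.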

Solution

The current default threshold is optimized for small storage clusters with 500 GB of local storage space or less. For such clusters, we do not recommend changing the default throttling threshold value.

However, on large clusters that can hold 1-2 days' worth of backups locally and have high local write speeds, this limit might slow backups down unnecessarily. In this case, the threshold can be controlled with the following configuration parameter:

storage.ostor.max_not_pushed = 300000000

If your storage cluster is larger than 500 GB in total size and you have encountered unexpected append throttling activations, you can raise the threshold, e.g. by executing the following command:

echo "storage.ostor.max_not_pushed = <new limit>" > /mnt/vstorage/vols/acronis-backup/conf.d/max_not_pushed.conf

Here <new limit> should be replaced with the actual threshold value in bytes. To apply the change, the vstorage-abgw service must be restarted on all nodes included in the Backup Gateway, e.g.:

# echo "storage.ostor.max_not_pushed = 107374182400" > /mnt/vstorage/vols/acronis-backup/conf.d/max_not_pushed.conf
# systemctl restart vstorage-abgw
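Because the threshold must be a plain number of bytes, it may be worth validating the value before writing the file. The following is a minimal sketch under that assumption. The helper name write_max_not_pushed is hypothetical and not part of the product, and the demonstration writes to a temporary directory instead of the live configuration path.

```shell
#!/bin/sh
# Hypothetical helper: validate a threshold, then write the one-line
# configuration file in the same format as the command shown above.
# Usage: write_max_not_pushed <bytes> <conf_dir>
write_max_not_pushed() {
    limit=$1
    conf_dir=$2

    # Reject empty values and anything that is not a run of digits.
    case "$limit" in
        ''|*[!0-9]*)
            echo "threshold must be a whole number of bytes" >&2
            return 1
            ;;
    esac

    # Refuse to write if the target directory does not exist.
    if [ ! -d "$conf_dir" ]; then
        echo "no such directory: $conf_dir" >&2
        return 1
    fi

    echo "storage.ostor.max_not_pushed = $limit" > "$conf_dir/max_not_pushed.conf"
}

# Dry run against a temporary directory rather than the cluster path.
demo_dir=$(mktemp -d)
write_max_not_pushed 107374182400 "$demo_dir" && cat "$demo_dir/max_not_pushed.conf"
```

On a real cluster, the directory argument would be /mnt/vstorage/vols/acronis-backup/conf.d, and the vstorage-abgw service would still need to be restarted on every Backup Gateway node afterwards.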

Recommended threshold values:

- For large storage clusters of 1 TB and above, the threshold can be set to 100 GB (107374182400), or throttling can be effectively disabled by setting it to 1 TB (1099511627776).

- For medium clusters between 500 GB and 1 TB, a value of 1 to 3 GB (between 1073741824 and 3221225472) is recommended, depending on the number of files on the storage.

- For small clusters below 500 GB, it is recommended to leave the current default value.
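The recommendations above can be summarized as a small lookup, which also shows how the byte values are derived. This is a sketch, not an official tool, and the function name recommended_threshold is hypothetical. It assumes that 1 TB means 1024 GB (matching the binary byte values quoted above), that a cluster of exactly 500 GB counts as small, and that medium clusters start from the lower end of the 1 to 3 GB range.

```shell
#!/bin/sh
# Hypothetical helper: map a cluster size in GB to a suggested threshold in bytes.
# Usage: recommended_threshold <cluster_size_gb>
recommended_threshold() {
    size_gb=$1
    gib=$((1024 * 1024 * 1024))     # 1073741824 bytes

    if [ "$size_gb" -ge 1024 ]; then
        echo $((100 * gib))         # large cluster: 100 GB
    elif [ "$size_gb" -gt 500 ]; then
        echo $((1 * gib))           # medium cluster: low end of the 1-3 GB range (assumption)
    else
        echo 300000000              # small cluster: keep the default
    fi
}

recommended_threshold 200     # prints 300000000
recommended_threshold 750     # prints 1073741824
recommended_threshold 4096    # prints 107374182400
```

For a medium cluster holding many files, a value closer to 3221225472 (3 GB) may be more appropriate than the lower bound returned here.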
