One of the following scenarios is observed:
1. After an attempt to exit maintenance mode, the node gets stuck in the status "Exiting maintenance halted". The following error, related to resuming shaman, may be found in /var/log/vstorage-ui-agent/messages.log on the affected node:
[root@storage-node-01 ~]# grep 'ERR.*node.*is not suspended with token' /var/log/vstorage-ui-agent/messages.log
ERROR 2021-06-04 12:14:02,420 r-94049c9ad0d443eb agent/presentation/api/ha/shaman.py:177:ResumeShaman.post status: 1 err: out: Error: node "7e8808440b804f98" is not suspended with token ""
2. The eligibility check during an update to version 4.7 reports the following issue:
There are forcefully suspended nodes. Check "shaman stat -j"
A manual check of the shaman cluster status shows that there are nodes suspended with the node_crash_per_hour_threshold mark, e.g.:
[root@storage-node-01 ~]# shaman stat -j | jq '.status.nodes | select (.status=="Suspended")'
"description": "Reached NODE_CRASH_PER_HOUR_THRESHOLD. See 'man shaman'.",
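For the first scenario, the IDs of the nodes named in the error can be pulled out of the agent log in one step. The following is a minimal sketch built on the grep pattern shown above. The function name failed_resume_node_ids and its optional log-path argument are illustrative and not part of the product.

```shell
# Sketch: list the node IDs for which resuming shaman failed.
# The function name and the optional log-path argument are illustrative.
failed_resume_node_ids() {
    log_file="${1:-/var/log/vstorage-ui-agent/messages.log}"
    # Keep only the matching error lines, then extract the quoted node ID.
    grep 'ERR.*node.*is not suspended with token' "$log_file" \
        | sed -n 's/.*node "\([^"]*\)" is not suspended.*/\1/p' \
        | sort -u
}
```

Running failed_resume_node_ids without arguments on the affected node reads the default log path. An empty result means the error was not logged there.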
At some point, the internal monitoring service shaman detected 3 consecutive failures of the same node within one hour and suspended the failing node's membership in the HA shaman cluster. While the membership is suspended, the node cannot exit maintenance mode and the update to ACI 4.7 cannot start.
Resume the node's membership in the HA shaman cluster by running the following command on the affected node. If the node was stuck exiting maintenance mode, retry the exit afterwards:
[root@storage-node-01 ~]# shaman resume --token node_crash_per_hour_threshold
[root@storage-node-01 ~]# vinfra node maintenance stop storage-node-01 --wait
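The two commands above can be chained so that the maintenance exit is retried only if the resume succeeds. The following is a minimal sketch, assuming it is run on the affected node. The function names and the DRY_RUN variable are illustrative and not part of shaman or vinfra. The commands themselves are the ones shown above.

```shell
# Sketch: resume shaman membership, then retry leaving maintenance mode.
# DRY_RUN and both function names are illustrative helpers.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

resume_and_exit_maintenance() {
    node_name="$1"
    if [ -z "$node_name" ]; then
        echo "usage: resume_and_exit_maintenance <node-name>" >&2
        return 2
    fi
    # Stop at the first failure so the maintenance exit is not retried
    # while the node is still suspended.
    run shaman resume --token node_crash_per_hour_threshold \
        && run vinfra node maintenance stop "$node_name" --wait
}
```

Setting DRY_RUN=1 before calling resume_and_exit_maintenance storage-node-01 prints both commands without executing them, which allows a review before the real run.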
Contact Acronis support if the issue with the node persists or if assistance is required with investigating the root cause of the previously detected crashes.
For more information about shaman HA monitoring, see the manual page on any ACI node:
# man shaman