Symptoms
You see this alert in WebCP > Monitoring > Alerts:
No internet connection on the node <node>.
Cause
A periodic background task runs on each cluster node and verifies internet connectivity and domain name resolution by pinging google.com. The "No internet connection" alert is raised when packet loss reaches 100%.
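The check can be reproduced by hand on a node. The following is a minimal sketch, assuming the task runs the same command that appears in the agent log (/usr/bin/ping -c 4 google.com); the helper name ping_loss_percent is hypothetical and is not part of the product.

```shell
#!/bin/bash
# Hypothetical helper (not part of the product): read ping output on stdin
# and print only the packet loss percentage from the statistics line.
ping_loss_percent() {
    sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

# Usage on a node (requires network access):
#   ping -c 4 google.com | ping_loss_percent
# A result of 100 corresponds to the condition that raises the alert.
```

If the command prints nothing, ping did not produce a statistics line at all, which usually points to a name resolution failure rather than packet loss.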
Solution
This article explains how to investigate the issue using the Acronis Cyber Infrastructure logs.
The network connection check is logged in /var/log/vstorage-ui-agent/tasks.log. Example of a successful check:
DEBUG 2020-05-25 17:04:44,280 t-555decc622e34515 agent/business/models/runner.py:34:CmdRunner._internal_checked_execute command "('/usr/bin/ping', '-c', '4', 'google.com')" executed (c: 0, o: PING google.com (172.217.194.139) 56(84) bytes of data.
64 bytes from 172.217.194.139 (172.217.194.139): icmp_seq=1 ttl=41 time=12.9 ms
64 bytes from 172.217.194.139 (172.217.194.139): icmp_seq=2 ttl=41 time=15.9 ms
64 bytes from 172.217.194.139 (172.217.194.139): icmp_seq=3 ttl=41 time=34.0 ms
64 bytes from 172.217.194.139 (172.217.194.139): icmp_seq=4 ttl=41 time=17.6 ms
--- google.com ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3003ms
rtt min/avg/max/mdev = 12.937/20.130/34.037/8.202 ms
, e: )
Use the following command to list the distinct packet loss results recorded on the affected node:
# egrep 'transmitted.*received.*loss' /var/log/vstorage-ui-agent/tasks.log | grep -v '4 received' | sort | uniq
Example output:
[root@node01 ~]# egrep 'transmitted.*received.*loss' /var/log/vstorage-ui-agent/tasks.log | grep -v '4 received' | sort | uniq
4 packets transmitted, 0 received, 100% packet loss, time 2999ms
4 packets transmitted, 3 received, 25% packet loss, time 3002ms
4 packets transmitted, 3 received, 25% packet loss, time 3003ms
4 packets transmitted, 3 received, 25% packet loss, time 3004ms
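To see how often each loss level occurred, and not only which levels occurred, the filter above can be combined with uniq -c. This is a minimal sketch; the function name loss_summary is hypothetical, and the default path is the log file named earlier in this article.

```shell
#!/bin/bash
# Hypothetical helper: count how many times each non-zero packet loss
# result appears in the given log (default: the agent task log).
loss_summary() {
    local log="${1:-/var/log/vstorage-ui-agent/tasks.log}"
    grep -E 'transmitted.*received.*loss' "$log" \
        | grep -v '4 received' \
        | sed 's/, time .*//' \
        | sort | uniq -c | sort -rn
}

# Usage on a node:
#   loss_summary
#   loss_summary /path/to/a/copy/of/tasks.log
```

The trailing ", time ...ms" field is removed before counting so that results differing only in duration are grouped together.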
If packet loss is detected, find the exact timestamps of the failed checks:
# grep -A3 'has_internet_connection.*err.*PING' /var/log/vstorage-ui-agent/tasks.log
Example output:
[root@node01 ~]# grep -A3 'has_internet_connection.*err.*PING' /var/log/vstorage-ui-agent/tasks.log
WARNING 2020-05-25 18:39:53,962 t-8ed9370e97914a01 agent/business/models/utils.py:372:has_internet_connection status: 1 err: PING google.com (172.217.194.139) 56(84) bytes of data.
--- google.com ping statistics ---
4 packets transmitted, 0 received, 100% packet loss, time 2999ms
--
WARNING 2020-05-26 03:54:53,482 t-d2d30f097dea4fb5 agent/business/models/utils.py:372:has_internet_connection status: 1 err: PING google.com (172.217.194.113) 56(84) bytes of data.
--- google.com ping statistics ---
4 packets transmitted, 0 received, 100% packet loss, time 2999ms
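When there are many failed checks, it can help to reduce this output to a plain list of timestamps that can be compared with switch or router logs. The sketch below assumes the log line layout shown in the example output (level, date, time with milliseconds, task ID); the function name loss_timestamps is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print "DATE TIME" for every failed connection check,
# assuming the layout: LEVEL DATE TIME,MSEC TASK-ID SOURCE ...
loss_timestamps() {
    local log="${1:-/var/log/vstorage-ui-agent/tasks.log}"
    grep 'has_internet_connection.*err.*PING' "$log" \
        | awk '{ sub(/,.*/, "", $3); print $2, $3 }'
}

# Usage on a node:
#   loss_timestamps
```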
Check which interface is used to reach the internet (IP 172.217.194.139 in this example) and verify whether any TX/RX errors are reported on it:
[root@node01 ~]# ip r g 172.217.194.139
172.217.194.139 via 192.168.1.245 dev bond1 src 192.168.1.11
    cache
[root@node01 ~]# ifconfig bond1
bond1: flags=5187<UP,BROADCAST,RUNNING,MASTER,MULTICAST> mtu 1500
        inet 192.168.1.11 netmask 255.255.255.0 broadcast 192.168.1.255
        inet6 fe80::9618:82ff:fe86:c9a8 prefixlen 64 scopeid 0x20<link>
        ether 94:18:82:86:c9:a8 txqueuelen 1000 (Ethernet)
        RX packets 12849386 bytes 1411245617 (1.3 GiB)
        RX errors 123 dropped 1159 overruns 59 frame 0
        TX packets 3385895 bytes 531501631 (506.8 MiB)
        TX errors 321 dropped 0 overruns 0 carrier 0 collisions 0
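The counters above are cumulative since the interface came up, so a single reading does not show whether errors are still occurring. The following sketch reads the same kind of counters from the Linux sysfs statistics files so that two samples can be compared. The function name iface_errors is hypothetical, and the optional second argument exists only to make the function testable.

```shell
#!/bin/bash
# Hypothetical helper: print error and drop counters for an interface
# from /sys/class/net/<iface>/statistics (Linux sysfs).
iface_errors() {
    local iface="$1"
    local root="${2:-/sys/class/net}"   # alternative root, used for testing
    local counter
    for counter in rx_errors rx_dropped tx_errors tx_dropped; do
        printf '%s %s\n' "$counter" "$(cat "$root/$iface/statistics/$counter")"
    done
}

# Usage: take two samples some time apart; counters that grow between
# the samples indicate errors that are still occurring.
#   iface_errors bond1; sleep 60; iface_errors bond1
```

For a bonded interface such as bond1, it may also be worth running the same check against each member interface to see which physical link reports the errors.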
Using the packet loss timestamps and the interface error statistics, investigate the issue further on the network side of your infrastructure.