65091: Acronis Cyber Infrastructure: 'No internet connection' alert


Last update: 24-09-2020

Symptoms

You see this alert in WebCP > Monitoring > Alerts:

No internet connection on the node <node>.

Cause

A periodic background task runs on each cluster node and verifies internet connectivity and domain name resolution by pinging the google.com domain name. The "No internet connection" alert is raised when packet loss reaches 100%.
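The alert condition described above can be illustrated with a minimal shell sketch. It is not the product's implementation: the helper names loss_percent and is_total_loss are hypothetical, and the sketch only covers the packet loss case (if name resolution fails, ping prints no statistics line and the helper reports nothing).

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; not part of the product.

# Print the packet loss percentage found in ping output read from stdin.
loss_percent() {
    sed -n 's/.* \([0-9][0-9.]*\)% packet loss.*/\1/p'
}

# Succeed (exit status 0) only when every probe was lost.
is_total_loss() {
    [ "$(loss_percent)" = "100" ]
}
```

To try it manually on a node: ping -c 4 google.com | is_total_loss && echo "No internet connection"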

Solution

This article explains how to investigate the issue using the Acronis Cyber Infrastructure logs.

Each network connection check is logged in /var/log/vstorage-ui-agent/tasks.log. Example of a successful check:

DEBUG 2020-05-25 17:04:44,280 t-555decc622e34515 agent/business/models/runner.py:34:CmdRunner._internal_checked_execute command "('/usr/bin/ping', '-c', '4', 'google.com')" executed (c: 0, o: PING google.com (172.217.194.139) 56(84) bytes of data.
64 bytes from 172.217.194.139 (172.217.194.139): icmp_seq=1 ttl=41 time=12.9 ms
64 bytes from 172.217.194.139 (172.217.194.139): icmp_seq=2 ttl=41 time=15.9 ms
64 bytes from 172.217.194.139 (172.217.194.139): icmp_seq=3 ttl=41 time=34.0 ms
64 bytes from 172.217.194.139 (172.217.194.139): icmp_seq=4 ttl=41 time=17.6 ms
--- google.com ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3003ms
rtt min/avg/max/mdev = 12.937/20.130/34.037/8.202 ms , e: )

Use the following command on the affected node to list the distinct packet loss results recorded in the log:

# egrep 'transmitted.*received.*loss' /var/log/vstorage-ui-agent/tasks.log | grep -v '4 received' | sort | uniq

Example output:

[root@node01 ~]# egrep 'transmitted.*received.*loss' /var/log/vstorage-ui-agent/tasks.log | grep -v '4 received' | sort | uniq
4 packets transmitted, 0 received, 100% packet loss, time 2999ms
4 packets transmitted, 3 received, 25% packet loss, time 3002ms
4 packets transmitted, 3 received, 25% packet loss, time 3003ms
4 packets transmitted, 3 received, 25% packet loss, time 3004ms
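The output above shows which loss levels occurred but not how often. As a sketch, the following hypothetical helper (loss_summary is not a product command) counts the checks per non-zero loss level. It assumes the ping summary lines appear in tasks.log in the format shown in the examples.

```shell
#!/bin/sh
# Hypothetical helper: count logged checks per non-zero packet loss level.
# $1 is the log file, e.g. /var/log/vstorage-ui-agent/tasks.log
loss_summary() {
    # Extract the loss percentage from every ping summary line,
    # skip fully successful checks (0%), and count the rest per level.
    sed -n 's/.* \([0-9][0-9.]*\)% packet loss.*/\1/p' "$1" |
        awk '$1 != 0 { n[$1]++ } END { for (p in n) print p "% loss: " n[p] }' |
        sort -n
}
```

Usage: loss_summary /var/log/vstorage-ui-agent/tasks.log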

If packet loss was detected, find out exactly when it occurred:

# grep -A3 'has_internet_connection.*err.*PING' /var/log/vstorage-ui-agent/tasks.log

Example output:

[root@node01 ~]# grep -A3 'has_internet_connection.*err.*PING' /var/log/vstorage-ui-agent/tasks.log
WARNING 2020-05-25 18:39:53,962 t-8ed9370e97914a01 agent/business/models/utils.py:372:has_internet_connection status: 1 err: PING google.com (172.217.194.139) 56(84) bytes of data.
--- google.com ping statistics ---
4 packets transmitted, 0 received, 100% packet loss, time 2999ms
--
WARNING 2020-05-26 03:54:53,482 t-d2d30f097dea4fb5 agent/business/models/utils.py:372:has_internet_connection status: 1 err: PING google.com (172.217.194.113) 56(84) bytes of data.
--- google.com ping statistics ---
4 packets transmitted, 0 received, 100% packet loss, time 2999ms
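For a long log it can help to reduce this output to one line per failed check. The sketch below does that with awk. The helper name failed_check_times is hypothetical, and the sketch assumes that every log record starts with one of the level names DEBUG, INFO, WARNING or ERROR, followed by the date and time, as in the examples above.

```shell
#!/bin/sh
# Hypothetical helper: print "<date> <time>: <ping summary>" per failed check.
# $1 is the log file, e.g. /var/log/vstorage-ui-agent/tasks.log
failed_check_times() {
    awk '
        # A new log record starts: forget any pending timestamp.
        /^(DEBUG|INFO|WARNING|ERROR) / { ts = "" }
        # Failed connectivity check: remember its date and time.
        /has_internet_connection.*err.*PING/ { ts = $2 " " $3; next }
        # Ping summary line that belongs to the remembered record.
        ts != "" && /packet loss/ { print ts ": " $0; ts = "" }
    ' "$1"
}
```

Usage: failed_check_times /var/log/vstorage-ui-agent/tasks.log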

Determine which interface is used to reach the Internet (IP 172.217.194.139 in this example) and check whether it reports TX/RX errors:

[root@node01 ~]# ip r g 172.217.194.139
172.217.194.139 via 192.168.1.245 dev bond1 src 192.168.1.11
    cache

[root@node01 ~]# ifconfig bond1
bond1: flags=5187<UP,BROADCAST,RUNNING,MASTER,MULTICAST>  mtu 1500
        inet 192.168.1.11  netmask 255.255.255.0  broadcast 192.168.1.255
        inet6 fe80::9618:82ff:fe86:c9a8  prefixlen 64  scopeid 0x20<link>
        ether 94:18:82:86:c9:a8  txqueuelen 1000  (Ethernet)
        RX packets 12849386 bytes 1411245617 (1.3 GiB)
        RX errors 123  dropped 1159  overruns 59  frame 0
        TX packets 3385895 bytes 531501631 (506.8 MiB)
        TX errors 321  dropped 0 overruns 0  carrier 0  collisions 0
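The two steps above can also be scripted. The following is a sketch, not a product tool: the helper names route_dev and iface_errors are hypothetical. It reads the same counters from the Linux sysfs files under /sys/class/net/<interface>/statistics/. On systems where ifconfig is not installed, ip -s link show dev bond1 prints equivalent statistics.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only.

# Read "ip route get" output from stdin and print the outgoing interface.
route_dev() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# Print error and drop counters of an interface.
# $1 is the interface name; $2 is an optional sysfs base directory
# (defaults to /sys/class/net).
iface_errors() {
    iface=$1
    base=${2:-/sys/class/net}
    for counter in rx_errors rx_dropped tx_errors tx_dropped; do
        printf '%s %s\n' "$counter" "$(cat "$base/$iface/statistics/$counter")"
    done
}
```

Usage: iface_errors "$(ip route get 172.217.194.139 | route_dev)"

Running it twice a few minutes apart shows whether the counters are still growing, which is more telling than their absolute values.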

Based on the timestamps of the packet loss and the error statistics of the interface, investigate the issue further on the network side of your infrastructure.
