In vSphere, networking problems can occur at many different levels. It is important to know which level to start with. Is it a virtual machine problem or a host problem? Did the issue arise when you migrated the machine to a new host?
- Virtual switch connectivity can be managed in two ways:
- Standard switches
- Distributed switches
You also must determine if it’s a virtual machine or a host management issue.
Network Troubleshooting Scenario #1 – No network connectivity to other systems.
One of the first things to do is a simple ping. From the ESXi host, ping a system that you know is up and that should be reachable from the host.
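From the ESXi shell, `vmkping` sends the ping through a specific VMkernel interface. The following is a minimal sketch, assuming the management interface is named vmk0 and using 192.0.2.10 as a placeholder address. The `ping_via_vmk` helper and the `PING_CMD` override are hypothetical names introduced only for this example.

```shell
# Hypothetical helper: ping a target through a given VMkernel interface.
# PING_CMD defaults to vmkping; set PING_CMD=echo to print the arguments
# instead of sending packets (a dry run on a machine without vmkping).
ping_via_vmk() {
    iface="$1"    # VMkernel interface, e.g. vmk0 (assumed name)
    target="$2"   # address of a system known to be up
    ${PING_CMD:-vmkping} -I "$iface" -c 3 "$target"
}

# Example (run on the ESXi host; 192.0.2.10 is a placeholder address):
#   ping_via_vmk vmk0 192.0.2.10
```

If the ping succeeds from the host but not from a virtual machine, the problem is more likely in the VM or its port group than in the host uplink.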
Starting at the ESXi host, verify these possible configuration problems:
- Does the ESXi host network configuration appear correct? IP, subnet mask, gateway?
- Is the uplink plugged in? Yes, that had to be said.
- esxcli network nic list
- If using VLANs, does the VLAN ID of the port group look correct?
- esxcli network vswitch standard portgroup list
- Check the trunk port configuration on the switch. Have there been any recent changes?
- Does the physical uplink adapter have all settings configured properly? (speed, duplex, etc.)
- vicfg-nics -d duplex -s speed vmnic#
- If using NIC teaming, is it set up and configured properly?
- Are you using supported hardware? Any driver issues?
- If all of the above check out, verify that you don’t have a physical adapter failure.
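The host-side checks above can be run in one pass from the ESXi shell. This is a sketch rather than a complete diagnostic: the names vmnic0 and vSwitch0 are assumptions, and the `down_uplinks` helper is a hypothetical name that relies on the usual column layout of `esxcli network nic list`.

```shell
# Print the names of uplinks whose "Link Status" column is not "Up".
# Assumes the usual `esxcli network nic list` layout: two header lines,
# then Name, PCI Device, Driver, Admin Status, Link Status, ...
down_uplinks() {
    awk 'NR > 2 && $5 != "Up" { print $1 }'
}

# Run the checks only where esxcli exists (that is, on the ESXi host).
if command -v esxcli >/dev/null 2>&1; then
    esxcli network ip interface ipv4 get             # IP and subnet mask per vmk
    esxcli network ip route ipv4 list                # default gateway
    esxcli network nic list | down_uplinks           # uplinks with no link
    esxcli network vswitch standard portgroup list   # VLAN ID per port group
    esxcli network nic get -n vmnic0                 # speed, duplex, driver (vmnic0 assumed)
    esxcli network vswitch standard policy failover get -v vSwitch0  # teaming (vSwitch0 assumed)
fi
```

Any uplink name printed by `down_uplinks` is a candidate for an unplugged cable, a disabled switch port, or a failed adapter.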
If you recently moved the VM to a new host, also verify that an equivalent port group exists on the host and that the network adapter is connected in the virtual machine settings. The firewall in the guest operating system might be blocking traffic. Ensure that the firewall does not block required ports.
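To confirm that the destination host has the port group the VM expects, you can search the standard port group list by name. This is a sketch that assumes the VM uses a port group called "VM Network". The `has_portgroup` helper is a hypothetical name, and it assumes that the columns in the esxcli output are separated by two or more spaces.

```shell
# Succeed (exit status 0) if a standard port group named $1 appears in
# `esxcli network vswitch standard portgroup list` output read from stdin.
# Columns are assumed to be separated by two or more spaces, so names that
# contain single spaces ("VM Network") stay in one field.
has_portgroup() {
    awk -F '  +' -v pg="$1" 'NR > 2 && $1 == pg { found = 1 } END { exit !found }'
}

if command -v esxcli >/dev/null 2>&1; then
    if esxcli network vswitch standard portgroup list | has_portgroup "VM Network"; then
        echo "port group present"
    else
        echo "port group missing on this host"
    fi
    esxcli network vm list    # running VMs with their world IDs and networks
fi
```

This only covers standard switches. Port groups on a distributed switch are defined in vCenter and would need to be checked there.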
Network Troubleshooting Scenario #2 – ESXi hosts dropping from vCenter
Occasionally an ESXi host is added to the vCenter Server inventory with no issues at all, but disconnects 60 seconds after the task completes.
Typically, this issue is caused by lost heartbeat packets between vCenter (vpxd) and an ESXi host (vpxa).
The first thing to check is that no firewall is blocking the vCenter communication ports. Then verify that the network is not congested. This issue is more prevalent with Windows-based vCenter systems.
Adjust the Windows Firewall settings:
- If ports are not configured, disable Windows Firewall.
- If the firewall is configured with the proper ports, ensure that Windows Firewall is not blocking UDP port 902.
By default vpxa uses UDP port 902, but it is possible to change the ports to something else. Check the /etc/vmware/vpxa/vpxa.cfg file <ServerPort> setting.
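To see which port a host is actually configured to use, you can read the setting directly from the file on the ESXi host. This is a minimal sketch: the `vpxa_port` helper is a hypothetical name, and the pattern accepts either capitalization of the tag because its exact spelling is treated as an assumption here.

```shell
# Print the port number configured in a vpxa.cfg-style file.
# The tag is matched as ServerPort or serverPort because the exact
# spelling in a given ESXi release is an assumption here.
vpxa_port() {
    sed -n 's/.*<[Ss]erver[Pp]ort>\([0-9][0-9]*\)<\/[Ss]erver[Pp]ort>.*/\1/p' "$1"
}

CFG=/etc/vmware/vpxa/vpxa.cfg
if [ -r "$CFG" ]; then
    echo "vpxa heartbeat port: $(vpxa_port "$CFG")"
fi
```

If the value printed is not 902, the firewall rules between the host and vCenter need to allow that port instead.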
Network congestion can also cause dropped heartbeats. Some tools you can use to troubleshoot:
- You can use the resxtop utility or graphical views to analyze traffic.
- The pktcap-uw command is an enhanced packet capture and analysis tool.
- pktcap-uw is unidirectional and defaults to the inbound direction only.
- Direction of traffic is specified using --dir 0 for inbound and --dir 1 for outbound.
- Two (or more) separate traces can be run in parallel but need to be merged later in Wireshark.
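The two-trace approach described above might look like the following sketch. The uplink name vmnic0, the packet count, and the /tmp output directory are assumptions. The `capture` helper and the `PKTCAP` override are hypothetical names used only for illustration.

```shell
UPLINK="${UPLINK:-vmnic0}"    # uplink to trace (assumed name)
COUNT="${COUNT:-1000}"        # stop each trace after this many packets
OUT_DIR="${OUT_DIR:-/tmp}"    # assumed output location

# Capture one direction on the uplink: 0 = inbound, 1 = outbound.
# PKTCAP defaults to pktcap-uw; PKTCAP=echo prints the arguments as a dry run.
capture() {
    ${PKTCAP:-pktcap-uw} --uplink "$UPLINK" --dir "$1" -c "$COUNT" \
        -o "$OUT_DIR/$UPLINK-dir$1.pcap"
}

if command -v pktcap-uw >/dev/null 2>&1; then
    capture 0 &    # inbound trace
    capture 1 &    # outbound trace
    wait           # both traces end after COUNT packets
fi

# Afterwards, on a workstation with Wireshark installed, merge the two files:
#   mergecap -w merged.pcap vmnic0-dir0.pcap vmnic0-dir1.pcap
```

Limiting each trace with a packet count means both background captures stop on their own, so no manual cleanup of pktcap-uw processes is needed.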
Network Troubleshooting Scenario #3 – No Management Connectivity on ESXi Host
VMware Management networks are configured using VMkernel port groups. Typically, when a host that was previously working loses connectivity to vCenter, a recent change to that port group has caused the issue.
One VMware feature that helps in this case is Rollback. Several different types of events can trigger a network rollback:
- Updating the speed or duplex of a physical NIC
- Updating teaming and failover for the management VMkernel adapter
- Updating DNS and routing settings on the ESXi host
- Changing the IP settings of a management VMkernel adapter
If one of these changes disconnects the host, the change fails and the host rolls back to the last known good configuration.
You can also restore the network configuration from the DCUI. Select “Network Restore Options” to restore either standard switches or distributed switches. If you want to start over with a new configuration, the Restore Network Settings option deletes all current network settings except for the Management network.