This part of the series covers some common issues with vSphere deployments. It is split into two sections: the first covers ESXi host troubleshooting during installation, and the second covers vCenter deployments at installation.
ESXi Host Troubleshooting
It’s common to assume that ESXi will install on any hardware, but there are a few details to know before you get started. First, VMware supports only hardware that is listed on the VMware Hardware Compatibility List (HCL), where specific drivers are tested and validated. If your hardware is not on the list, don’t expect support. VMware has a large partner ecosystem, and both hardware and software go through rigorous testing before being signed off for official support.
There are also various community-supported drivers for ESXi. This means that even though your hardware may work with ESXi, it is not running in a fully supported mode. This is useful for users who build homelabs for practice.
Keep in mind during installation that not all of your drivers may install automatically. If your hardware is newer than the installer image, you might have to download a vSphere Installation Bundle (VIB). A VIB is somewhat like a tarball or ZIP archive: it is a collection of files packaged into a single archive to make software deployment easier.
A VIB consists of three parts:
- A file archive
- An XML descriptor file
- A signature file
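If you want to see these parts inside a VIB you have downloaded, note that VIB files are commonly reported to use the Unix ar archive format rather than tar or ZIP; treat that as an assumption and confirm it against your own file. Under that assumption, the Python sketch below lists an archive's members so you can spot the descriptor, signature, and payload files. The function name list_ar_members is hypothetical, and the sketch does not handle the extended long-name conventions some ar implementations use for names longer than 15 characters.

```python
import sys
from typing import List, Tuple

AR_MAGIC = b"!<arch>\n"  # global header of a Unix ar archive
HEADER_LEN = 60          # every member header is exactly 60 bytes


def list_ar_members(data: bytes) -> List[Tuple[str, int]]:
    """Return (name, size) for each member of a Unix ar archive.

    Assumes the VIB is an ar archive; raises ValueError otherwise.
    """
    if not data.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    members = []
    pos = len(AR_MAGIC)
    while pos + HEADER_LEN <= len(data):
        header = data[pos:pos + HEADER_LEN]
        # Each header ends with the two-byte terminator "`\n".
        if header[58:60] != b"`\n":
            raise ValueError(f"bad member header at offset {pos}")
        # Name is bytes 0-15 (GNU ar appends "/"); size is bytes 48-57 in decimal.
        name = header[0:16].decode("ascii").strip().rstrip("/")
        size = int(header[48:58].decode("ascii").strip())
        members.append((name, size))
        # Member data is padded to an even length.
        pos += HEADER_LEN + size + (size % 2)
    return members


if __name__ == "__main__":
    # Usage: python list_vib.py driver.vib  (any downloaded VIB file)
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            for name, size in list_ar_members(f.read()):
                print(f"{name:<16} {size:>10} bytes")
```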
The signature file is the electronic signature used to verify the VIB’s trust level, also called its acceptance level. It is one of the four levels listed below:
- VMwareCertified: VIBs created and thoroughly tested by VMware.
- VMwareAccepted: VIBs created by VMware partners and approved by VMware. VMware relies on partners to perform the testing, but VMware verifies the results.
- PartnerSupported: VIBs created and tested by a trusted VMware partner. The partner performs all testing. VMware does not verify the results.
- CommunitySupported: VIBs created by individuals or partners outside of the VMware partner program. These VIBs do not undergo any VMware or trusted partner testing and are not supported by VMware or its partners.
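The four levels above are ordered from most to least trusted. ESXi also has a host-wide acceptance level, and VMware documents that a host installs a VIB only if the VIB’s level is the same as or more trusted than the host’s setting. The Python sketch below models that comparison; the enum and function names are hypothetical and are not part of any VMware tool.

```python
from enum import IntEnum


class AcceptanceLevel(IntEnum):
    """VIB trust levels, ordered from least trusted (1) to most trusted (4)."""
    COMMUNITY_SUPPORTED = 1
    PARTNER_SUPPORTED = 2
    VMWARE_ACCEPTED = 3
    VMWARE_CERTIFIED = 4


# Map the level names as written in this article to enum members.
_BY_NAME = {
    "CommunitySupported": AcceptanceLevel.COMMUNITY_SUPPORTED,
    "PartnerSupported": AcceptanceLevel.PARTNER_SUPPORTED,
    "VMwareAccepted": AcceptanceLevel.VMWARE_ACCEPTED,
    "VMwareCertified": AcceptanceLevel.VMWARE_CERTIFIED,
}


def parse_level(name: str) -> AcceptanceLevel:
    """Convert a level name such as 'PartnerSupported' to an AcceptanceLevel."""
    try:
        return _BY_NAME[name]
    except KeyError:
        raise ValueError(f"unknown acceptance level: {name!r}") from None


def can_install(vib_level: str, host_level: str) -> bool:
    """True if a VIB at vib_level is at least as trusted as the host's level."""
    return parse_level(vib_level) >= parse_level(host_level)


if __name__ == "__main__":
    host = "PartnerSupported"
    for vib in ("VMwareCertified", "PartnerSupported", "CommunitySupported"):
        print(f"{vib:>18} on {host} host: {can_install(vib, host)}")
```

With the host at PartnerSupported, only the CommunitySupported VIB is rejected, because it sits below the host’s level in the trust order.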
If installation succeeded and you have the right VIBs and software configured but other issues come up, always check the hostd.log file (/var/log/hostd.log on the host) first. The hostd management service is the main communication channel between ESXi hosts and the VMkernel. If hostd fails, the ESXi host disconnects from vCenter and cannot be easily managed.
- Try restarting hostd by running /etc/init.d/hostd restart
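When reading hostd.log, it helps to pull out only the warning and error entries. The Python sketch below does that for a copy of the log. It assumes each line starts with a timestamp followed by a severity word (verbose, info, warning, or error), which is the layout seen on recent ESXi releases; check a few lines of your own log before relying on it. The function names are hypothetical.

```python
import re
import sys
from collections import Counter
from typing import Iterable, Iterator, Tuple

# Assumed layout: "<timestamp> <severity> hostd[<pid>] ... <message>".
_LINE = re.compile(r"^(?P<ts>\S+)\s+(?P<sev>verbose|info|warning|error)\s+(?P<rest>.*)$")

WANTED = {"warning", "error"}


def problem_lines(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (timestamp, severity, rest of line) for warning/error entries.

    Lines that do not match the assumed layout are skipped.
    """
    for line in lines:
        m = _LINE.match(line.rstrip("\n"))
        if m and m.group("sev") in WANTED:
            yield m.group("ts"), m.group("sev"), m.group("rest")


def summarize(lines: Iterable[str]) -> Counter:
    """Count warning/error entries by severity."""
    return Counter(sev for _, sev, _ in problem_lines(lines))


if __name__ == "__main__":
    # Default is the log's path on an ESXi host; pass a copied file instead if needed.
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/hostd.log"
    with open(path, encoding="utf-8", errors="replace") as f:
        for ts, sev, msg in problem_lines(f):
            print(ts, sev.upper(), msg)
```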
Occasionally, an ESXi host will crash and display a purple diagnostic screen, commonly called the Purple Screen of Death (PSOD). A host can crash for several reasons, including CPU exceptions, driver issues, machine check exceptions, hardware faults, or software bugs.
To recover from a PSOD, try the following steps:
- Take a screenshot of the screen.
- Restart the host and, if possible, get its VMs running on another host. If vSphere HA is configured properly, this should happen automatically.
- Search online for the error shown on the screen; others often have the same issue, and the fix may be a simple firmware or software update. If you can’t find any information, contact VMware support.
Another possible issue is that the ESXi host simply hangs during the boot process. You never get a PSOD; the system just sits there and becomes completely unresponsive. Hangs typically occur during boot after the system has been power cycled, and are caused by the VMkernel being too busy or by a hardware lockup.
To determine whether the host has locked up:
- Ping the VMkernel (Management) network interface.
- Try to log in to the host with a client such as the VMware Host Client.
- Monitor network traffic from the ESXi host.
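Sending an ICMP ping from Python usually requires raw sockets and elevated privileges, so the sketch below swaps in a TCP connection test against the host’s HTTPS port (443, which the VMware Host Client uses). A successful connect shows the management network stack is answering; treat it as a complement to the checks above, not a replacement. The function names are hypothetical, and the host name in the usage comment is a placeholder.

```python
import socket
import sys


def tcp_reachable(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Refused connections, timeouts, and DNS failures are all OSError
    subclasses and count as unreachable.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_host(host: str, ports=(443,), timeout: float = 3.0) -> dict:
    """Check each port and return a {port: reachable} mapping."""
    return {port: tcp_reachable(host, port, timeout) for port in ports}


if __name__ == "__main__":
    # Usage: python check_host.py esxi01.example.com  (placeholder management address)
    for target in sys.argv[1:]:
        for port, ok in check_host(target).items():
            print(f"{target}:{port} {'reachable' if ok else 'NOT reachable'}")
```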
If you can ping the host, that’s a good sign. Next, go to the host console (the DCUI) and press Alt-F12 to display the VMkernel log messages on screen.
To recover a host that has hung, reboot it, then review the logs and gather performance statistics. If you determine it’s a hardware issue, fix the hardware and, if required, reinstall or reconfigure ESXi. Finally, update the host with the most recent patches.
vCenter Server Troubleshooting
When installing the vCenter Server Appliance (vCSA), VMware splits the installation into two stages. Stage 1 is the appliance deployment; Stage 2 is the configuration of the appliance.
In most cases, Stage 1 is a very straightforward install. Stage 2 is where users have traditionally run into problems, and these can generally be resolved by verifying your DNS and NTP settings.
Some deployments appear to succeed, but authentication then fails at login because the ESXi host and the newly created appliance are not synchronized to the same NTP source.
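The vCSA installer expects the appliance’s FQDN to resolve forward to its IP address and that IP to resolve back to the same name. The Python sketch below checks both directions with the standard socket resolver before you start Stage 2. The function name is hypothetical, the FQDN and IP in the usage comment are placeholders, and the resolver functions are parameters only so the check can be exercised without a real DNS server.

```python
import socket
from typing import Callable, List, Tuple

Resolver = Callable[[str], str]
ReverseResolver = Callable[[str], Tuple[str, List[str], List[str]]]


def _norm(name: str) -> str:
    """Lower-case a DNS name and drop any trailing dot for comparison."""
    return name.lower().rstrip(".")


def check_dns(fqdn: str, expected_ip: str,
              forward: Resolver = socket.gethostbyname,
              reverse: ReverseResolver = socket.gethostbyaddr) -> List[str]:
    """Return a list of problems; an empty list means both lookups agree."""
    problems = []
    try:
        ip = forward(fqdn)  # IPv4 forward lookup
    except OSError as exc:
        return [f"forward lookup of {fqdn} failed: {exc}"]
    if ip != expected_ip:
        problems.append(f"{fqdn} resolves to {ip}, expected {expected_ip}")
    try:
        # gethostbyaddr returns (hostname, aliases, addresses)
        name, aliases, _ = reverse(expected_ip)
    except OSError as exc:
        problems.append(f"reverse lookup of {expected_ip} failed: {exc}")
        return problems
    if _norm(fqdn) not in {_norm(n) for n in [name, *aliases]}:
        problems.append(f"{expected_ip} reverses to {name}, not {fqdn}")
    return problems


if __name__ == "__main__":
    import sys
    # Usage: python check_dns.py vcsa01.example.com 192.0.2.10  (placeholders)
    if len(sys.argv) == 3:
        issues = check_dns(sys.argv[1], sys.argv[2])
        print("\n".join(issues) if issues else "forward and reverse DNS agree")
```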
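One way to confirm that the host and the appliance agree on time is to measure each machine’s clock offset against the same NTP server. The sketch below is a minimal SNTP client query (RFC 4330) using only the Python standard library; the function names are hypothetical, the server name in the usage comment is a placeholder, and it assumes UDP port 123 is reachable from wherever you run it. Comparing the offsets from both sides is an illustration, not a VMware-provided check.

```python
import socket
import struct
import sys
import time

NTP_EPOCH_OFFSET = 2208988800  # seconds between 1900-01-01 and 1970-01-01


def ntp_to_unix(seconds: int, fraction: int) -> float:
    """Convert a 64-bit NTP timestamp (seconds, fraction) to Unix time."""
    return seconds - NTP_EPOCH_OFFSET + fraction / 2**32


def clock_offset(t1: float, t2: float, t3: float, t4: float) -> float:
    """Standard NTP offset: ((t2 - t1) + (t3 - t4)) / 2.

    t1 = client send, t2 = server receive, t3 = server send, t4 = client receive.
    A positive result means the local clock is behind the server.
    """
    return ((t2 - t1) + (t3 - t4)) / 2


def query_offset(server: str, timeout: float = 3.0) -> float:
    """Send one SNTP client request to server:123 and return the offset in seconds."""
    packet = b"\x1b" + 47 * b"\0"  # LI=0, version=3, mode=3 (client)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        t1 = time.time()
        sock.sendto(packet, (server, 123))
        data, _ = sock.recvfrom(48)
        t4 = time.time()
    if len(data) < 48:
        raise ValueError("short NTP response")
    # Receive timestamp is bytes 32-39, transmit timestamp is bytes 40-47.
    rx_s, rx_f, tx_s, tx_f = struct.unpack("!4I", data[32:48])
    return clock_offset(t1, ntp_to_unix(rx_s, rx_f), ntp_to_unix(tx_s, tx_f), t4)


if __name__ == "__main__":
    # Usage: python ntp_offset.py ntp.example.com  (placeholder server name)
    for server in sys.argv[1:]:
        print(f"{server}: offset {query_offset(server) * 1000:+.1f} ms")
```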
Occasionally you might run into issues when replacing certificates with the Certificate Manager: it can hang at 0% and then fail with an automatic rollback error. This can be caused by using certificates that are not Base64-encoded. To resolve it, manually publish the full certificate chain to the certificate store.
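Base64 certificates are PEM files whose blocks begin with -----BEGIN CERTIFICATE-----; the usual non-Base64 alternative is a binary DER file. The Python sketch below checks that a file is PEM and counts the certificates it holds, so you can confirm a chain file actually contains every certificate before publishing it. The function name is hypothetical, and the sketch only checks encoding; it does not parse or validate the certificates.

```python
import base64
import re
import sys
from typing import List

# Matches each PEM certificate block; the captured body should be Base64 text.
_PEM_BLOCK = re.compile(
    rb"-----BEGIN CERTIFICATE-----\s*(.+?)\s*-----END CERTIFICATE-----",
    re.DOTALL,
)


def split_pem_chain(data: bytes) -> List[bytes]:
    """Return the DER bytes of every certificate in a PEM (Base64) file.

    Raises ValueError if no PEM certificate blocks are found (for example,
    a binary DER file) or if a block's body is not valid Base64.
    """
    bodies = _PEM_BLOCK.findall(data)
    if not bodies:
        raise ValueError("no Base64 (PEM) certificate blocks found; file may be DER")
    certs = []
    for body in bodies:
        # binascii.Error, raised on bad Base64, is a subclass of ValueError.
        der = base64.b64decode(b"".join(body.split()), validate=True)
        if not der.startswith(b"\x30"):  # DER certificates start with a SEQUENCE tag
            raise ValueError("decoded block does not look like a DER certificate")
        certs.append(der)
    return certs


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            chain = split_pem_chain(f.read())
        print(f"{path}: {len(chain)} certificate(s) in Base64/PEM format")
```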
Next post in this series: vSphere Troubleshooting Part #4 – Virtual Machine Troubleshooting