Virtual Machine Troubleshooting
Before we jump into troubleshooting virtual machines, let’s review some of the typical virtual machine files you will run into.
Virtual Machine Troubleshooting Scenario #1 – Content ID Mismatch
Snapshots can be one of the most frustrating sources of VM issues. In fact, our first few virtual machine troubleshooting scenarios will focus on snapshots. Occasionally a disk will fail to open with a content ID mismatch error like the one below.
Cannot open the disk ‘/vmfs/volumes/4a496b4g-eceda1-19-542b-000cfc0097g5/virtualmachine/virtualmachine-000002.vmdk’ or one of the snapshot disks it depends on. Reason: The parent virtual disk has been modified since the child was created.
Content ID mismatch conditions are triggered by interruptions to major virtual machine operations such as Storage vMotion or other migrations, by VMware software errors, or by user action.
The Content ID (CID) value in a virtual machine disk descriptor file helps ensure that the content of a parent virtual disk file, such as a flat or base disk, stays in a consistent state. The child delta disks created from that base disk’s snapshot contain all further writes and changes, and those changes depend on the parent disk remaining intact.
To resolve, open the latest vmware.log and locate the specific disk chain affected. You will see a line or warning similar to: Content ID mismatch (parentCID ed06b3ce != 0cb205b1)
In our case, the first value (ed06b3ce) is the parentCID recorded in the child’s descriptor file and the second (0cb205b1) is the parent disk’s actual CID. Back up the child’s disk descriptor file, change its parentCID from ed06b3ce to 0cb205b1, save it over the existing descriptor .vmdk file, and power the machine back on.
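The edit itself is a one-line change, so a minimal sketch may help. It runs against a throwaway file in a temp directory, not a live VM. The sample descriptor contents and the helper name fix_parent_cid are assumptions for illustration; only the two CID values come from the log line above.

```shell
#!/bin/sh
# Sketch: repoint a child descriptor's parentCID at the parent's actual CID.
# The sample descriptor below is illustrative, not a complete real file.

workdir=$(mktemp -d)
desc="$workdir/virtualmachine-000002.vmdk"

cat > "$desc" <<'EOF'
# Disk DescriptorFile
version=1
CID=5a1b2c3d
parentCID=ed06b3ce
createType="vmfsSparse"
EOF

# fix_parent_cid FILE OLD NEW (hypothetical helper):
# keep a .bak copy, then rewrite only the matching parentCID line.
fix_parent_cid() {
    cp "$1" "$1.bak" &&
    sed "s/^parentCID=$2\$/parentCID=$3/" "$1.bak" > "$1"
}

fix_parent_cid "$desc" ed06b3ce 0cb205b1
grep '^parentCID=' "$desc"
```

Writing through a .bak copy means the original descriptor is still on disk if the change has to be reverted.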
Virtual Machine Troubleshooting Scenario #2 – Snapshot Issues
Taking a snapshot of a machine fails. The user cannot create or commit a snapshot to the VM. Typical errors will say something like:
- Cannot create a quiesced snapshot because the snapshot operation exceeded the time limit for holding off I/O in the frozen virtual machine.
- An error occurred while quiescing the virtual machine. The error code was: 4 The error was: Quiesce aborted.
Quiescing is handled by two technologies:
- Microsoft Volume Shadow Copy Service
- VMware Tools Sync Driver
When taking snapshots, be sure the following are true:
- VSS prerequisites are met. (See VMware KB 1007696)
- VMware Tools is running.
- The VSS provider is used.
- None of the VSS writers are showing errors.
When taking snapshots, be sure you do not reach 32 levels. Once a chain holds 32 snapshots, you cannot create more. Generally, it’s a recommended practice to keep as few snapshots as possible on a virtual machine. They can hurt performance and are difficult to troubleshoot.
If snapshot creation still fails, check that the user has permission to take a snapshot. Then check that the disk type is supported. RDMs in physical mode, independent disks, and VMs with bus sharing are not supported.
Snapshots grow as changes are written to their delta files. You cannot create or commit a snapshot if a snapshot (delta) disk does not have a descriptor file.
Additional Snapshot Machine Files
|File|Description|
|<vm name>-00000n-delta.vmdk|A delta vmdk is created whenever a snapshot is taken. The pre-snapshot vmdk in use is locked for writing. Any changes from there on are written to the VM’s delta disk. This allows a VM to be restored to any state prior to a specific snapshot being taken.|
|<vm name>-00000n.vmdk|The descriptor file for the delta vmdk file.|
If the -delta.vmdk has no descriptor file, you will need to create one before doing anything else:
- Copy the base disk descriptor file, giving the copy the name of the missing descriptor file.
- Edit the new descriptor file. Change the format from a base disk to a snapshot delta disk descriptor.
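The two steps above can be sketched roughly as follows. This rests on assumptions not stated in this post: that a delta descriptor differs from a base descriptor in its parentCID, its createType ("vmfsSparse"), an added parentFileNameHint line, and a VMFSSPARSE extent line pointing at the -delta.vmdk file. The file names and sample contents are hypothetical, and the descriptor's own CID is left as copied, so check the fields against a known-good snapshot descriptor before relying on them.

```shell
#!/bin/sh
# Sketch: derive a snapshot (delta) descriptor from a copy of the base descriptor.
# File names and descriptor contents are illustrative assumptions.

workdir=$(mktemp -d)
base="$workdir/vm.vmdk"          # base disk descriptor (sample)
new="$workdir/vm-000001.vmdk"    # the missing delta descriptor to recreate

cat > "$base" <<'EOF'
# Disk DescriptorFile
version=1
CID=0cb205b1
parentCID=ffffffff
createType="vmfs"

# Extent description
RW 16777216 VMFS "vm-flat.vmdk"
EOF

# The child's parentCID must equal the parent's own CID.
parent_cid=$(sed -n 's/^CID=//p' "$base")

# Copy the base descriptor line by line, rewriting only the fields that
# distinguish a delta descriptor (assumed field set, see lead-in).
awk -v pcid="$parent_cid" -v parent="vm.vmdk" -v delta="vm-000001-delta.vmdk" '
    /^parentCID=/  { print "parentCID=" pcid; next }
    /^createType=/ { print "createType=\"vmfsSparse\""
                     print "parentFileNameHint=\"" parent "\""; next }
    /^RW /         { print $1, $2, "VMFSSPARSE", "\"" delta "\""; next }
                   { print }
' "$base" > "$new"

cat "$new"
```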
Another issue that might arise when troubleshooting snapshots is insufficient space on a datastore to commit all the snapshots. Check the Summary tab of your datastore or run the command “df -h” to determine whether you have enough space. If you do not, you’ll need to increase the size of the datastore or move virtual machines to other datastores with enough space.
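To get a rough sense of scale before committing, it can help to total the delta files and view that number next to the free space. The sketch below is only that: delta_bytes is a hypothetical helper, the demo directory and its tiny stand-in files are assumptions, and passing a path to df is assumed to work on the system where you run it (on a host, plain “df -h” as above is the safe choice).

```shell
#!/bin/sh
# Sketch: total the size of a VM's delta files and show free space beside it.
# Paths and file names are illustrative assumptions.

# delta_bytes DIR: sum the sizes in bytes of all *-delta.vmdk files in DIR.
# Field 5 of "ls -ln" is the file size; prints 0 if no delta files exist.
delta_bytes() {
    ls -ln "$1"/*-delta.vmdk 2>/dev/null |
        awk '{ sum += $5 } END { printf "%.0f\n", sum }'
}

# Demo directory with two small stand-in delta files (10 and 5 bytes).
vmdir=$(mktemp -d)
printf '%s' 0123456789 > "$vmdir/vm-000001-delta.vmdk"
printf '%s' 01234      > "$vmdir/vm-000002-delta.vmdk"

echo "delta files total: $(delta_bytes "$vmdir") bytes"
df -h "$vmdir"
```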
Virtual Machine Troubleshooting Scenario #3 – Virtual Machine Power On Issues
Typically, when a virtual machine does not power on, start by creating a test virtual machine and powering it on. Does the test VM power on successfully? If it does not, check your ESXi host to make sure sufficient resources exist and that the host is responsive. If the test VM does power on, the problem is more than likely specific to the individual virtual machine. The log files are the best place to start from there.
Browse to the location of the VM and verify that all the virtual machine files are there. Look for the .vmx, the .vmdk files, etc. Restore any file that is missing.
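A quick way to run that check from a shell is sketched below. check_vm_files is a hypothetical helper, not a VMware tool, and it only looks for the two file types named above; the demo directory stands in for a real VM folder.

```shell
#!/bin/sh
# Sketch: report whether a VM directory holds a .vmx and at least one .vmdk.

# check_vm_files DIR: print "ok", or list the missing extensions and return 1.
check_vm_files() {
    missing=""
    for ext in vmx vmdk; do
        # ls fails when the glob matches nothing, which marks the type as missing.
        if ! ls "$1"/*."$ext" >/dev/null 2>&1; then
            missing="$missing .$ext"
        fi
    done
    if [ -n "$missing" ]; then
        echo "missing:$missing"
        return 1
    fi
    echo "ok"
}

# Demo: a directory that has a .vmx but no .vmdk yet.
vmdir=$(mktemp -d)
: > "$vmdir/VM02.vmx"
check_vm_files "$vmdir"

# After the disk descriptor is restored, the check passes.
: > "$vmdir/VM02.vmdk"
check_vm_files "$vmdir"
```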
A virtual machine will also not power on if one of the virtual machine’s files is locked.
Perform these steps to find a locked file:
- Power on a virtual machine.
- If the power-on fails, look for the affected file.
- Determine whether the file can be locked.
- touch filename
- Determine which ESXi host has locked the file.
- vmkfstools -D /vmfs/volumes/Shared/VM02/VM02-flat.vmdk
- Check the MAC address in the owner field of the output.
- If you see all zeros for the owner, that means the owner is the current ESXi server.
- Log in to the host that has the locked file and identify the process.
- Kill the process that is locking the file.
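Reading the MAC address out of the owner field by eye is error-prone, so here is a small sketch that extracts it. It assumes the owner value is a dash-separated ID whose last 12 hex digits are the MAC address of the host holding the lock. The sample line and the helper name owner_mac are illustrative assumptions, and real output may differ between ESXi versions.

```shell
#!/bin/sh
# Sketch: pull the lock owner's MAC address out of `vmkfstools -D` output.
# The sample line is an assumption used for demonstration only.

# owner_mac: read vmkfstools -D output on stdin, print the MAC as aa:bb:cc:dd:ee:ff.
owner_mac() {
    sed -n 's/.*owner [0-9a-f-]*-\([0-9a-f]\{12\}\).*/\1/p' |
        sed 's/../&:/g; s/:$//'
}

sample='gen 532, mode 1, owner 4655cd8b-3c4a19f2-17bc-00145e808070 mtime 114'
mac=$(printf '%s\n' "$sample" | owner_mac)

if [ "$mac" = "00:00:00:00:00:00" ]; then
    echo "owner is all zeros"
else
    echo "lock owner MAC: $mac"
fi

# On a host, the input would come from the command in the steps above, e.g.:
#   vmkfstools -D /vmfs/volumes/Shared/VM02/VM02-flat.vmdk 2>&1 | owner_mac
```

The resulting address can then be compared with the MAC addresses of each host's network adapters to find the host to log in to.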
Virtual Machine Troubleshooting Scenario #4 – Orphaned Virtual Machines
When a virtual machine is orphaned, begin by determining whether a vCenter reboot has occurred. Occasionally, if vCenter is rebooted while a machine is being moved to another host through a vMotion migration, the machine can become orphaned. Virtual machines can also become orphaned if a host failover is unsuccessful, or when the virtual machine is unregistered directly on the host. Some additional symptoms:
- Virtual Machines show as invalid or orphaned after a VMware High Availability (VMware HA) host failure occurs
- Virtual Machines show as invalid or orphaned after an ESX host comes out of maintenance mode
- Virtual Machines show as invalid or orphaned after a failed DRS migration
- Virtual Machines show as invalid or orphaned after a storage failure
- Virtual Machines show as invalid or orphaned after the connection is lost between the vCenter Server and the host where the virtual machine resides
To fix, follow the steps below:
- Determine the datastore where the virtual machine configuration (.vmx) file is located.
- Return to the virtual machine in the vSphere Web Client, right-click, and select:
- All Virtual Infrastructure Actions > Remove from Inventory.
- Click Yes to confirm the removal of the virtual machine.
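For the first step, locating the .vmx file, a shell search is one option when the datastore is not obvious from the client. The sketch below uses a temp directory as a stand-in for /vmfs/volumes; find_vmx is a hypothetical helper and the VM name VM02 is only an example.

```shell
#!/bin/sh
# Sketch: locate a VM's .vmx file by name under a root directory.
# On an ESXi host the root would be /vmfs/volumes (assumption for illustration).

# find_vmx ROOT NAME: print the path of every NAME.vmx found under ROOT.
find_vmx() {
    find "$1" -name "$2.vmx" 2>/dev/null
}

# Demo tree standing in for <root>/<datastore>/<vm folder>/.
root=$(mktemp -d)
mkdir -p "$root/datastore1/VM02"
: > "$root/datastore1/VM02/VM02.vmx"

find_vmx "$root" VM02
```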
If you want to recreate the virtual machine rather than just remove it, try the following:
- Browse to the datastore and verify that the virtual machine files exist.
- If the vmx configuration file was deleted or removed and the disk files are still there, attach the old disk files to a newly created machine.
- If the disk files were deleted, restore from a backup.
Next post in this series: http://www.ryanbirk.com/vsphere-troubleshooting-series-part-5-storage-troubleshooting/