vSphere Troubleshooting Series: Part 1 – Introduction

vSphere Troubleshooting Introduction

Before we begin, we need to start off with an introduction to a few things that will make life easier. We’ll start with a troubleshooting methodology and how to gather logs. After that, we’ll break this eBook into the following sections: Installation, Virtual Machines, Networking, Storage, vCenter/ESXi and Clustering.

ESXi and vSphere problems arise from many different places, but they generally fall into one of these categories:

  • Hardware issues
  • Resource contention
  • Network attacks
  • Software bugs
  • Configuration problems

A typical troubleshooting process contains several tasks:

  1. Define the problem and gather information.
  2. Identify what is causing the problem.
  3. Resolve the problem by implementing a fix.

One of the first things you should do when experiencing a problem with a host is try to reproduce the issue. If you can find a way to reproduce it, you have a great way to validate that the issue is resolved once you fix it. It also helps to take a benchmark of your systems before they go into a production environment. If you know HOW they should be running, it’s easier to pinpoint a problem.
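To make the benchmark idea concrete, here is a minimal Python sketch that compares current readings against a saved baseline. The metric names, the sample values and the 25% tolerance are all made up for illustration; they are assumptions, not anything vSphere reports under these names.

```python
def find_deviations(baseline, current, tolerance=0.25):
    """Return metrics whose current value differs from the baseline
    by more than `tolerance` (expressed as a fraction of the baseline)."""
    deviations = {}
    for name, expected in baseline.items():
        observed = current.get(name)
        if observed is None or expected == 0:
            continue  # nothing meaningful to compare against
        change = (observed - expected) / expected
        if abs(change) > tolerance:
            deviations[name] = round(change, 2)
    return deviations


# Hypothetical numbers captured before go-live and during the incident.
baseline = {"cpu_ready_ms": 200, "mem_active_mb": 4096, "disk_latency_ms": 5}
current = {"cpu_ready_ms": 210, "mem_active_mb": 4200, "disk_latency_ms": 20}

print(find_deviations(baseline, current))
```

With these sample numbers only the disk latency stands out, which is the kind of hint that tells you where to start digging.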

You should decide if it’s best to work from a “Top Down” or “Bottom Up” approach to determine the root cause. Guest OS level issues typically cause a large number of problems. Let’s face it, some of the applications we use are not perfect. They get the job done, but they use a lot of memory doing it.

In terms of virtual machine level issues, is it possible that you could have a limit or share value that’s misconfigured?
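A quick way to answer that question is to list every VM that is not running with the defaults. The sketch below works on plain dictionaries rather than a live vCenter connection, so the data layout and the function name are assumptions for illustration. The one vSphere fact it relies on is that a limit of -1 means "unlimited".

```python
UNLIMITED = -1  # vSphere represents "no limit" as -1


def find_suspect_allocations(vms):
    """Return (vm, resource, reason) tuples for settings worth reviewing."""
    findings = []
    for vm in vms:
        for resource in ("cpu", "memory"):
            alloc = vm[resource]
            if alloc["limit"] != UNLIMITED:
                findings.append(
                    (vm["name"], resource, f"limit set to {alloc['limit']}")
                )
            if alloc["shares_level"] != "normal":
                findings.append(
                    (vm["name"], resource, f"shares level is {alloc['shares_level']}")
                )
    return findings


# Hypothetical inventory: CPU limits in MHz, memory limits in MB.
inventory = [
    {"name": "app01",
     "cpu": {"limit": -1, "shares_level": "normal"},
     "memory": {"limit": 2048, "shares_level": "normal"}},
    {"name": "db01",
     "cpu": {"limit": -1, "shares_level": "low"},
     "memory": {"limit": -1, "shares_level": "normal"}},
]

for finding in find_suspect_allocations(inventory):
    print(finding)
```

A non-default value is not automatically wrong, but a forgotten memory limit on a VM that has since been given more RAM is a classic cause of unexplained ballooning and swapping.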

At the ESXi Host Level, you could need additional resources. It’s hard to believe sometimes, but you might need another host to help with load!

Once you have identified the root cause, you should assess the impact of the problem on your day-to-day operations. When should you implement a fix, and what type: a short-term workaround or a long-term solution? Assess the impact of the fix itself on daily operations as well.

  • Short-term solution: Implement a quick workaround.
  • Long-term solution: Reconfiguration of a virtual machine or host.

Next in this series: vSphere Troubleshooting Series: Part 2 – vSphere Troubleshooting Tools

Posted in Troubleshooting, vSphere

DRS Cluster Management with Reservation and Shares

Reservation and shares are important resource management settings in a vSphere DRS cluster. They can be set on cluster objects like VMs and resource pools to isolate resources, prioritize, and/or guarantee their availability.

To know when to set them for VMs and when to set them on resource pools, we need to understand:

  • What these settings mean.
  • How these settings can impact resource availability for a VM.

In this paper, VMware explains how these settings are different for a VM and resource pool while giving some general guidelines for using them.

drs-cluster-mgmt-perf.pdf
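As a rough illustration of what shares mean under contention, the sketch below splits a fixed amount of CPU between sibling VMs in proportion to their shares. It is a deliberately simplified model and not the DRS algorithm: it ignores reservations, limits and actual demand, and the VM names and capacity are made up. The share values are the per-vCPU defaults for the Low, Normal and High levels (500, 1000 and 2000).

```python
def divide_by_shares(capacity, shares):
    """Split `capacity` between siblings in proportion to their shares.

    Simplified model: assumes every sibling wants more than it gets and
    that no reservations or limits are configured.
    """
    total = sum(shares.values())
    return {name: capacity * value / total for name, value in shares.items()}


# Three single-vCPU VMs competing for 7000 MHz (hypothetical capacity).
shares = {"vm-low": 500, "vm-normal": 1000, "vm-high": 2000}

for name, mhz in divide_by_shares(7000, shares).items():
    print(f"{name}: {mhz:.0f} MHz")
```

Shares only matter when there is contention. A reservation, by contrast, is a guaranteed minimum that is set aside whether or not the siblings are fighting over resources.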

Posted in vSphere, Whitepapers

vSAN 6.6 download links are publicly available.

Today VMware released vSAN 6.6 to the public! 6.6 is a big step forward. While the base architecture of vSAN stays the same, it becomes more and more efficient. With the price of flash going down, there is an even more compelling reason to look into HCI these days.

One of my favorite use cases for vSAN is with VMware Horizon Advanced/Enterprise editions. To me, it’s a no-brainer: simple, elastic scalability, much higher performance than traditional tiered storage, and lower TCO. Not long ago, storage in VDI environments would have given most of us nightmares.

What’s New

  • Unicast – In vSAN 6.6, cluster communication has been redesigned to use unicast traffic. Multicast is no longer required on the physical switches to support the vSAN cluster.
  • Enhanced Stretched Clusters with Local Failure Protection – Previously, vSAN was able to provide a fully active-active, stretched cluster. vSAN 6.6 takes this a step further, allowing for storage redundancy within a site AND across sites at the same time.
  • Encryption – vSAN supports data-at-rest encryption of the vSAN datastore. When encryption is enabled, vSAN performs a rolling reformat of every disk group in the cluster.
  • Site Affinity for Stretched Clusters – A new feature for vSAN 6.6 Stretched Clusters is the ability to configure site affinity.
  • Configuration Assist and Updates – New Configuration Assist and Updates pages allow you to check the configuration of your vSAN cluster and resolve any issues.
  • Resynchronization throttling – IOPS used for cluster resynchronization can be throttled to prevent performance bottlenecks.
  • vSAN Health Command Line Tool – A new esxcli command allows you to check vSAN health from the command line (esxcli vsan health).
  • Degraded Device Handling – vSAN 6.6 provides a more proactively stable environment with the detection of degraded and failing devices.

Posted in vSAN, vSphere 6.5

New Webinar: VMware Administrator’s Troubleshooting Guide

Altaro is a great blog sponsor of mine and reached out to me recently to see if there was interest in doing a separate project with them: a vSphere troubleshooting webinar. I’ve compiled a list of some of the most popular scenarios and put them together in an eBook. We’ll be discussing them April 25th! You can register at the link below.


Troubleshooting complex virtualization technology is something that system administrators have to face at some point, and it’s not always an easy fix to get things up and running again. We’re running a webinar that will cover the most common problems experienced in VMware vSphere. Register for the webinar here.

Andy Syrewicze and Ryan Birk will bring a wealth of experience in troubleshooting some of the most common issues admins face. Andy has spent the last 12 years providing technology solutions across a variety of industries, and has experience in troubleshooting VMware infrastructures for education, manufacturing, healthcare and other industries. Ryan is a VMware vExpert and a VMware certified trainer, having consulted and engineered infrastructures for a variety of companies.

Here’s what you’ll learn:

  • Troubleshooting some of the most common vSphere problems
  • Quick and efficient issue fixing practices
  • Maintaining a smooth running vSphere environment to avoid future issues

vmware-troubleshooting-admin-guide

Date: Tuesday April 25th, 2017
Time: For US registrants: 10am PDT/1pm EDT, for RoW registrants: 2pm CET

PS: We’re releasing an eBook by Ryan on the same topic soon as well. By registering for the webinar you will get early access automatically!

Posted in Troubleshooting, Webinars

vSAN Troubleshooting Tools

Before you begin, it is important to realize that vSAN is a software-based storage product that is entirely dependent on the proper functioning of the underlying hardware components, such as the network, the storage I/O controller and the individual storage devices. You always need to follow the vSAN Compatibility Guide for all deployments. I often play with things in my homelab, but just because you CAN do it doesn’t mean it’s supported by VMware. 🙂

I’ve recently been playing with stretched clusters in my homelab and have finally had to fix some of the stuff I’ve been playing with. The beauty of the homelab is that I can break anything I want! I had not worked with the Ruby vSphere Console at all before, but it’s an amazing tool to check out.

Many vSAN errors can be traced back to faulty VMkernel ports, mismatched MTU sizes, etc. It’s far more than simple TCP/IP.
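Mismatched MTU sizes are easy to miss when you check hosts one at a time. Here is a small Python sketch that groups vSAN VMkernel ports by MTU and reports a problem when more than one value is in use. The host names, the vmk numbers and the input layout are hypothetical; in practice you would collect the values from each host yourself.

```python
from collections import defaultdict


def find_mtu_mismatches(vmk_ports):
    """Group vSAN VMkernel ports by MTU.

    Returns an empty dict when every port uses the same MTU, otherwise a
    dict mapping each MTU value to the ports configured with it.
    """
    by_mtu = defaultdict(list)
    for port in vmk_ports:
        by_mtu[port["mtu"]].append(f"{port['host']}:{port['vmk']}")
    return dict(by_mtu) if len(by_mtu) > 1 else {}


# Hypothetical three-node cluster where one host was left at the default MTU.
ports = [
    {"host": "esx01", "vmk": "vmk2", "mtu": 9000},
    {"host": "esx02", "vmk": "vmk2", "mtu": 9000},
    {"host": "esx03", "vmk": "vmk2", "mtu": 1500},
]

print(find_mtu_mismatches(ports))
```

Keep in mind this only compares the VMkernel side. The virtual switches and every physical switch port in the path need to support the same MTU as well.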

Some of the tools you can use to troubleshoot vSAN are:

  • vSphere Web Client
    • The primary tool to troubleshoot vSAN.
    • Provides overviews of individual virtual machine performance.
    • Can inspect underlying disk devices and how they are being used by vSAN.
  • esxcli vsan
    • Get information and manage the vSAN cluster.
    • Clear vSAN network configuration.
    • Verify which VMkernel network adapters are used for vSAN communication.
    • List the vSAN storage configuration.
  • Ruby vSphere Console
    • Fully implemented since vSphere 5.5
    • Commands to apply licenses, check limits, check state, change auto claim mechanisms, etc.
    • See this link for a great introduction to RVC.
  • vSAN Observer
    • This tool is included within the Ruby vSphere Console.
    • Can be used for performance troubleshooting, examining many different metrics such as CPU, memory or disks.
  • Third Party Tools
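If you find yourself running the same esxcli vsan checks on every host, a thin wrapper can save some typing. The sketch below is one possible approach, assuming it runs somewhere esxcli is on the PATH, such as the ESXi shell. The helper name, the dictionary and the injectable runner parameter are my own illustration. Only the three read-only esxcli commands are included, so nothing here changes the configuration.

```python
import subprocess

# Read-only esxcli vsan commands matching the checks listed above.
VSAN_CHECKS = {
    "cluster": ["esxcli", "vsan", "cluster", "get"],    # cluster membership/state
    "network": ["esxcli", "vsan", "network", "list"],   # VMkernel adapters used by vSAN
    "storage": ["esxcli", "vsan", "storage", "list"],   # disks claimed by vSAN
}


def run_check(name, runner=subprocess.run):
    """Run one named check and return its standard output as text.

    `runner` is injectable so the function can be exercised without esxcli.
    """
    result = runner(
        VSAN_CHECKS[name],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return result.stdout
```

On a host, calling run_check("network") returns the same text you would see from typing esxcli vsan network list by hand, which makes it easy to collect the output from several checks in one place.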

I also highly recommend checking out the vSAN troubleshooting whitepaper that Cormac Hogan has written:

vsan-troubleshooting-reference-manual.pdf

Posted in Troubleshooting, vSAN

vCenter Server HA Deployment/Design Considerations

Check out fellow Indy VMUGer Adam Eckerle and his great 13-minute vCenter HA discussion on vSphere 6.5. I love the whiteboard, Adam!

Posted in HA, vSphere 6.5