Altaro VM Backup v8 has been released.

I’ve got Altaro running in my home lab and decided to upgrade recently to the newest version. I wanted to highlight a few updates regarding version 8!

The interface is still the same for the most part. It’s easy to understand, and intuitive.

My friend Andy Syrewicze over at Altaro has a great What’s New video:

Drastically reduce RTO.

“WAN-Optimized Replication allows businesses to continue accessing and working in the case of damage to their on-premise servers. If their office building is hit by a hurricane and experiences flooding, for instance, they can continue working from their VMs that have been replicated to an offsite location,” explained David Vella, CEO and co-founder of Altaro Software.

“As these are continually updated with changes, businesses using Altaro VM Backup can continue working without a glitch, with minimal to no data loss, and with an excellent recovery time objective, or RTO.”

Centralized, multi-tenant view for MSPs.

Managed Service Providers (MSPs) can now add replication services to their offering, with the ability to replicate customer data to the MSP’s infrastructure. This way, if a customer site goes down, that customer can immediately access its VMs through the MSP’s infrastructure and continue working.

With Altaro VM Backup for MSPs, MSPs can manage their customer accounts through a multi-tenant online console for greater ease, speed and efficiency, enabling them to provide their customers with a better, faster service.

You can check out more information on the Altaro page.

Posted in Backup, Tech Reviews | Leave a comment

What’s New in vSphere 6.7: Whitepaper

VMware vSphere 6.7 delivers key capabilities to enable IT organizations to address the following notable trends that are putting new demands on their IT infrastructure:

  • Explosive growth in quantity and variety of applications, from business-critical applications to new intelligent workloads
  • Rapid increase in hybrid cloud environments and use cases
  • Global expansion of on-premises data centers, including at the edge
  • Heightened importance of security relating to infrastructure and applications

Download the Technical White Paper: What’s New in vSphere 6.7

Posted in vSphere, Whitepapers | Leave a comment

Altaro VM Backup: 7.6 Review

Hi everyone. I wanted to get a quick post out there about one of my blog sponsors, Altaro. They’re a great partner of mine and I also happen to write content for their VMware blog over here. With that little tidbit out of the way, let’s get to the good stuff. They have released a new 7.6 version and I thought I’d write a bit about some of my favorite new features.

  • With Altaro VM Backup 7.6, users can switch from running daily backups to a continuous data protection model yielding an improved Recovery Point Objective (RPO) of up to 5 minutes.
  • Altaro VM Backup 7.6 introduces Grandfather-Father-Son (GFS) Archiving, enabling users to archive backup versions over and above their continuous and daily backups instead of deleting them (local backups only). Now you can easily set up separate backup cycles to store a new backup version every week, every month and every year.
  • In previous Altaro VM Backup versions, only one operation could be performed on a virtual machine at a time. This caused the following pain points:
    • If a retention policy takes a long time to complete, backups and restore operations are queued until retention is complete.
    • If an Offsite Copy to Azure takes days to complete, especially for the initial backup, backups and restore operations for that VM are queued until it is complete.
    • If a Restore, File Level Restore or Boot from Backup operation is active, no backups for that virtual machine can take place until it completes.

    Each of these limitations has been addressed in v7.6, allowing users to restore and take Offsite Copies without delaying any scheduled or CDP backups.

Altaro is still very competitive on price for the feature set you get. Per host, unlimited sockets. You can check their pricing calculator here and see for yourself.

I would also recommend checking out the video Andy Syrewicze did that demos some of the new v7 features. Personally, I find the interface very easy to use and set up. I had no trouble navigating the client and setting things up without having to read the entire manual. 🙂 Within 15 minutes I had multiple hosts and VMs set up and backup jobs running. I have also tested the restore and sandbox functions and they have worked each time I have tried them. My upgrade was very smooth as well.

Posted in Backup | Leave a comment

vSphere Troubleshooting Series: Part 6 – Network Troubleshooting

In vSphere, networking problems can occur at many different levels. It is important to know which level to start with. Is it a virtual machine problem or a host problem? Did the issue arise when you migrated the machine to a new host?

  • Virtual switch connectivity can be managed in two ways:
    • Standard switches
    • Distributed switches

You must also determine whether it’s a virtual machine issue or a host management issue.

Network Troubleshooting Scenario #1 – No network connectivity to other systems.

One of the first things you need to do is a simple ping. Ping a known-good system that should be accessible from the ESXi host.

Starting at the ESXi host, verify these possible configuration problems:

  • Does the ESXi host network configuration appear correct? IP, subnet mask, gateway?
  • Is the uplink plugged in? Yes, that had to be said.
    • esxcli network nic list
  • If using VLANs, does the VLAN ID of the port group look correct?
    • esxcli network vswitch standard portgroup list
  • Check the trunk port configuration on the switch. Have there been any recent changes?
  • Does the physical uplink adapter have all settings configured properly? (speed, duplex, etc.)
    • vicfg-nics -d <duplex> -s <speed> vmnic#
  • If using NIC teaming, is it setup and configured properly?
  • Are you using supported hardware? Any driver issues?
  • If all of the above check out OK, verify that you don’t have a physical adapter failure.
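
The host-side checks above can be sketched as a short shell script. The esxcli subcommands are standard, but the run() helper is my own convenience: it only prints each command when esxcli isn’t available, so the sketch is safe to walk through anywhere:

```shell
#!/bin/sh
# Sketch of the host-side network checks. Off an ESXi host, run() just
# prints the command it would execute instead of running it.
HAVE_ESXCLI=0
command -v esxcli >/dev/null 2>&1 && HAVE_ESXCLI=1

run() {
  if [ "$HAVE_ESXCLI" -eq 1 ]; then "$@"; else echo "would run: $*"; fi
}

run esxcli network ip interface ipv4 get              # IP, subnet mask, gateway
run esxcli network nic list                           # uplink link state (plugged in?)
run esxcli network vswitch standard portgroup list    # port groups and their VLAN IDs
```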

If you recently moved the VM to a new host, also verify that an equivalent port group exists on the host and that the network adapter is connected in the virtual machine settings. The firewall in the guest operating system might be blocking traffic. Ensure that the firewall does not block required ports.

Network Troubleshooting Scenario #2 – ESXi hosts dropping from vCenter

Occasionally an ESXi host is added to the vCenter Server inventory with no issues at all, but disconnects 60 seconds after the task completes.

Typically, this issue is because of lost heartbeat packets between vCenter (vpxd) and an ESXi host (vpxa).

The first thing you should check is that no firewall is in place blocking the vCenter communication ports. Then verify that network congestion is not occurring on the network. This issue is more prevalent with Windows based vCenter systems.

Adjust the Windows Firewall settings:

  • If ports are not configured, disable Windows Firewall.
  • If the firewall is configured with the proper ports, ensure that Windows Firewall is not blocking UDP port 902.

By default vpxa uses UDP port 902, but it is possible to change the port to something else. Check the <ServerPort> setting in /etc/vmware/vpxa/vpxa.cfg.
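
A quick sanity check for the configured port; the file path is the one above, and the grep pattern is an assumption about how the setting is spelled in the XML:

```shell
#!/bin/sh
# Look for the vpxa server port setting; it defaults to 902 if unchanged.
CFG=/etc/vmware/vpxa/vpxa.cfg
if [ -r "$CFG" ]; then
  grep -i "serverport" "$CFG"
else
  echo "no vpxa.cfg at $CFG; run this on an ESXi host"
fi
```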

Network congestion can cause dropped heartbeats as well. Some tools you can use to troubleshoot:

  • You can use the resxtop utility or graphical views to analyze traffic.
  • The pktcap-uw command is an enhanced packet capture and analysis tool.
    • pktcap-uw captures are unidirectional and default to the inbound direction only.
    • Direction of traffic is specified using --dir 0 for inbound and --dir 1 for outbound.
    • Two (or more) separate traces can be run in parallel, but they need to be merged later in Wireshark.
  • Wireshark
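
As an example, capturing both directions on an uplink means running two unidirectional traces in parallel and merging them afterwards. vmnic0, the packet counts and the file paths are placeholders; mergecap is the merge tool that ships with Wireshark:

```shell
#!/bin/sh
# Two parallel one-direction captures on vmnic0, merged for Wireshark.
# Guarded so it only captures on a host that actually has pktcap-uw.
ON_ESXI=no
command -v pktcap-uw >/dev/null 2>&1 && ON_ESXI=yes

if [ "$ON_ESXI" = yes ]; then
  pktcap-uw --uplink vmnic0 --dir 0 -c 1000 -o /tmp/vmnic0-in.pcap &   # inbound
  pktcap-uw --uplink vmnic0 --dir 1 -c 1000 -o /tmp/vmnic0-out.pcap &  # outbound
  wait   # let both captures collect their packets

  # Merge the traces into a single file to open in Wireshark:
  mergecap -w /tmp/vmnic0-both.pcap /tmp/vmnic0-in.pcap /tmp/vmnic0-out.pcap
else
  echo "pktcap-uw not found; run this on an ESXi host"
fi
```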

Network Troubleshooting Scenario #3 – No Management Connectivity on ESXi Host

VMware management networks are configured using VMkernel port groups. Typically, when a host that was previously working loses connectivity to vCenter, a recent change to that port group is the cause.

One feature VMware has that helps in this case is the Rollback feature. Several different types of events can trigger a network rollback:

  • Updating the speed or duplex of a physical NIC
  • Updating teaming and failover for the management VMkernel adapter
  • Updating DNS and routing settings on the ESXi host
  • Changing the IP settings of a management VMkernel adapter

If any of the above are changed and it fails, the host rolls back to the last known good configuration.

You can also restore the network configuration from the DCUI. Select “Network Restore Options” and choose to restore either standard switches or distributed switches. If you’re looking to start with a fresh configuration, the Restore Network Settings option deletes all current network settings except the management network.

Posted in Troubleshooting | Leave a comment

Altaro Webinar: 5 Performance-boosting vSphere Features You’re Missing Out On

Altaro is a blog sponsor of mine and occasionally I work with them on VMware webinars. We’d be happy to have you join us!

When: Tuesday Sep 19th 2017

Are you running your vSphere environment to its full potential? Have you overlooked features you already have access to but didn’t know could make a major difference?

Not sure?

Many organizations make use of only the most commonly used features in vSphere such as vMotion, HA, and DRS, but there are many ways to get more performance out of your setup. Even if you’re part of a small or medium-sized organization, these performance-boosters can significantly enhance your IT productivity.

This is also not to mention that you’ll likely want to fully leverage your investment in the vSphere platform. You wouldn’t buy a supercar and only stay in first gear, would you?

With that idea in mind, join us for our upcoming webinar, and learn from VMware vExperts Ryan Birk and Andy Syrewicze who will show you how to use some of the lesser-known features and capabilities of vSphere to unleash your full potential.

In this webinar you’ll learn about:

  • How to leverage the full power of vSphere
  • Lesser known features that can bring great benefits
  • Best practices for the features mentioned

At the end of this session we’ll also run a Q & A on the topic where you can ask Ryan and Andy your questions!

For more info and to register, check out the registration link:

Posted in Webinars | Leave a comment

vSphere Troubleshooting Series: Part 5 – Storage Troubleshooting

If a virtual machine cannot access its virtual disks, the cause of the problem might be anywhere from the virtual machine to physical storage.

There are multiple types of storage, so it’s important to determine which type you’re troubleshooting before you start. A “datastore” can be backed by multiple things with different types of connectivity.

Storage Troubleshooting Scenario #1 – Storage is not reachable from ESXi host.

This problem is typically noticed when a datastore falls offline: the ESXi host itself appears fine, but something has caused the datastore to drop.

Typically, the best method to start with would be:

  • Verify that the ESXi host can see the LUN by running: “esxcli storage core path list” from the host.
  • Check to see if a rescan of storage resolves it by running: “esxcli storage core adapter rescan -A vmhba##”

If the rescan does not resolve it, it is likely that something else is causing the issue. Have there been any other recent changes to the ESXi host?

Some other possible causes:

  • Is a firewall blocking traffic?
  • Is the VMkernel interface misconfigured?
  • Is IP storage (iSCSI or NFS) configured properly?
  • Is the iSCSI port (3260) reachable?
  • Is the actual storage device functioning OK?
  • Is LUN masking in place? Is the LUN still presented?
  • Check to see if the array is supported.

Check your adapter settings. Are the network port bindings setup properly? Is the target name spelled properly? Is the initiator name correct? Are there any required CHAP settings needed? Do you see your storage devices under the devices tab?
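
The adapter and binding questions above map to a few esxcli namespaces. Again a hedged sketch: the subcommands are standard esxcli, and the run() helper just echoes each command when you’re not on an ESXi host:

```shell
#!/bin/sh
# iSCSI adapter and binding checks. Off an ESXi host, run() only
# prints the command it would execute.
HAVE_ESXCLI=0
command -v esxcli >/dev/null 2>&1 && HAVE_ESXCLI=1

run() {
  if [ "$HAVE_ESXCLI" -eq 1 ]; then "$@"; else echo "would run: $*"; fi
}

run esxcli iscsi adapter list                        # adapters, state, initiator names
run esxcli iscsi networkportal list                  # network port bindings
run esxcli iscsi adapter discovery sendtarget list   # configured dynamic targets
run esxcli storage core device list                  # devices the host can actually see
```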

If the storage device is online but functioning poorly, check your latency metrics as well. Your goal is to not oversubscribe your links. Try to isolate iSCSI and NFS.

Use the esxtop or resxtop command and press d:

  • CMDS/s – The total number of commands per second, including IOPS (Input/Output Operations Per Second) and other SCSI commands such as SCSI reservations, locks, vendor string requests and unit attention commands being sent to or coming from the device or virtual machine being monitored.
  • DAVG/cmd – The average response time, in milliseconds, per command being sent to the device.
  • KAVG/cmd – The amount of time the command spends in the VMkernel.
  • GAVG/cmd – The response time as it is perceived by the guest operating system.

Storage Troubleshooting Scenario #2 – NFS Connectivity Issues

If you have virtual machines that are on NFS datastores, verify that the configuration is correct.

  • Check NFS server name or IP address.
  • Is the ESXi host mapped to the virtual switch?
  • Does the VMkernel port have the right IP configuration?
  • On the NFS server, are the ACLs correct? (read/write or read only)
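
Those checks can be run from the host with the esxcli storage nfs namespace (NFS v4.1 mounts have their own nfs41 namespace). The server name, share path and datastore label below are placeholders, and the run() helper only echoes commands off an ESXi host:

```shell
#!/bin/sh
# NFS mount checks. Off an ESXi host, run() only prints the command.
HAVE_ESXCLI=0
command -v esxcli >/dev/null 2>&1 && HAVE_ESXCLI=1

run() {
  if [ "$HAVE_ESXCLI" -eq 1 ]; then "$@"; else echo "would run: $*"; fi
}

run esxcli storage nfs list     # NFS v3 mounts and whether they are accessible
run esxcli storage nfs41 list   # NFS v4.1 mounts

# Example mount (placeholder server, share and label):
run esxcli storage nfs add -H nfs01.example.com -s /export/ds1 -v NFS-DS1
```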

VMware supports both NFS v3 and v4.1, but it’s important to remember that they use different locking mechanisms:

  • NFS v3 uses proprietary client-side cooperative locking.
  • NFS v4.1 uses server-side locking.

Configure an NFS array to allow only one NFS protocol, and use either NFS v3 or NFS v4.1 to mount the same NFS share across all ESXi hosts. Do not mix versions: data corruption might occur if hosts access the same NFS share with different client versions.

NFS 4.1 also does not currently support Storage DRS, vSphere Storage I/O Control, Site Recovery Manager or Virtual Volumes.

Storage Troubleshooting Scenario #3 – One or more paths to a LUN is lost.

If an ESXi host at one point had storage connectivity but the LUN is now dead, here are a few esxcli commands to use when troubleshooting the issue.

  • esxcli storage core path list

  • esxcli storage nmp device list

A path to a storage/LUN device can be marked as Dead in these situations:

  • The ESXi storage stack determines a path is Dead due to the TEST_UNIT_READY command failing on probing
  • The ESXi storage stack marks paths as Dead after a permanent device loss (PDL)
  • The ESXi storage stack receives a Host Status of 0x1 from an HBA driver

For iSCSI storage, verify that NIC teaming is not misconfigured. Next, verify that your path selection policy is set up properly.
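
A sketch of checking, and if needed changing, the path selection policy. The naa device ID is a placeholder, and as before the run() helper only echoes commands off an ESXi host:

```shell
#!/bin/sh
# Path state and path selection policy checks. Off an ESXi host,
# run() only prints the command it would execute.
HAVE_ESXCLI=0
command -v esxcli >/dev/null 2>&1 && HAVE_ESXCLI=1

run() {
  if [ "$HAVE_ESXCLI" -eq 1 ]; then "$@"; else echo "would run: $*"; fi
}

run esxcli storage core path list    # per-path state: active, standby or dead
run esxcli storage nmp device list   # current SATP and PSP for each device

# Example: switch one device to Round Robin (placeholder device ID):
run esxcli storage nmp device set -d naa.60000000000000000000000000000001 -P VMW_PSP_RR
```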

Check for Permanent Device Loss or All Paths Down. There are two distinct states a device can be in when storage connectivity is lost, All Paths Down or Permanent Device Loss, and each state is handled differently. All Paths Down (APD) is a condition where all paths to the storage device are lost or the storage device is removed. The state arises because the change happened in an uncontrolled manner, and the VMkernel storage stack does not know how long the loss of access to the device will last. APD is treated as a temporary (transient) condition, since the storage device might come back online; if the loss is permanent, it is referred to as a Permanent Device Loss (PDL).

Permanent Device Loss (PDL):

  • A datastore is shown as unavailable in the Storage view
  • A storage adapter indicates the Operational State of the device as Lost Communication
  • A planned PDL occurs when there is an intent to remove a device presented to the ESXi host.
  • An unplanned PDL occurs when the storage device is unexpectedly unpresented from the storage array without the unmount and detach being run on the ESXi host.

All Paths Down (APD):

  • You are unable to connect directly to the ESXi host using the vSphere Client
  • All paths to the device are marked as Dead
  • The ESXi host shows as Disconnected in vCenter Server

The storage All Paths Down (APD) handling on the ESXi host is enabled by default. When it is enabled, the host continues to retry non-virtual-machine I/O commands to a storage device in the APD state for a limited time frame. When the time frame expires, the host stops the retry attempts and terminates any non-virtual-machine I/O. You can disable the APD handling feature on your host, in which case the host will indefinitely continue retrying issued commands in an effort to reconnect to the APD device; it’s then possible for virtual machines to exceed their internal I/O timeouts and become unresponsive.

You might want to increase the value of the timeout if there are storage devices connected to your ESXi host which might take longer than 140 seconds to recover from a connection loss. You can enter a value between 20 and 99999 seconds for the Misc.APDTimeout value.

  • Browse to the host in the vSphere Web Client.
  • Click the Manage tab, and click Settings.
  • Under System, click Advanced System Settings.
  • Under Advanced System Settings, select the Misc.APDHandlingEnable parameter and click the Edit icon.
  • Change the value to 0 to disable APD handling.
  • Edit the Misc.APDTimeout value if desired.
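
The same settings can also be changed from the command line via advanced system settings. A sketch, with a placeholder timeout value and a run() helper that only echoes commands off an ESXi host:

```shell
#!/bin/sh
# APD handling settings via esxcli. Off an ESXi host, run() only
# prints the command it would execute.
HAVE_ESXCLI=0
command -v esxcli >/dev/null 2>&1 && HAVE_ESXCLI=1

run() {
  if [ "$HAVE_ESXCLI" -eq 1 ]; then "$@"; else echo "would run: $*"; fi
}

run esxcli system settings advanced list -o /Misc/APDHandlingEnable      # current value (1 = enabled)
run esxcli system settings advanced set -o /Misc/APDHandlingEnable -i 0  # disable APD handling
run esxcli system settings advanced set -o /Misc/APDTimeout -i 300       # raise the 140-second default
```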

Storage Troubleshooting Scenario #4 – vSAN Troubleshooting

Before you begin, it is important to realize that vSAN is a software-based storage product that is entirely dependent on properly functioning underlying hardware components, like the network, the storage I/O controller and the individual storage devices. You always need to follow the vSAN Compatibility Guide for all deployments.

Many vSAN errors can be traced back to faulty VMkernel ports, mismatched MTU sizes, etc. It’s far more than simple TCP/IP.

Some of the tools you can use to troubleshoot vSAN are:

  • vSphere Web Client
    • The primary tool to troubleshoot vSAN.
    • Provides overviews of individual virtual machine performance.
    • Can inspect underlying disk devices and how they are being used by vSAN.
  • esxcli vsan
    • Get information and manage the vSAN cluster.
    • Clear vSAN network configuration.
    • Verify which VMkernel network adapters are used for vSAN communication.
    • List the vSAN storage configuration.
  • Ruby vSphere Console
    • Fully implemented since vSphere 5.5
    • Commands to apply licenses, check limits, check state, change auto claim mechanisms, etc.
  • vSAN Observer
    • This tool is included within the Ruby vSphere Console.
    • Can be used for performance troubleshooting and to examine many different metrics like CPU, memory or disks.
  • Third Party Tools
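
The esxcli vsan checks mentioned above look like this on a host; the run() helper just echoes each command when esxcli isn’t available:

```shell
#!/bin/sh
# Basic vSAN state checks. Off an ESXi host, run() only prints
# the command it would execute.
HAVE_ESXCLI=0
command -v esxcli >/dev/null 2>&1 && HAVE_ESXCLI=1

run() {
  if [ "$HAVE_ESXCLI" -eq 1 ]; then "$@"; else echo "would run: $*"; fi
}

run esxcli vsan cluster get    # this host's cluster membership and state
run esxcli vsan network list   # VMkernel adapters carrying vSAN traffic
run esxcli vsan storage list   # local disks claimed by vSAN
```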

Posted in Troubleshooting, vSphere | 1 Comment