Showing posts with label Troubleshooting. Show all posts

Friday, June 14, 2013

Seven Tips for Troubleshooting VMware vSphere 5

Here are seven tips for working with vSphere 5: logging in at the command line; dealing with connection problems when using ssh to reach an ESXi host; network performance issues; possible storage problems; log files to view in vSphere ESXi 5; network performance troubleshooting; and migrating a virtual machine with vMotion.

By default, ssh is not enabled on an ESXi host, and the method for enabling it depends on whether the host runs ESX or ESXi and on the host's version. There are occasions when you need to log in at the command line over ssh to troubleshoot problems. Depending on the version and type of host, there are also alternative ways to reach a command-line prompt, such as the Direct Console User Interface (DCUI) and Tech Support Mode (TSM). Managing the host at a command-line prompt lets you use many Unix-based commands as well as the esxcli commands introduced by VMware. Depending on the state of the host, changes to networking, storage, and other critical parts of the host may only be possible at a command-line prompt.

In vSphere 5, an administrator can manage the ESXi host from the command line using esxcli commands, such as esxcli network vswitch standard. The esxcli command set was first introduced in vSphere 4.0 and lets an administrator manage many aspects of the ESXi host from the command line. The commands are available in the ESXi Shell (reached through the Direct Console User Interface, or DCUI), in an ssh session opened with a remote application such as PuTTY, or through the vSphere Command-Line Interface (vCLI).

The DCUI is similar to the BIOS of a computer: it lets you interact with the host from the console of the ESXi server, using text-based menus, to perform initial basic configuration and to troubleshoot. You can use the DCUI to enable local and remote access to the ESXi Shell.

A second way to reach the command line is to use an application such as PuTTY to ssh into the ESXi host. For ssh to work, you must enable the sshd service on the ESXi host.

A third way to run commands is through the vCLI, which provides a command-line interface for ESXi hosts. Multiple ESXi hosts can be managed from a central system that has the vCLI installed. VMware provides a downloadable appliance for this purpose, the vSphere Management Assistant (vMA). vMA enables administrators to run scripts that interact with ESXi hosts and VMware vCenter Server systems without having to authenticate each time. vMA is easy to download, install, and configure through the vSphere Client.

Direct Console User Interface (DCUI)

When the DCUI screen appears, press F2 to customize the system and log in as root.
Scroll to Troubleshooting Options and press Enter.
Choose Enable ESXi Shell and press Enter.
Press Esc until you return to the main DCUI screen.

To enable ssh from the vSphere Client

Select the host and click the Configuration tab.
Click Security Profile in the Software panel.
In the Services area, click Properties.
Select ssh and click Options.
Change the ssh options. To change the startup policy across reboots, click Start and stop with host and reboot the host.
Click OK.

First method from the Direct Console User Interface (DCUI)

Press Alt+F1. If TSM is already enabled, log in with root credentials. Otherwise, once the DCUI screen appears, press F2 and log in as root to enable TSM. Navigate down the screen, choose Troubleshooting Options, and press Enter. In ESXi 4.1, Troubleshooting Options provides these additional TSM settings:
Local Tech Support - access the command line via Alt+F1 on the console.
Remote Tech Support (ssh) - allow ssh access to the ESXi host.
Modify Tech Support Timeout - Tech Support Mode is disabled after a set amount of time.

Second Method from the vSphere Client

From the vSphere Client, select the host and click the Configuration tab. Then choose Security Profile and Properties. Here you can enable Local Tech Support as well as Remote Tech Support (ssh). Each one is enabled if its daemon is running and disabled if its daemon is stopped.
To enable either mode, highlight the mode and choose Options. You can then modify the Startup Policy or start the service, and click OK.

View the original article here

Tuesday, June 11, 2013

Performance and Troubleshooting with esxtop

The utility that many senior VMware administrators rely on to address performance and troubleshooting issues is esxtop. The tool is built into the hypervisor and can be used on both ESXi and the older ESX host. Many VMware administrators reach for esxtop first to check real-time performance on an ESXi host, typically starting the esxtop CLI utility from an ssh session. This paper introduces and demonstrates how to start and use esxtop, looking specifically at the CPU fields, and covers information that can help with CPU performance issues. The esxtop utility is an excellent tool when you want to observe an individual ESXi host's performance.

This paper introduces esxtop and gives examples of how it can help address performance issues. First, we discuss the history of esxtop and show several methods for starting the monitoring tool. Next, we discuss the interactive commands that can be typed while esxtop is running. Finally, we look at how to interpret CPU data with esxtop.

The esxtop command is based on the old UNIX command-line tool top; it continuously updates every five seconds, displaying a snapshot of the processes running on an ESXi host. The top program has been around since the mid-1980s and has been ported to many versions of UNIX and Linux. Originally, VMware ported a version of the UNIX top program and customized it to gather statistics for the ESX host; the standard top program was included in the service console as well. When VMware changed the direction of its hypervisor and removed the service console, esxtop remained a usable command-line utility within the ESXi hypervisor, which runs on VMware's proprietary VMkernel rather than on a general-purpose UNIX. VMware also modified esxtop to run remotely and called it resxtop. The remote resxtop runs within the vCLI and allows the user to connect to an ESXi host remotely and run esxtop.

The resxtop command is used when you want to run esxtop remotely from the vSphere Command-Line Interface (vCLI), usually within the vMA. The resxtop utility is referred to as remote esxtop and offers a secure method to run scripts across multiple ESXi hosts and virtual machines. This paper concentrates on esxtop, since once resxtop is started, all of the counters and fields are the same.

The esxtop command can also be run in batch mode, which collects statistics and saves them to a file so they can be reviewed at a later point in time. The data can be read using the Windows Perfmon utility or Microsoft Excel. To run esxtop in batch mode, use the following syntax:

# esxtop -a -b > outputfile.csv

-a  show all of the statistics
-b  run in batch mode
> outputfile.csv  redirect the output to a file; give the file a .csv extension so that Perfmon and Excel can open it
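Because batch output is plain CSV, it can also be summarized with a short script instead of Perfmon or Excel. The Python sketch below averages every column whose header contains a given counter name. It assumes Perfmon-style column headers (a timestamp in the first column, then counter paths such as \\host\Group Cpu(...)\% Ready); the host name, VM names, and values in SAMPLE are made up for illustration, so check the headers in your own file before relying on the match string.

```python
from __future__ import annotations

import csv
import io
from statistics import mean

# Hypothetical, abbreviated batch-mode output: two samples, two VMs.
SAMPLE = r'''"(PDH-CSV 4.0) (UTC)(0)","\\esx01\Group Cpu(1001:second)\% Ready","\\esx01\Group Cpu(1002:w2k3vm)\% Ready"
"06/11/2013 10:00:00","52.0","51.0"
"06/11/2013 10:00:05","50.0","53.0"
'''


def average_counter(csv_text: str, counter_substring: str) -> dict[str, float]:
    """Return the mean of every column whose header contains counter_substring."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    # Column 0 is assumed to hold the timestamp, so it is never matched.
    wanted = [i for i, name in enumerate(header) if i > 0 and counter_substring in name]
    samples: dict[str, list[float]] = {header[i]: [] for i in wanted}
    for row in reader:
        for i in wanted:
            if i < len(row) and row[i].strip():  # skip short rows and empty cells
                samples[header[i]].append(float(row[i]))
    return {name: mean(values) for name, values in samples.items() if values}


if __name__ == "__main__":
    for column, avg in average_counter(SAMPLE, "% Ready").items():
        print(f"{column}: {avg:.1f}")
```

Matching on a substring keeps the sketch independent of the host name and world IDs, which differ from file to file.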

To stop batch mode, press Ctrl+C.

By default, esxtop runs in interactive mode; start it by typing esxtop at the command line.

Depending on the system you are running on, you might have to set the TERM environment variable to xterm:

# export TERM=xterm

# esxtop

Once you launch esxtop, you will see the default screen (Figure 1); I have added callout descriptions to some of the main host statistics and fields. The esxtop output can show more information than you need for the performance or troubleshooting problem you are addressing, and interactive commands, shown in Figure 3, can be issued to customize the display. Figure 1 is an example of the output generated by esxtop or resxtop. Several screens can be viewed; the default is always the CPU view shown in Figure 1, and the screen refreshes every five seconds by default. esxtop displays statistics based on worlds. A world is a schedulable entity, similar to what other operating systems call a process. Each virtual machine has multiple worlds running, depending on several factors: one world for each of the VM's vCPUs, a world for the VM's MKS (mouse, keyboard, and screen), and a world for the virtual machine monitor (VMM).

Figure 1. Esxtop outlining main statistics and showing location of fields

The default view when esxtop is launched shows CPU information. You can change the view by typing the letter that corresponds to the view you want to inspect. Here is the list of views you can switch to:

c: CPU view (the default view)
m: Memory view
n: Network view
d: Disk adapter view
u: Disk device view
v: Disk VM view
i: Interrupts
p: Power management

For example, to switch from the CPU view to the memory view, type the letter m. Figure 2 shows the memory view.

Figure 2. Default esxtop screen when first started

To learn more about other options you can choose, type in h to get the help view for esxtop.

Figure 3. The help screen, showing the interactive commands

The performance counters are calculated in different ways: a counter can be a Rate, a Delta, or an Absolute value. A Delta counter is calculated as the change between two successive snapshots or intervals. CPU Ready is a Delta, and %USED is another good example:

%USED = (total CPU used time at the second snapshot - total CPU used time at the first snapshot) / time elapsed between the snapshots × 100
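As a worked example of a Delta counter, the Python sketch below applies the %USED formula to two snapshots of a world's cumulative CPU time. The millisecond values are hypothetical and chosen only for illustration; esxtop performs this calculation internally.

```python
def percent_used(used_first_ms: float, used_second_ms: float, elapsed_ms: float) -> float:
    """Delta counter: CPU time used between two snapshots as a percentage of elapsed time."""
    if elapsed_ms <= 0:
        raise ValueError("elapsed time between snapshots must be positive")
    # Multiply before dividing so round numbers stay exact in floating point.
    return (used_second_ms - used_first_ms) * 100.0 / elapsed_ms


# A world that accumulated 2,450 ms of CPU time during a 5-second (5,000 ms) interval.
print(percent_used(10_000, 12_450, 5_000))  # 49.0
```

Only the difference between the snapshots matters; the absolute cumulative values (10,000 and 12,450 ms here) cancel out.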

To interpret esxtop output, it helps to define the fields and counters you are viewing.

World - a schedulable entity
ID - world identifier
GID - world group identifier
NWLD - number of worlds for an entity
CPU Load Average - the mean of CPU loads over 1, 5, and 15 minutes, based on 6-second samples

Figure 4. The CPU screen with VMs running

Figure 4 shows CPU activity for the ESXi host, with two VMs named second and w2k3vm running on the system. To create contention on the CPU, both VMs have CPU affinity set to CPU 1 and are running a math application in a loop that keeps the CPU 99% busy. If you look at %USED for both VMs, each is running at a little more than 49%, since they are competing equally for the same PCPU.

Another field that is useful for monitoring CPU performance is %RDY, the percentage of time that the world was ready to run but was waiting for its turn. In this example, the two VMs, second and w2k3vm, each have a %RDY time a little greater than 50%, which is extremely high. Normally, I become concerned if I see a steady value greater than 10%. If %RDY is greater than 10%, I would check whether %MLMTD is high as well. A high %MLMTD signifies that a CPU limit has been set on the VM and needs to be investigated. In addition, the %WAIT field shows wait and idle time together.
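The roughly even split between running and ready time in this example follows from fair sharing. The Python sketch below is an idealized model, not how the ESXi scheduler is actually implemented: N always-runnable worlds pinned to one PCPU take turns in equal time slices, and every slice a world spends waiting while runnable is counted as ready time. It ignores %SYS, overlap, and scheduler overhead, so it only approximates the figures above.

```python
def simulate_pinned_worlds(n_worlds: int, n_slices: int) -> list[tuple[float, float]]:
    """Return (%run, %ready) per world for n always-runnable worlds sharing one PCPU."""
    run = [0] * n_worlds
    ready = [0] * n_worlds
    for s in range(n_slices):
        current = s % n_worlds  # round-robin: one world gets the PCPU per slice
        for w in range(n_worlds):
            if w == current:
                run[w] += 1
            else:
                ready[w] += 1  # runnable but not scheduled: counted as ready time
    return [(100.0 * run[w] / n_slices, 100.0 * ready[w] / n_slices) for w in range(n_worlds)]


for world, (pct_run, pct_ready) in enumerate(simulate_pinned_worlds(2, 1_000)):
    print(f"world {world}: %RUN={pct_run:.1f} %RDY={pct_ready:.1f}")
```

With two busy worlds each gets about half of the PCPU and spends about half its time ready; adding a third busy world would push each world's ready time toward two thirds.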

PCPU USED% - CPU utilization per physical CPU (includes logical CPUs)

%USED - CPU utilization: the percentage of physical CPU time accounted to the world.

The formula is: %USED = %RUN + %SYS - %OVERLP
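Plugging numbers into the formula makes the relationship concrete. The Python sketch below uses hypothetical %RUN, %SYS, and %OVERLP readings for one world; %OVERLP (roughly, system time spent on behalf of other worlds while this world was scheduled) is subtracted so that time is not counted twice.

```python
def used_from_components(pct_run: float, pct_sys: float, pct_overlap: float) -> float:
    """%USED = %RUN + %SYS - %OVERLP, with all inputs in percent."""
    return pct_run + pct_sys - pct_overlap


# Hypothetical readings for one world.
print(round(used_from_components(48.0, 1.5, 0.3), 1))  # 49.2
```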

The %USED of a world can be greater than 100% if system services run on a different PCPU on behalf of this world.

If the %USED of a VM is high, the VM is using a lot of CPU resources, which can be normal.

%RDY - The percentage of time the world was ready to run, but was not provided the CPU resources. A world in a run queue is waiting for the CPU scheduler to let it run on a PCPU. If %RDY of a VM is high, it means the VM is possibly under resource contention. Check %MLMTD as well. If %MLMTD is high, you may raise the CPU Limit setting for the VM. If %RDY - %MLMTD is high, the VM is under CPU contention.

%MLMTD - The percentage of time the world was ready to run but was deliberately not scheduled because running it would violate the CPU Limit setting. If %MLMTD of a VM is high, the VM cannot run because of its CPU Limit setting.
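The %RDY and %MLMTD guidance above can be written as a small decision function. This Python sketch is illustrative only: the 10% threshold comes from the rule of thumb given earlier in this paper, applying that same threshold to %MLMTD and to %RDY - %MLMTD is an assumption, and the returned labels are hypothetical names chosen for this example.

```python
def triage_cpu_ready(pct_rdy: float, pct_mlmtd: float, threshold: float = 10.0) -> list[str]:
    """Classify a VM's CPU ready time using the %RDY / %MLMTD rules of thumb."""
    if pct_rdy <= threshold:
        return []  # ready time is within the normal range
    findings = []
    if pct_mlmtd > threshold:
        findings.append("cpu-limit")  # held back by the VM's CPU Limit setting
    if pct_rdy - pct_mlmtd > threshold:
        findings.append("contention")  # waiting for a PCPU for reasons other than the limit
    return findings or ["elevated-ready"]  # %RDY is high, but neither part dominates


print(triage_cpu_ready(50.5, 0.0))   # ['contention']
print(triage_cpu_ready(30.0, 28.0))  # ['cpu-limit']
```

The first call mirrors the Figure 4 scenario, where no limit is set and the two VMs simply compete for one PCPU.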

%SYS - The percentage of time spent in the ESXi VMkernel processing interrupts and performing other system services on behalf of the world.

%IDLE - The percentage of time the vCPU world is in an idle loop.

%CSTP - The percentage of time the vCPUs of a VM spend in the co-stopped state, waiting to be co-started.

%SWPWT - The percentage of time the world is waiting for the ESXi VMkernel to swap memory. If %SWPWT is high, the VM is swapping memory.

%RUN - The percentage of total scheduled time for the world to run. If %RUN of a VM is high, the VM is using a lot of CPU resources, but this does not necessarily mean the VM is under resource constraint.

%WAIT - The percentage of time the world spent in the wait or idle state. %WAIT is the total wait time, during which the world is waiting for some VMkernel resource. %WAIT can be high because many worlds are waiting for events to happen, and the total wait time grows with the number of worlds waiting on events.

The esxtop utility provides detailed performance data for an ESXi host. This real-time data gives the system administrator information that helps in detecting performance issues. To interpret esxtop data well, it helps to understand how to set up the esxtop view with the appropriate fields. When dealing with CPU performance problems for a VM, one of the first fields to observe is %RDY. If this field is larger than 10%, it could mean that there are more requests for CPU processing than resources available. For this reason, %RDY time is the best indicator of possible CPU performance issues.

To learn more about how you can improve productivity, enhance efficiency, and sharpen your competitive edge, Global Knowledge suggests the following courses:

VMware vSphere: Fast Track [V5.1]

VMware vSphere: Optimize and Scale [V5.1]

Visit www.globalknowledge.com or call 1-800-COURSES (1-800-268-7737) to speak with a Global Knowledge training advisor.

Steve Baca has been working in the Information Technology field for more than 15 years, after graduating from the University of Nebraska with a Bachelor's degree in Computer Science and Mathematics. After spending time programming and doing systems administration, Steve has been doing technical training for VMware, NetApp, Sun Microsystems, and Symantec.



Monday, June 10, 2013

Windows 7 Troubleshooting Tips

Your company has finally migrated to Windows 7. Congratulations! And now you have your first support call. This Microsoft white paper will tell you all you need to know about the new troubleshooting tools that are bundled with Windows 7 and provide you with the knowledge to quickly figure out what's happening "under the hood" on a Windows 7 computer. The selected tools described in this Microsoft white paper are a subset of the tools available on Windows 7; the focus is on timely troubleshooting of the operating system and software applications.

The tools are presented in two sections: the first covers system troubleshooting tools and the second covers application troubleshooting tools. Boot a Windows 7 computer and try out each tool to become an expert in Windows troubleshooting. Tips are listed in bold typeface throughout the white paper, followed by an explanation in italics.

Tip: The first place to look for answers on a Windows 7 computer is the Action Center. The Action Center is the central portal for everything, good and bad, that happens on a Windows 7 computer system.

Looking at the taskbar, to the left of the clock, you'll see a white flag (possibly marked with a red X, indicating that there are issues to review). Selecting the white flag and clicking the Open Action Center link displays the two major sections, Security and Maintenance. Expanding the Security section displays the current health policy of Windows 7. Expanding the Maintenance section lets us look at the reliability history of the computer by clicking the Reliability Monitor link, as shown in Figure 1.

The Reliability Monitor displays what has happened on your computer for a full calendar year since installation, in a day or week grid. The displayed information is gathered from event logs and event trace data and updated by a scheduled task that the Task Scheduler runs every hour. Tip: To see which tasks are executing right now on your Windows 7 computer, open the Task Scheduler and review the Task Status display. The Reliability Monitor information includes application failures and Windows failures, in addition to warnings and pertinent information such as when drivers were last updated. The chart cannot be deleted by an end user.

After you select a component from the chart, a summary of the reliability details is displayed; further details can be reviewed by clicking the View all problem reports link. Tip: From this location, we can drill down and view technical details from each report, finding out, for example, which executable or DLL file is failing, as shown in Figure 2.

Although Device Manager has been around since Windows 95, it's worth checking the state of the installed hardware. Tip: Because computer hardware is very highly integrated, after opening Device Manager from Control Panel, select the View menu and turn on Show hidden devices. This shows a wealth of integrated software and hardware components that are normally not displayed. Expanding the Non-Plug and Play Drivers node also shows motherboard devices that are not Plug and Play; a faulty system component might otherwise not be visible by default.

If you find a red or yellow icon indicating a problem with an installed driver, open an elevated command prompt and type sigverif to produce a report showing which drivers are digitally signed. Tip: After reviewing the report, if there are unsigned drivers, take a moment to search the manufacturer's website to see whether an updated driver solves your driver problem.

Windows 7 also includes a tool called Driver Verifier. Its job is to monitor kernel-mode drivers, detecting incorrect function calls or other actions that might corrupt your Windows 7 system. Run Driver Verifier from an elevated command prompt by typing verifier. It can generate reports on the current state of the installed drivers and also lets you test IRQ and I/O settings, as shown in Figure 3. This tool helps you provide additional details to manufacturers, or to yourself, when drivers are the issue. More details can be found here: http://www.microsoft.com/whdc/devtools/tools/win7driverver.mspx

The System Configuration Utility (msconfig.exe) has been a part of Windows for several versions, and can be quite helpful when you want to diagnose or change your Windows 7 computer system's boot process.

The General Tab - This is where you can switch the boot process from normal mode to diagnostic mode, in effect forcing a Windows 7 system into a safe mode boot cycle. We also have the option of Selective startup; as shown in Figure 3, this option lets you specify whether to load system services, load startup items from the registry, or use the original boot configuration.



Saturday, June 8, 2013

Troubleshooting Cisco Switches

When an outage or network incident takes place, it can often create an intense burden for the engineers called upon to assist. As is the case with many emergencies, the demand for action and results may supersede more effective ways of resolving the issue(s) at hand. Having a solid strategy for how to approach incidents can give you the calm confidence to effectively resolve the problem even in the midst of the proverbial storm.

In ancient societies, many rituals existed, some aimed especially at recognizing the passage of an individual from youth to a recognized member of the adult community. These ceremonies or events are often referred to as a "rite of passage" and mark an important milestone in the life of that person as well as the group of which they are a part. In most Western cultures this is no longer directly relevant, but certain experiences play a similar role. For almost every network engineer, the "rite of passage" is a network outage or problem that was unpleasantly memorable and particularly difficult, and that is told in stories for years to come. The important concept here is that while these things can and do happen, they should remain infrequent events rather than frequent occurrences. This is essentially what network troubleshooting and problem resolution are all about. In this white paper we will examine five steps for addressing issues and provide some tools for dealing with issues when they arise.

Issues and problems with networks of any shape and size are fundamentally inevitable, almost entirely due to the nature of the human condition, namely, imperfection. On the one hand, the fact that imperfect people created the networking technology used in the world today guarantees that flaws and imperfections will exist in that technology. Algorithms will malfunction, hardware will fail, and software will have bugs in it that can create issues of various kinds. On the other hand, network engineers troubleshooting issues almost always have to deal with a crowd of end-users, managers, and company leadership, both at times when the network is in steady-state and when it is having problems.

This underscores topics discussed in other white papers and sources of information, namely that problems can and will arise. The key, then, is to prevent as many issues as possible before they even arise, through activities such as proactive maintenance, device monitoring, and so forth. The best solution to a problem is to keep it from occurring in the first place. This helps minimize true problems that are not possible to foresee and builds confidence on the part of the end-users that matters are well in hand.

One of the reasons for pointing out the nature of the human condition at the outset is strategic, since some issues may in fact not even be issues at all. An actual issue in this regard involved Internet access at a large port authority on the west coast, when the customer requested the Internet Service Provider to investigate a problem. Upon examination of the usage reports generated internally by the provider, the customer's support engineer noted that the spikes consuming all of the traffic were taking place at approximately 2:00 AM PST, when the offices were closed. When the engineer arrived at the customer site to report the findings, the customer reluctantly admitted that a janitor (who was dismissed shortly after) had been illegally downloading movies of questionable content. The port authority began with the assumption that the service provider was having a service issue, but investigation revealed the real source of the problem.

The investigation and diagnosis phase is the most critical part of the process, as it sets the stage for rallying resources and helps to narrow the scope of the actual issue. Without being condescending, understand that end users will probably not understand networking technology at even a fundamental level, and that the complaint may not even have a technical foundation. For example, a user may report network slowness and may even be impatient, but when you question further, you may discover that they are downloading large files or streaming videos from the Internet. In reality, that individual may have felt that the network was the issue when the problem was of their own making. The skill required when interacting with end users is to ask the right questions to get the information without giving offense.

In the healthcare field, patients visit a doctor with a set of symptoms that they need addressed and treated. In some cases, such as the common cold, treating the symptoms themselves is advised, mostly because nothing else can be done. In other situations, the physician may order a variety of tests to find the true root of the problem. Once the actual root cause is discovered, the healthcare practitioner can set about a treatment plan to address and resolve the problem.

