Monday, June 17, 2013

OM12 SP1 UNIX/Linux Agent Troubleshooting Table

Even though there are already many good postings and articles about this topic (I list them at the end of this posting), I still want to add my own experiences.

For a customer, quite a few Linux servers had to be monitored. During the roll-out of the OM12 SP1 Agents to these Linux systems several errors popped up. Thanks to a highly experienced Linux guru working for this customer, these issues were sorted out pretty fast. Based on this experience I have made a table with the most common errors, their possible causes and their fixes.

Issue, followed by Cause & Resolution

DNS Configuration error
  01: Faulty reverse DNS lookup zone. Once fixed, all went just fine.
  02: Linux system had multiple names, all registered in DNS. After a couple of retries the Agent landed properly.
  03: System resided in an old segment which didn’t have a zone on the new DNS servers. Once fixed, all went just fine.

Failed during SSH Discovery
  01: SSH was locked down to root only. Once access was opened for the account OM12 SP1 uses on the Linux system, all went just fine.
  02: An outdated version of SSH which isn’t compatible with the .NET SSH implementation Microsoft uses on the OM12 SP1 side. SSH requires an update.
  03: An outdated version of SSH which doesn’t accept certain SSH calls. SSH requires an update.

Failed to install kit
  01: Home folder of the OM12 SP1 Linux account was missing. After adding this folder all went just fine.
  02: Certain files were locked. When the installation of the OM12 SP1 Agent was retried some hours later, all went just fine.

Installation hangs
  01: On some systems the installation of the OM12 SP1 Linux Agent simply hung. I had to hard stop the OM12 SP1 Console. A second attempt then went just fine.

Unexpected Discovery Result
  01: Reason unknown. A second attempt (some hours later) ran just fine.
  02: Restarting the OM12 SP1 services on the OM12 SP1 Management Server running the Discovery can help (be careful though): http://www.opsman.co.za/?p=50

WinRM cannot complete the operation
  01: Firewall was blocking the WinRM service. After opening the port (TCP 1270) it still didn’t work. See this posting to get it working: http://blogs.technet.com/b/chandanbharti/archive/2011/12/21/linux-agent-install-issue.aspx

Agent verification failed
  Multiple DNS issues:
  01: Linux system has a hostname that differs from the FQDN. Correct it (hostname or FQDN) and all is just fine.
  02: DNS record isn’t present. Add the record and all is just fine.
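
Since so many of the issues above come down to DNS or a blocked port, a small pre-flight check can save a few failed Discovery attempts. What follows is only a minimal sketch in Python and not part of OM12. The function names are my own, and it assumes that a matching forward and reverse lookup plus reachable TCP ports 22 (SSH) and 1270 are what the Discovery needs from the network side.

```python
import socket


def check_dns(fqdn, forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Return a list of DNS problems found for fqdn (empty list means OK).

    The forward/reverse parameters exist only so the lookups can be
    replaced by fakes when testing; by default the real resolver is used.
    """
    try:
        ip = forward(fqdn)
    except OSError:  # socket.gaierror and socket.herror derive from OSError
        return [f"no forward (A) record for {fqdn}"]
    try:
        name, _aliases, _addresses = reverse(ip)
    except OSError:
        return [f"no reverse (PTR) record for {ip}"]
    # Compare case-insensitively and ignore a trailing dot.
    if name.lower().rstrip(".") != fqdn.lower().rstrip("."):
        return [f"reverse lookup of {ip} returns {name}, expected {fqdn}"]
    return []


def port_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example use (linux01.contoso.local is a hypothetical host name):
#   for problem in check_dns("linux01.contoso.local"):
#       print("DNS:", problem)
#   for port in (22, 1270):
#       print(port, "open" if port_open("linux01.contoso.local", port) else "not reachable")
```

Run it from the Management Server that performs the Discovery, since that is the machine whose view of DNS and the firewall matters. A clean result doesn’t guarantee a successful Discovery, it only rules out the most common causes from the table.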

Other resources for troubleshooting OM12 SP1 UNIX/Linux Agent installation issues:

  1. Bob Cornelissen: http://www.bictt.com/blogs/bictt.php/2011/05/29/scom-trick-15-cross-platform
  2. Microsoft TechNet Wiki: http://social.technet.microsoft.com/wiki/contents/articles/4966.troubleshooting-unixlinux-agent-discovery-in-system-center-2012-operations-manager.aspx
  3. Stefan Roth: http://blog.scomfaq.ch/2012/09/11/scom-2012-linux-discovery-unspecified-failure/
  4. Enabling logging and debugging in OM12: http://technet.microsoft.com/en-us/library/hh212862 
  5. Microsoft TechNet – Trouble shooting UNIX/Linux monitoring: http://technet.microsoft.com/en-us/library/hh212885

Other useful resources, all related to UNIX/Linux monitoring with OM12:

Tasks
Install Agent on UNIX and Linux Using the Discovery Wizard
Concepts
Using Templates for Additional Monitoring of UNIX and Linux
Troubleshooting UNIX and Linux Monitoring
Accessing UNIX and Linux Computers in Operations Manager
Required Capabilities for UNIX and Linux Accounts
Management Pack Issues
Operating System Issues
Certificate Issues
Managing Certificates for UNIX and Linux Computers
Managing Resource Pools for UNIX and Linux Computers

Friday, June 14, 2013

OM12 SP1 Consoles: Mind The Differences When Monitoring Network Devices…

Even though the OM12 SP1 Web Console has added value for many organizations and many types of operators, there are situations where the OM12 SP1 Web Console simply won’t fit the bill.

For instance, when monitoring network devices one might bump into Alerts related to ports of those same network devices. And here something odd happens: the Alert shown in the OM12 SP1 Web Console doesn’t show the network device experiencing issues with a certain port. Instead the MAC address is shown, which is the value of the Path: property of the related Alert:
image

In an environment where you’re monitoring 350+ network devices this isn’t good. For the bulk of the monitored network devices this Alert isn’t really critical: not too many users will be affected, if any at all. But what if the port having this issue belongs to a key network device? Now we have a totally different kind of situation.

Therefore the FQDN should be shown instead. As it turns out, the OM12 SP1 Console does that. It doesn’t show the Path: property of the Alert but the Full Path Name: property instead, which is the FQDN of the related network device:
image

Conclusion:
For network operators using OM12 SP1 it’s better to use the OM12 SP1 Console instead of the OM12 SP1 Web Console, since crucial information (the FQDNs of the related network devices) isn’t shown in the Web Console for certain Port related Alerts.

Wednesday, June 12, 2013

New Community Effort: Whiteboard Wednesday

As of today a new community effort has been launched by fellow MVP Maarten Goet, titled Whiteboard Wednesday.

As Maarten states: ‘…The whole goal is to provide visitors every Wednesday with 5-minute videos where a community experts draws out how a certain technology or feature on the Cloud OS (Hyper-V, System Center, Azure) works, with some narration, as if it were on a whiteboard…’

Other good news: This new community effort is going to be an open community.

The first video is already out there, all about Hyper-V Recovery Manager.

Want to know more? Go here.

New KB Article: Removing An OM12 Management Server Causes Issues

Microsoft released KB2853431, all about the issue where removing an OM12 Management Server from the All Management Servers Resource Pool (AMSRP) causes the same OM12 Management Server to become grayed out.

KB2853431 describes this issue in more detail, including the cause and how to solve it.

Russ Slaten also wrote a posting about the same issue some weeks ago, to be found here. Personally I think his posting served as a basis for the KB article mentioned above.

Bug Alert + Workaround: SharePoint Server 2013 MP & Web Console HTTP 500 Error

I bumped into an issue which I think is a bug: when the SharePoint Server 2013 MP, version 15.0.4420.1017, is imported and properly configured, the SCOM 2012 Web Console throws an HTTP 500 error when opening the View Microsoft SharePoint > Services:
image

I have reproduced this issue in all my OM12 test environments and see it happening at a customer’s location as well.

Workarounds
There are two workarounds to ‘deal’ with this bug. The first is running the full blown OM12 Console. But for some operators that isn’t possible, for various valid reasons, so the Web Console is their only interface to OM12. This is where the second workaround comes in.

Simply follow these six steps:

  1. Log on to the OM12 Console with admin permissions. Go to Administration and create a new MP with the name of the department requiring the Microsoft SharePoint Views in OM12;
  2. Start this name with an underscore so it appears at the top of the tree shown in the Monitoring pane;
  3. Go to Monitoring and right click on the name of the new MP you created in Step 1 > New > State View and copy these settings:
    image 
  4. Go to the second tab, Display and copy these settings:
    image
    > OK;
  5. The new View will now be saved in the MP you created in Step 1. Test the View in the Web Console. It will work as intended:
    image
  6. Don’t forget to assign this View to the proper User Roles as defined in your OM12 environment.

As stated in previous postings the SharePoint Server 2013 MP has some serious issues. Hopefully an update for this MP will be released containing fixes. Until then this is what we have to work with.

Tuesday, June 11, 2013

High Level Overview: NetApp SAN Monitoring With DATA ONTAP MP

Update 06-12-2013:
Today Cameron Fuller posted a blog article all about how to tune this MP. Awesome. So for more information about tuning this MP once everything is in place, go here. Thanks Cameron!

This posting contains a high level overview of the required steps in order to monitor a NetApp SAN with the DATA ONTAP MP, titled OnCommand PlugIn by NetApp. This high level overview is based on version 3.2 of the OnCommand Plugin.

For a full installation manual please use the PDF files supplied by NetApp. These manuals are part of the downloadable executable (OnCommand-PlugIn-Microsoft_3.2_x64_NetApp.exe).

Dependencies
This MP has some dependencies. Without having them in place AND properly configured, the OnCommand Plugin won’t work. So make sure all is accounted for.

  1. PowerShell version 3.0 has to be installed on ALL OM12 Management Servers;
  2. NetApp OnCommand PlugIn has to be installed on ALL OM12 Management Servers;
  3. SNMP on ALL NetApp Filers must be enabled and configured;
  4. ALL NetApp Filers must be present in OM12 as network devices (so run a Discovery);
  5. The OM12 Action account requires permissions on the NetApp Filers;
  6. A SQL server for hosting the SQL database the OnCommand Plugin uses. The SQL Server hosting the OpsMgr SQL database will do the trick.

Dependencies 1, 3, 4 and 5 must be in place before you start with installing the OnCommand PlugIn. Dependency 2 will be taken care of when installing and configuring the OnCommand PlugIn.

Installation & configuration
The installation of the OnCommand PlugIn starts really simple with installing the OnCommand Plugin on ALL OM12 Management Servers. Please make sure PS 3.0 is installed and operational before you start. Otherwise the installation will fail.

  1. Installing OnCommand Plugin on all OM12 Management Servers
    1. Start the file OnCommand-PlugIn-Microsoft_3.2_x64_NetApp.exe with elevated permissions.
    2. Follow the wizard and select the required components, e.g: SCOM Management Packs, Storage Monitoring, SCOM Console Integration, Cmdlets, Documentation and OnCommand Discovery Agent;
    3. When having SCORCH and/or Hyper-V you can also select the components related to those technologies;
    4. The account you have to specify requires local admin access on the OM12 Management Servers. Many times using the OM12 Action account works best;
    5. From version 3.2 this MP uses a SQL database as well. Using the same SQL server which hosts the OpsMgr database works fine for me.

  2. Configuring the NetApp MP
    Make sure all NetApp Filers are already discovered and monitored in OM12 as network devices.
    1. During the installation of the OnCommand PlugIn on the OM12 Management Servers two NetApp MPs are imported: OnCommand Data ONTAP and OnCommand Data ONTAP Reports;
    2. If the OM12 Console was open while you installed the OnCommand PlugIn, close it and open it again;
    3. Create an MP for the overrides created for the NetApp MPs;
    4. Go to Monitoring > Data ONTAP > Storage Systems > Management Server. Select one of the listed OM12 Management Servers and click on the right side of the OM12 Console under the header Health Service Tasks on Data ONTAP: Add Controller;
    5. Add all NetApp controllers, one by one;
    6. Go to Monitoring > Data ONTAP > Storage Systems > Controllers. Select one of the listed Controllers and select in the right side of the OM12 Console under the header Data ONTAP Controller Tasks > Data ONTAP Manage Controller Credentials;
    7. Add the required credentials per Controller. Best Practice here is to use an AD based account. When SSL isn’t required, remove the selection. Because of a bug, removing the SSL requirement might fire an error. Simply click Continue and go on.

  3. Discovering the NetApp components
    Now all NetApp components need to be discovered. Otherwise no monitoring.
    1. Search for the Rule Data ONTAP: Discovery Rule. Use this shortcut for this search: go to Tools (top menu bar of the OM12 Console) > Search > Rules. Saves you a lot of time;
    2. By default this Rule is turned off. Enable it through an override and store it in the MP created in Step 2.3;
    3. Now the Discovery has to be started. Go to Monitoring > Data ONTAP > Storage Systems > Management Server. Select one of the listed Controllers and select in the right side of the OM12 Console under the header Data ONTAP Controller Tasks > Data ONTAP: Run Discovery Task;
    4. When the OM12 Action account is authorized for accessing the NetApp SAN, you don’t need to enter credentials for running this task;
    5. After an hour or so all NetApp components are discovered and will have a monitored state a bit later.

  4. Required: Tuning!
    This MP is really good and really appreciated by many of my customers. However, many of the Monitors in this MP have their thresholds set to zero, so those Monitors require some good tuning in order to get the best out of this MP.

    Other Monitors use wrong thresholds. This isn’t a bug but is done on purpose, forcing you to tune them according to your environment. Once that is done, this MP will really deliver added value.

Compliments!
Compliments to NetApp for delivering such a good MP. When properly tuned (like any other MP), this MP really delivers added value for any organization running one or more NetApp SANs and OM12. I have seen many third party MPs but not many are of this level. A job well done NetApp!

Thursday, June 6, 2013

OM12 Web Console: Where Are My Alerts?!

Bumped into this issue.

What?
The OM12 Web Console didn’t show some particular Critical Alerts. And no matter what I did or tried, the issue remained the same. Somehow the OM12 Web Console simply ‘refused’ to show some Critical Alerts which were neatly shown in the OM12 Console…

Why?
So I contacted my MVP buddies and soon I got a solid question: ‘Are those Alerts older than 7 days?’. BINGO! By default the OM12 Web Console shows only Alerts which are less than 7 days old. And the Alerts which weren’t shown were older than 7 days!

How it was solved
It was solved with a quick reconfiguration of the web.config file used by the OM12 Web Console (located in ~:\Program Files\System Center 2012\Operations Manager\WebConsole\MonitoringView; open it from an elevated cmd-prompt!), based on this posting.

In this case I added this entry: <add key="AlertsDaysBefore" value="24" />, as shown here:
image

I saved the file and ran an IIS reset from an elevated cmd-prompt.

Afterwards everything was just fine and the Alerts showed in the OM12 Web Console!
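
For those who have to make this change on more than one Web Console server, the edit can be scripted. The snippet below is a minimal sketch in Python, not an official procedure. The function name is my own, and it assumes the AlertsDaysBefore entry lives in the appSettings section of web.config and that the file doesn’t use an XML namespace. It makes a backup copy first, then adds the key or updates it when it is already there.

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path


def set_alerts_days_before(config_path, days):
    """Back up web.config, then add or update the AlertsDaysBefore key.

    Returns the path of the backup copy.
    """
    config_path = Path(config_path)
    backup_path = config_path.with_name(config_path.name + ".bak")
    # Always keep a way back before touching the file.
    shutil.copy2(config_path, backup_path)

    # Keep the XML comments inside the file when it is rewritten (Python 3.8+).
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    tree = ET.parse(config_path, parser=parser)
    root = tree.getroot()

    # Assumption: the key belongs in <appSettings>; create the section if missing.
    app_settings = root.find("appSettings")
    if app_settings is None:
        app_settings = ET.SubElement(root, "appSettings")

    for entry in app_settings.findall("add"):
        if entry.get("key") == "AlertsDaysBefore":
            entry.set("value", str(days))  # key exists: update it
            break
    else:
        # Key not found: append a new <add key="..." value="..." /> entry.
        ET.SubElement(app_settings, "add",
                      {"key": "AlertsDaysBefore", "value": str(days)})

    tree.write(config_path, encoding="utf-8", xml_declaration=True)
    return backup_path
```

Calling set_alerts_days_before with the full path to web.config and the value 24 gives the same entry as the one I added by hand. The script needs the same elevated permissions as the manual edit, and an IIS reset is still required afterwards. Compare the rewritten file with the .bak copy before trusting it, since the layout of the file may change slightly.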

Recap
Even though the OM12 Web Console doesn’t deliver the same functionality as the full blown OM12 Console, it still serves many purposes. Also good to know is that the behavior of the OM12 Web Console can be modified. This posting, even though aimed at SCOM 2007 R2, applies in many cases to the OM12 Web Console as well.

NEVER EVER FORGET TO MAKE A COPY OF THE WEB.CONFIG FILE BEFORE MAKING ANY MODIFICATIONS. SO THERE IS ALWAYS A WAY BACK!

A BIG word of thanks to my MVP buddies for helping me out here. Thanks guys!