9 Simple Steps to Do a CUCM Health Checkup
CUCM is the heart of Cisco Collaboration. It integrates with the other servers for signaling and handles the registration of almost all devices, so it must be kept in good health, and a good plan for checking that health is essential. How do you check? What exactly should you check? What is alarming? The nine simple steps below walk you through a basic CUCM health checkup. You will need to log in to the CUCM CLI using OS Admin credentials.
Step 1. When you log in, make sure the login banner reports the partitions as aligned.
Command Line Interface is starting up, please wait ...
Welcome to the Platform Command Line Interface
VMware Installation:
2 vCPU: Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz
Disk 1: 80GB, Partitions aligned
6144 Mbytes RAM
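If you save the login banner to a text file, this check can be scripted. The following is a minimal Python sketch, assuming the banner has been captured as a string. The function name partitions_aligned is hypothetical and not part of any Cisco tool.

```python
import re

def partitions_aligned(banner: str) -> bool:
    """Return True if every 'Disk N:' line in the banner reports aligned partitions."""
    disk_lines = [line for line in banner.splitlines()
                  if re.match(r"\s*Disk \d+:", line)]
    # No disk lines at all is treated as a failed check, not a pass
    return bool(disk_lines) and all("Partitions aligned" in line for line in disk_lines)

# Banner text copied from the login output above
banner = """VMware Installation:
2 vCPU: Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz
Disk 1: 80GB, Partitions aligned
6144 Mbytes RAM"""

print(partitions_aligned(banner))  # True
```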
Step 2. Use the command “show status”
This is probably the first command to run when you start analyzing any problem.
It gives important information such as the hostname of your server, the full product version, and the server uptime.
Unified OS Version: this is the guest OS version in use. CUCM 11.x, as in the output below, runs on RHEL 6. CUCM 12.x versions moved to CentOS.
When should you be alarmed? An average processor load above 60-70%, IOWAIT above 4-5%, or disk usage above 90% on the active, inactive, or logging partition may indicate potential problems with the server.
admin:show status
Host Name : PUBLISHER
Date : Fri Apr 2, 2021 11:05:50
Time Zone : India Standard Time (Asia/Kolkata)
Locale : en_US.UTF-8
Product Ver : 11.5.1.16900-16
Unified OS Version : 6.0.0.0-2
Uptime:
11:05:52 up 154 days, 23:54, 1 user, load average: 0.43, 0.47, 0.45
CPU Idle: 91.80% System: 06.60% User: 06.09%
IOWAIT: 00.00% IRQ: 00.00% Soft: 00.51%
Memory Total: 5993936K
Free: 280396K
Used: 5713540K
Cached: 1813344K
Shared: 303692K
Buffers: 200508K
Total Free Used
Disk/active 14154228K 575156K 13433944K (96%)
Disk/inactive 14154228K 13393148K 35424K (1%)
Disk/logging 49573612K 19563780K 27484904K (59%)
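The thresholds above can be checked automatically once the output is saved as text. The sketch below is only an illustration. It assumes that processor load can be approximated as 100 minus the "CPU Idle" figure, and it uses the upper ends of the ranges (70% and 5%) as defaults. The function name status_warnings and its parameters are hypothetical.

```python
import re

def status_warnings(output: str, cpu_limit=70.0, iowait_limit=5.0, disk_limit=90):
    """Return warning strings for values in 'show status' output above the limits."""
    warnings = []
    idle = re.search(r"CPU Idle:\s*([\d.]+)%", output)
    if idle:
        busy = 100.0 - float(idle.group(1))  # assumption: load = 100 - idle
        if busy > cpu_limit:
            warnings.append(f"CPU busy {busy:.2f}%")
    iowait = re.search(r"IOWAIT:\s*([\d.]+)%", output)
    if iowait and float(iowait.group(1)) > iowait_limit:
        warnings.append(f"IOWAIT {iowait.group(1)}%")
    # Each Disk/<name> line ends with the used percentage in parentheses
    for name, pct in re.findall(r"Disk/(\w+).*?\((\d+)%\)", output):
        if int(pct) > disk_limit:
            warnings.append(f"Disk/{name} {pct}% used")
    return warnings

# Lines copied from the 'show status' output above
sample = """CPU Idle: 91.80% System: 06.60% User: 06.09%
IOWAIT: 00.00% IRQ: 00.00% Soft: 00.51%
Disk/active 14154228K 575156K 13433944K (96%)
Disk/inactive 14154228K 13393148K 35424K (1%)
Disk/logging 49573612K 19563780K 27484904K (59%)"""

print(status_warnings(sample))  # ['Disk/active 96% used']
```

With the sample output, only the active partition crosses the 90% mark. The limits are parameters so that they can be tuned to your own environment.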
Step 3. The command “show network cluster” shows the nodes that make up the cluster. The operations team should look at the “authenticated using TCP” section. All servers in the cluster must be in the “authenticated” state.
admin:show network cluster
172.16.154.142 SUBSCRIBER01.domain.com SUBSCRIBER01 Subscriber callmanager DBSub authenticated using TCP since Sun Nov 29 17:16:37 2020
172.16.154.143 SUBSCRIBER02.domain.com SUBSCRIBER02 Subscriber callmanager DBSub authenticated using TCP since Fri Mar 19 10:28:13 2021
172.16.154.149 PRESENCE02.domain.com PRESENCE02 Subscriber cups DBSub authenticated using TCP since Sun Nov 29 17:16:26 2020
172.16.154.148 PRESENCE01.domain.com PRESENCE01 Subscriber cups DBPub authenticated using TCP since Thu Oct 29 11:13:29 2020
172.16.154.141 PUBLISHER.domain.com PUBLISHER Publisher callmanager DBPub authenticated
Server Table (processnode) Entries
----------------------------------
PUBLISHER.domain.com
SUBSCRIBER01.domain.com
SUBSCRIBER02.domain.com
172.16.154.148
172.16.154.149
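For larger clusters, the authentication state can be checked with a short script. This is a sketch only. It assumes each node line has the field order shown above (IP address, FQDN, hostname, role, product, DB role, state), and the function name unauthenticated_nodes is hypothetical.

```python
import re

def unauthenticated_nodes(output: str):
    """Return hostnames of node lines whose state field is not 'authenticated'."""
    bad = []
    for line in output.splitlines():
        fields = line.split()
        # Node lines start with an IPv4 address followed by at least five more fields;
        # bare entries from the processnode table are skipped by the length check
        if len(fields) < 6 or not re.fullmatch(r"\d+(\.\d+){3}", fields[0]):
            continue
        state = fields[6] if len(fields) > 6 else ""
        if state != "authenticated":
            bad.append(fields[2])
    return bad

# Lines copied from the 'show network cluster' output above
sample = """172.16.154.142 SUBSCRIBER01.domain.com SUBSCRIBER01 Subscriber callmanager DBSub authenticated using TCP since Sun Nov 29 17:16:37 2020
172.16.154.141 PUBLISHER.domain.com PUBLISHER Publisher callmanager DBPub authenticated
172.16.154.148"""

print(unauthenticated_nodes(sample))  # []
```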
Step 4. The next useful command to check server health is “utils diagnose test”, which runs a self-assessment of the server covering all important aspects.
The output reports any errors found, such as disk space issues, Service Manager issues, Tomcat memory leaks, HTTP/HTTPS related issues, network connectivity issues, and NTP issues. These are the most common problems.
If you face any issues related to Tomcat or Service Manager, you need to contact Cisco TAC.
admin:utils diagnose test
Log file: platform/log/diag4.log
Starting diagnostic test(s)
===========================
test - disk_space : Passed (available: 562 MB, used: 13120 MB)
skip - disk_files : This module must be run directly and off hours
test - service_manager : Passed
test - tomcat : Passed
test - tomcat_deadlocks : Passed
test - tomcat_keystore : Passed
test - tomcat_connectors : Passed
test - tomcat_threads : Passed
test - tomcat_memory : Passed
test - tomcat_sessions : Passed
skip - tomcat_heapdump : This module must be run directly and off hours
test - validate_network : Passed
test - raid : Passed
test - system_info : Passed (Collected system information in diagnostic log)
test - ntp_reachability : Passed
test - ntp_clock_drift : Passed
test - ntp_stratum : Passed
skip - sdl_fragmentation : This module must be run directly and off hours
skip - sdi_fragmentation : This module must be run directly and off hours
Diagnostics Completed
The final output will be in Log file: platform/log/diag4.log
Please use 'file view activelog platform/log/diag4.log' command to see the output
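When the output is long, a small script can pick out anything that did not pass. The sketch below assumes each result line has the "test - <name> : <result>" shape shown above. The function name failed_tests is hypothetical.

```python
import re

def failed_tests(output: str):
    """Map test name to result text for every 'test -' line that is not Passed."""
    failures = {}
    for line in output.splitlines():
        m = re.match(r"\s*test - (\S+)\s*:\s*(.+)", line)
        # 'skip -' lines do not match and are ignored
        if m and not m.group(2).startswith("Passed"):
            failures[m.group(1)] = m.group(2).strip()
    return failures

# Lines copied from the 'utils diagnose test' output above
sample = """test - disk_space : Passed (available: 562 MB, used: 13120 MB)
skip - disk_files : This module must be run directly and off hours
test - service_manager : Passed
test - ntp_stratum : Passed"""

print(failed_tests(sample))  # {}
```

An empty result means every test that ran has passed. Skipped modules still need to be run directly during off hours.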
Step 5. The command “utils ntp status” is self-explanatory. NTP synchronization is mandatory for all devices in the network, and it is especially important in Collaboration infrastructure.
NTP issues in Collaboration can cause very complicated problems such as DB replication issues. Informix DB replication will not be stable without NTP synchronization.
Run the command on all CUCM and IM and Presence nodes. The output on every node should look like the example below.
Check the following in the NTP status:
(i) NTP should be synchronized.
(ii) NTP stratum should be <= 3 on the Publisher node and <= 4 on Subscriber nodes.
admin:utils ntp status
ntpd (pid 24927) is running...
remote refid st t when poll reach delay offset jitter
==============================================================================
*172.16.151.1 LOCAL(0) 2 u 831 1024 377 0.989 1.242 1.200
synchronised to NTP server (172.16.151.1) at stratum 3
time correct to within 40 ms
polling server every 1024 s
Current time in UTC is : Fri Apr 2 05:37:11 UTC 2021
Current time in Asia/Kolkata is : Fri Apr 2 11:07:11 IST 2021
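Both NTP checks can be expressed in a few lines of Python. This is a sketch under the assumption that a healthy node prints a "synchronised to NTP server (...) at stratum N" line as shown above. The function name ntp_ok and the is_publisher flag are hypothetical.

```python
import re

def ntp_ok(output: str, is_publisher: bool = True) -> bool:
    """True if the node is synchronised and its stratum is within the limit."""
    # \b keeps 'unsynchronised' from matching
    m = re.search(r"\bsynchronised to NTP server \(\S+\) at stratum (\d+)", output)
    if not m:
        return False
    limit = 3 if is_publisher else 4  # limits taken from the checklist above
    return int(m.group(1)) <= limit

# Lines copied from the 'utils ntp status' output above
sample = """ntpd (pid 24927) is running...
synchronised to NTP server (172.16.151.1) at stratum 3
time correct to within 40 ms"""

print(ntp_ok(sample))  # True
```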
Step 6. The command “utils service list” is used to check whether the required services on the server are in the started state. Make sure to check this on all nodes of the cluster.
admin:utils service list
Requesting service status, please wait...
System SSH [STARTED]
Cluster Manager [STARTED]
Name Service Cache [STARTED]
Entropy Monitoring Daemon [STARTED]
Cisco SCSI Watchdog [STARTED]
Service Manager [STARTED]
HTTPS Configuration Download [STARTED]
Service Manager is running
Getting list of all services
>> Return code = 0
A Cisco DB[STARTED]
A Cisco DB Replicator[STARTED]
Cisco AMC Service[STARTED]
Cisco AXL Web Service[STARTED]
Cisco Audit Event Service[STARTED]
Cisco Bulk Provisioning Service[STARTED]
Cisco CAR DB[STARTED]
Cisco CAR Scheduler[STARTED]
Cisco CAR Web Service[STARTED]
Cisco CDP[STARTED]
Cisco CDP Agent[STARTED]
Cisco CDR Agent[STARTED]
Cisco CDR Repository Manager[STARTED]
Cisco CTIManager[STARTED]
Cisco CTL Provider[STARTED]
Cisco CallManager[STARTED]
Cisco CallManager Admin[STARTED]
Cisco CallManager SNMP Service[STARTED]
Cisco CallManager Serviceability[STARTED]
Cisco CallManager Serviceability RTMT[STARTED]
Cisco Certificate Authority Proxy Function[STARTED]
Cisco Certificate Change Notification[STARTED]
Cisco Certificate Expiry Monitor[STARTED]
Cisco Change Credential Application[STARTED]
Cisco DHCP Monitor Service[STARTED]
Cisco DRF Local[STARTED]
Cisco DRF Master[STARTED]
Cisco Database Layer Monitor[STARTED]
Cisco Dialed Number Analyzer[STARTED]
Cisco Dialed Number Analyzer Server[STARTED]
Cisco DirSync[STARTED]
Cisco Directory Number Alias Lookup[STARTED]
Cisco Directory Number Alias Sync[STARTED]
Cisco E911[STARTED]
Cisco ELM Client Service[STARTED]
Cisco Extended Functions[STARTED]
Cisco Extension Mobility[STARTED]
Cisco Extension Mobility Application[STARTED]
Cisco IP Manager Assistant[STARTED]
Cisco IP Voice Media Streaming App[STARTED]
Cisco Intercluster Lookup Service[STARTED]
Cisco License Manager[STARTED]
Cisco Location Bandwidth Manager[STARTED]
Cisco Log Partition Monitoring Tool[STARTED]
Cisco Management Agent Service[STARTED]
Cisco Prime LM Admin[STARTED]
Cisco Prime LM DB[STARTED]
Cisco Prime LM Server[STARTED]
Cisco Push Notification Service[STARTED]
Cisco RIS Data Collector[STARTED]
Cisco RTMT Reporter Servlet[STARTED]
Cisco SOAP - CDRonDemand Service[STARTED]
Cisco SOAP - CallRecord Service[STARTED]
Cisco Serviceability Reporter[STARTED]
Cisco Syslog Agent[STARTED]
Cisco TAPS Service[STARTED]
Cisco Tftp[STARTED]
Cisco Tomcat[STARTED]
Cisco Tomcat Stats Servlet[STARTED]
Cisco Trace Collection Service[STARTED]
Cisco Trace Collection Servlet[STARTED]
Cisco Trust Verification Service[STARTED]
Cisco UXL Web Service[STARTED]
Cisco Unified Mobile Voice Access Service[STARTED]
Cisco User Data Services[STARTED]
Cisco WebDialer Web Service[STARTED]
Cisco Wireless Controller Synchronization Service[STARTED]
Host Resources Agent[STARTED]
MIB2 Agent[STARTED]
Platform Administrative Web Service[STARTED]
SNMP Master Agent[STARTED]
SOAP - Diagnostic Portal Database Service[STARTED]
SOAP -Log Collection APIs[STARTED]
SOAP -Performance Monitoring APIs[STARTED]
SOAP -Real-Time Service APIs[STARTED]
Self Provisioning IVR[STARTED]
System Application Agent[STARTED]
Cisco Prime LM Resource API[STOPPED] Service Not Activated
Cisco Prime LM Resource Legacy API[STOPPED] Service Not Activated
Primary Node =true
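Since the list is long, it is easy to miss a stopped service by eye. The sketch below filters the output for anything that is not STARTED, assuming each service line ends with a bracketed state followed by an optional note, as in the output above. The function name stopped_services is hypothetical.

```python
import re

def stopped_services(output: str):
    """Return (service name, note) pairs for services whose state is not STARTED."""
    found = []
    for line in output.splitlines():
        m = re.match(r"(.+?)\s*\[([A-Z ]+)\]\s*(.*)", line)
        if m and m.group(2) != "STARTED":
            found.append((m.group(1).strip(), m.group(3).strip()))
    return found

# Lines copied from the 'utils service list' output above
sample = """System SSH [STARTED]
Cisco Tftp[STARTED]
Cisco Prime LM Resource API[STOPPED] Service Not Activated"""

print(stopped_services(sample))
# [('Cisco Prime LM Resource API', 'Service Not Activated')]
```

A service marked "Service Not Activated" has simply not been activated on that node. A required service that is stopped without that note deserves a closer look.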
Step 7. The next command, “utils dbreplication status”, starts a background replication status check that verifies all tables in the database.
admin:utils dbreplication status
Replication status check is now running in background.
Use command 'utils dbreplication runtimestate' to check its progress
The final output will be in file cm/trace/dbl/sdi/ReplicationStatus.2021_04_02_11_08_03.out
Please use "file view activelog cm/trace/dbl/sdi/ReplicationStatus.2021_04_02_11_08_03.out " command to see the output
After a few minutes, use the command “utils dbreplication runtimestate” to check the replication status.
Many problems show up as unexpected CUCM behavior, such as a Subscriber not working as expected or not picking up configuration done on the Publisher.
These problems, along with issues that are difficult to reproduce, often originate from Informix database replication malfunction. To make sure replication is healthy, check the parameters below.
(i) Replication status command ENDED, having checked all tables (<no> out of <no>)
(ii) No Errors or Mismatches found
(iii) Ping response should be <= 80 ms.
(iv) DB/RPC/DbMon should have all as Y/Y/Y
(v) Replication queue is 0
(vi) Replication setup is in (2) state with Setup Completed.
If your output differs from this, you should consult Cisco TAC.
admin:utils dbreplication runtimestate
Server Time: Fri Apr 2 11:12:16 IST 2021
Cluster Replication State: Replication status command started at: 2021-04-02-11-08
Replication status command ENDED. Checked 706 tables out of 706
Last Completed Table: devicenumplanmapremdestmap
No Errors or Mismatches found.
Use 'file view activelog cm/trace/dbl/sdi/ReplicationStatus.2021_04_02_11_08_03.out' to see the details
DB Version: ccm11_5_1_16900_16
Repltimeout set to: 300s
PROCESS option set to: 1
Cluster Detailed View from PUBLISHER (3 Servers):
PING DB/RPC/ REPL. Replication REPLICATION SETUP
SERVER-NAME IP ADDRESS (msec) DbMon? QUEUE Group ID (RTMT) & Details
----------- ---------- ------ ------- ----- ----------- ------------------
PUBLISHER 172.16.154.141 0.023 Y/Y/Y 0 (g_2) (2) Setup Completed
SUBSCRIBER01 172.16.154.142 0.202 Y/Y/Y 0 (g_3) (2) Setup Completed
SUBSCRIBER02 172.16.154.143 0.204 Y/Y/Y 0 (g_4) (2) Setup Completed
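The per-node checks on the cluster detailed view can be scripted as follows. This is only a sketch. It assumes the column order shown above and treats 80 ms as the upper limit for the ping time. The names ROW and replication_problems are hypothetical.

```python
import re

# server name, IP, ping (msec), DB/RPC/DbMon, queue, (group id), (state), details
ROW = re.compile(
    r"^(\S+)\s+(\d+\.\d+\.\d+\.\d+)\s+(\S+)\s+(\S+)\s+(\S+)"
    r"\s+\((\S+)\)\s+\((\d+)\)\s+(.*)$"
)

def replication_problems(output: str, max_ping_ms: float = 80.0):
    """Return one message per failed check for each node row in the table."""
    problems = []
    for line in output.splitlines():
        m = ROW.match(line.strip())
        if not m:
            continue  # headers, separators, and other text
        name, _ip, ping, flags, queue, _group, state, _detail = m.groups()
        try:
            ping_ok = float(ping) <= max_ping_ms
        except ValueError:
            ping_ok = False  # a non-numeric ping value counts as a problem
        if not ping_ok:
            problems.append(f"{name}: ping {ping} ms")
        if flags != "Y/Y/Y":
            problems.append(f"{name}: DB/RPC/DbMon {flags}")
        if queue != "0":
            problems.append(f"{name}: queue {queue}")
        if state != "2":
            problems.append(f"{name}: setup state ({state})")
    return problems

# Rows copied from the 'utils dbreplication runtimestate' output above
sample = """PUBLISHER 172.16.154.141 0.023 Y/Y/Y 0 (g_2) (2) Setup Completed
SUBSCRIBER01 172.16.154.142 0.202 Y/Y/Y 0 (g_3) (2) Setup Completed
SUBSCRIBER02 172.16.154.143 0.204 Y/Y/Y 0 (g_4) (2) Setup Completed"""

print(replication_problems(sample))  # []
```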
Step 8. The next command, “file view install system-history.log”, displays the events that have occurred on a node: restarts, installation of components (COP files), and failed or successful backups.
You will see two Boot entries in a row, with no Shutdown or Restart entry between them, if your system was shut down ungracefully.
This is an example of an unclean shutdown:
08/14/2012 13:36:09 | root: Boot 9.0.1.10000-37 Start
08/14/2012 17:28:25 | root: Boot 9.0.1.10000-37 Start
The output should look like the example below.
admin:file view install system-history.log
=======================================
Product Name - Cisco Unified Communications Manager
Product Version - 11.5.1.16900-16
Kernel Image - 2.6.32-573.18.1.el6.x86_64
=======================================
08/01/2019 17:00:56 | root: Boot 11.5.1.16900-16 Start
08/01/2019 20:39:39 | root: Cluster Security Mode Cluster set to secure mode using tokenless CTL (CLI)
08/13/2019 17:31:36 | root: Boot 11.5.1.16900-16 Start
08/21/2019 12:06:42 | root: Restart 11.5.1.16900-16 Start
08/21/2019 12:07:15 | root: Boot 11.5.1.16900-16 Start
09/17/2019 17:33:02 | root: Shutdown 11.5.1.16900-16 Start
09/18/2019 09:55:23 | root: Boot 11.5.1.16900-16 Start
09/18/2019 10:45:52 | root: Shutdown 11.5.1.16900-16 Start
09/18/2019 14:45:45 | root: DRS Backup UCMVersion:11.5.1.16900-16/CUPVersion:11.5.1.16910-12 Start
09/18/2019 15:11:58 | root: DRS Backup UCMVersion:11.5.1.16900-16/CUPVersion:11.5.1.16910-12 Success
01/15/2020 12:21:06 | root: Cisco Option Install ciscocm.free_common_space_v1.5.cop Start
01/15/2020 12:21:14 | root: Cisco Option Install ciscocm.free_common_space_v1.5.cop Success
03/06/2021 10:49:26 | root: DRS Backup UCMVersion:11.5.1.16900-16/CUPVersion:11.5.1.16910-12 Start
03/06/2021 11:16:00 | root: DRS Backup UCMVersion:11.5.1.16900-16/CUPVersion:11.5.1.16910-12 Success
end of the file reached
options: q=quit, n=next, p=prev, b=begin, e=end (lines 61 - 65 of 65) :
admin:
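On a node with a long history, back-to-back Boot entries can be found with a short script. The sketch below assumes the "date time | root: Event version Start" line format shown above and only looks at Boot, Shutdown, and Restart events. The names EVENT and unclean_boots are hypothetical.

```python
import re

EVENT = re.compile(
    r"^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}) \| root: (Boot|Shutdown|Restart) "
)

def unclean_boots(log: str):
    """Return timestamps of Boot entries that directly follow another Boot entry."""
    suspects = []
    previous = None
    for line in log.splitlines():
        m = EVENT.match(line.strip())
        if not m:
            continue  # backups, COP installs, and other events are ignored
        stamp, kind = m.groups()
        if kind == "Boot" and previous == "Boot":
            suspects.append(stamp)
        previous = kind
    return suspects

# The unclean shutdown example from Step 8
sample = """08/14/2012 13:36:09 | root: Boot 9.0.1.10000-37 Start
08/14/2012 17:28:25 | root: Boot 9.0.1.10000-37 Start"""

print(unclean_boots(sample))  # ['08/14/2012 17:28:25']
```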
Step 9. The next command, “utils core active list”, is used to identify Linux process errors or issues. It lists any active core dumps generated on the server.
Cisco TAC may request this output along with the output of “utils core active analyze <core file name>” so they can trace the issue, find out whether a known issue or bug was hit, and suggest next steps for remediation.
admin:utils core active list
Size Date Core File Name
=================================================================
291096 KB 2020-06-15 10:11:40 core.4335.11.cef.1592196098
247816 KB 2020-09-29 11:33:16 core.11201.11.cef.1601359394
329728 KB 2021-03-27 14:43:35 core.24693.11.cef.1616836414
186288 KB 2020-02-08 11:25:27 core.23353.11.cef.1581141326
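The list is not necessarily in date order, so sorting it helps to find the most recent core file quickly. This is a minimal sketch, assuming the "size KB, date, time, file name" layout shown above. The names CORE and newest_cores are hypothetical.

```python
import re
from datetime import datetime

CORE = re.compile(r"^\s*(\d+)\s*KB\s+(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+(\S+)")

def newest_cores(output: str):
    """Return (timestamp, file name, size in KB) tuples, newest first."""
    cores = []
    for line in output.splitlines():
        m = CORE.match(line)
        if m:
            size_kb, stamp, name = m.groups()
            when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
            cores.append((when, name, int(size_kb)))
    return sorted(cores, reverse=True)

# Lines copied from the 'utils core active list' output above
sample = """291096 KB 2020-06-15 10:11:40 core.4335.11.cef.1592196098
247816 KB 2020-09-29 11:33:16 core.11201.11.cef.1601359394
329728 KB 2021-03-27 14:43:35 core.24693.11.cef.1616836414
186288 KB 2020-02-08 11:25:27 core.23353.11.cef.1581141326"""

print(newest_cores(sample)[0][1])  # core.24693.11.cef.1616836414
```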
We hope this article gives you an understanding of how to do a basic health check of CUCM. These steps will give you an idea of why the system is not running as expected, or whether there are any upcoming challenges in your system. The checks mentioned above will help you verify the system at the virtual machine level, network level, service level, and OS level.
Author
Rahul Bhukal
Sr. Collaboration Consultant