Saturday, February 8, 2014

Troubleshoot Hung Database

To troubleshoot hung database

Few Tools that can help us to troubleshoot such situation:
1. OSwatcher must be installed and running on the database server.
2. Hung analyzer in 9.0.1.0 onwards is a good tool to analyze where the problem lies.
 A database can be hung due to following:

1. A network or firewall issue where users are unable to login.
2. Listener Hang.
3. RAM shortage hung. like SWAP area full.
4. 100% CPU.
5. Data Hang due to blocks.
6. A bad disk.
7. OS limit exceeded.

Steps to be followed while sitting on this kind of situation:

1. As a Top down approach. Try to figure what are the OS statistics showing. Commands that can help us figure out are:
Ø  dmesg: check error logs on the server.
Ø  tail -1000f /var/adm/messages to check errors logs on the server.
Ø  vmstat and top provides you the clear picture what resources is being consumed heavily.
Ø  Netstat –a this will give you network statistics.

2. Check alertlog for any errors.

3. Try hung analyzer. Hung analyzer will generate trace files which will provide a quick view of hang chains occurring at the time of a hang being experienced.

Sometimes, database may actually just be very slow and not actually hanging. It is therefore recommended,  where possible to get 2 hanganalyze and 2 systemstate dumps in order to determine whether processes are moving at all or whether they are "frozen".

Steps to perform Hung analyzer:
Hanganalyze
sqlplus '/ as sysdba'
oradebug setmypid
oradebug unlimit
oradebug hanganalyze 3
-- Wait one minute before getting the second hanganalyze
oradebug hanganalyze 3
oradebug tracefile_name
exit
Systemstate
sqlplus '/ as sysdba'
oradebug setmypid
oradebug unlimit
oradebug dump systemstate 266
oradebug dump systemstate 266
oradebug tracefile_name
exit


4. Check if there is any block or locks that are waiting. Remove the locks and blocks.
5. Run the AWR, ADDM and ASH report for the time period.

Once you are done to figure out what OS resource is exhausted and where the issue lies. You can try to figure out which pid is consuming the most of the resources. As per application design Kill those hung chain sessions.

If unable to do any of the following before rebouncing the database on server, run hung analyze in prelim session as mentioned below:

root> sqlplus –prelim
  1. SQL> oradebug hanganalyze 3
  2. Wait at least 2 minutes to give time to identify process state changes.
  3. SQL> oradebug hanganalyze 3
  4. Open a separate SQL session and immediately generate a system state dump.
  5. SQL> alter session set events 'immediate trace name SYSTEMSTATE level 10';

No comments:

Post a Comment

Oracle Enterprise Manager Cloud Control 13c (OMS upgrade from 13.3 or 13.4 to OMS 13.5)

 OMS upgrade from 13.3 or 13.4 to OMS 13.5 PHASE -I Planning 1. OMS 13.5 is the latest Oracle Cloud Control version available. 2. To directl...