Dear Reader,
In a production environment when we reach 100% or more than 100% CPU usage, developers and Production Support guys
have a nightmare to fix and handle the client calls. Java profiling help us to detect such CPU usage but not an
option in production systems. Se we take thread dump, histo and use TDA or some other tools to pin point the faulty
code implementation. This is really a panic scenario when you are handling production environment and issue comes at odd times.
This is written only for Unix environment (Ubuntu and Solaris) where these Unix command runs.
Fortunately, Java comes with some great debugging tools, We need to assemble those tools with Linux commands.
I am going to explain below items here:
1) Introduction about Java threads and its relation to Linux LWP (Light Weight Process).
2) Step-by-step process to take thread dump and analyze CPU utilization.
3) "jmap" - Memory Map (dump), Command to get this.
4) "jmap" - Histo, Command to get this.
5) Command to see list of open files in Unix.
6) Command to achieve the same (Resolving High CPU issue) in Sun Solaris Systems.
1) Introduction: A java program starts when JVM calls the main method, this creates a thread called the main thread and any
thread you create using java code will be derived from the main thread. The same exact behavior occurs at the Linux OS
level, the main thread for java means a Process for the OS and every thread you create using java the OS will create a
Light-weight-process or LWP. In short: Java main thread = Linux process and Java supporting threads = Linux LWP.
LWP we give an alias name as Native ID.
The solution requies:
Ask Linux which LWP is eating the CPU.
Ask Java for a Thread Dump.
Map this LWP to a Java thread.
Get the part of code causing the issue.
2) Step-by-step process:
Get the PID: the very first step is to know what is the Java process ID, we will use Linux commands
as below. Use ONE OF THE BELOW commands (we use our grep "DAPPNAME", you can use anything like "grep java"):
jps -v | cut -c -106 | grep DAPPNAME
jps -mvl | cut -c -106 | grep DAPPNAME
ps -eaf | cut -c -106 | grep DAPPNAME
ps -ef | cut -c -106 | grep DAPPNAME
ps -eaf | grep java
Below are the sample output when you execute the command:
dmodi@MyDirectory:~$ jps -mlv | cut -c -106 | grep DAPPNAME
8243 org.quickserver.net.server.QuickServer -load config/DmodiServer.xml -DAPPNAME=CLIENT
13712 org.quickserver.net.server.QuickServer -load ./conf/DmodiDNXServer.xml -DAPPNAME=SERVER
12229 org.quickserver.net.server.QuickServer -load ./config/DmodiPOSServer.xml -DAPPNAME=SERVER2
Explanation: "jps" - Java Virtual Machine Process Status Tool, a command in Unix. "106" shows 106
characters we want to display in console.
The next step is to get the CPU usage per each LWP related to the main process, we can use below commands:
//Replace PROCESS_ID with numeric process_id you get after using "top" command in Unix, which is having more than 100%
CPU usage.
ps -eLo pid,lwp,nlwp,ruser,pcpu,stime,etime,args | grep PROCESS_ID | cut -c -106 > ThreadsList.txt
The newly created file ThreadsList.txt will contain below things in similar way (The headers will not be
shown as below):
PID LWP NLWP RUSER %CPU STIME ELAPSED COMMAND
8243 8243 3 dmodi 0.0 May13 1-19:20:18 java -Dprogram.name=run.sh -Xms64m -Xmx100m -Dsun.rmi.dgc
8243 8244 3 dmodi 0.0 May13 1-19:20:18 java -Dprogram.name=run.sh -Xms64m -Xmx100m -Dsun.rmi.dgc
8243 8245 3 dmodi 99.9 May13 1-19:20:18 java -Dprogram.name=run.sh -Xms64m -Xmx100m -Dsun.rmi.dgc
To see headers too, just execute the below command (but it will display all processes, Process_Id is not entered):
ps -eLo pid,lwp,nlwp,ruser,pcpu,stime,etime,args > ThreadsList.txt
Explanation: PID is process Id.
LWP: is Light weight processes Lists (java Threads for given ProcessId) for the above PID. These values are in Decimal.
NLWP: is number of LWP created for the above PID.
We can see LWP (Thread) - 8245 is eating CPU. We need to convert this value into HEX value which will be "2035".
Now take the thread dump and kill the process Id: 8243.
See below command:
//Taking thread dump
jstack -l 8243 > ThreadDump_15_May_2014_13_PM.txt
//Killing process
kill -9 8243
kill -3 8243 (Can also be used in Ubuntu. If used for Solaris, It will generate Thread dump file along with killing).
Open the thread dump file ThreadDump_15_May_2014_13_PM.txt and search the hexa value "2035".
Also if you are using TDA (Thread dump analyzer tool) to see this dump file, you can see Native-ID column. You can see the
decimal thread Id (LWP): 8245 too. Click the link in TDA for this ThreadId, it will display the faulty code in TDA console..
3) "jmap" - Memory Map (dump): Prints shared object memory maps or heap memory details of a given JVM process.
dmodiUnixUser@productName79:~$ jmap -dump:file=deepak.bin 8243
Dumping heap to /home/dmodiUnixUser/deepak.bin ...
Heap dump file created
dmodiUnixUser@productName79:~$ ls
deepak.bin
This newly created file will be big in size (of 5-10 MB around). You can't see this content using
"less" or "cat" command. You need tool to see this. We don't use this generally, so not mentioning here.
4) "jmap" - Histo: See below command:
dmodiUnixUser@productName79:~$ jmap -histo:live 8243 > deepak.txt
Contents of this file "deepak.txt" will have similar like above:
num #instances #bytes class name
----------------------------------------------
1: 14452 2229096 <constMethodKlass>
2: 14452 1740720 <methodKlass>
3: 1004 1406296 <constantPoolKlass>
4: 1336 1270504 [B
5: 25057 1060840 <symbolKlass>
6: 835 809368 <constantPoolCacheKlass>
7: 1004 787096 <instanceKlassKlass>
5) List of open files in Linux:
lsof - list open files
dmodiUnixUser@productName79:~$ lsof | grep home/dmodi/productName/dist/
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
java 16460 dmodiUnixUser mem REG 9,2 25680 8127062 home/dmodi/productName/dist/sample1.jar
java 16460 dmodiUnixUser mem REG 9,2 66770 8127061 home/dmodi/productName/dist/sample2.jar
dmodiUnixUser@productName79:~$ lsof | grep PROCESS_ID > help.txt
dmodiUnixUser@productName79:~$ less help.txt
6) The Above mentioned few commands may not work for Sun Solaris system. Hence to track high CPU consuming Process and ThreadId,
"prstat" is used. The syntax is "prstat -L -p ".
Example: prstat -L -p 22991 >> 22991.txt
This will generate file name 22991.txt having all the Threads and CPU Usage details for ProcessId 22991.
The headers will be like this :
PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/LWPID
22991 root 7667M 4953M cpu18 0 0 14:47:00 99.6% java/119286
22991 root 7667M 4953M sleep 59 0 0:02:14 0.0% java/46
22991 root 7667M 4953M sleep 59 0 0:00:15 0.0% java/15
22991 root 7667M 4953M sleep 59 0 0:00:15 0.0% java/14
Here "PID corresponds to Solaris Java process ID".
"CPU corresponds to the CPU utilization % of a particular Java Thread".
"PROCESS/LWPID corresponds to Light Weight Process ID e.g. your native Java ThreadID belonging to ProcessID 22991".
HERE prstat says, ThreadId #119286 is the top CPU contributor with 99.6% utilization and hence the faulty code must be fixed.
Hence take thread dump immediately using command "kill -3 22991". This command will generate a Thread Dump from Java process HotSpot VM format.
Convert Thread ID #119286 which is in decimal format to HEXA, corresponds value is 0X1D1F6 (this HEXA format, see 0X prefix).
HEXA value is "1D1F6". Now search this HEXA value in Thread Dump file, you will get the exactly faulty code stacktrace.
----------------------------------END-----------------------------------------
Tuesday, January 28, 2014
JMAP, HISTO, Thread Dump, High CPU Utilization
Wednesday, January 22, 2014
Execution Plan of a Query in Oracle
Working with Oracle in Unix and getting Execution Plan of a Query:
1) Login to Unix Machine where Oracle is installed.
2) You need to check whether Oracle is running (Listener service is running). To check this type below command:
oracle@companyName:~$ lsnrctl status (press ENTER)
This will give around 30-Lines output, at the end, it will show whether this running/ready or not.
However if it is not running, start it using below command:
oracle@companyName:~$ lsnrctl start (press ENTER)
3) Type below command in Unix (this will set Oracle_Home path and export it):
oracle@companyName:~$ export ORACLE_HOME=/u01/app/oracle/product/11.2.0/dbhome_1 (press ENTER)
oracle@companyName:~$ export ORACLE_SID=orcl (press ENTER)
oracle@companyName:~$ export PATH=$ORACLE_HOME/bin:$ORACLE_HOME:$PATH:. (press ENTER)
4) Now you need to login to Database, so type below command:
oracle@companyName:~$ sqlplus (press ENTER)
Enter user-name: YOUR_ORACLE_USER_NAME (press ENTER)
Enter password: YOUR_ORACLE_PASSWORD (press ENTER)
SQL> (You are logged in now...)
/*
For Logging as SYS DBA, required only for DBA activities, don't run oftenly:
oracle@companyName:~$ sqlplus / as sysdba
SQL>
SQL> exit
*/
5) Set Linesize and Pagesize as Unix(Oracle) will show fragmented pages. Type below commands:
SQL> set linesize 300;
SQL> set pagesize 2000;
6) Now to print Execution Plan of a Query to know CPU Utilization, IO Cost, Bytes Read, Index Scan etc, use below queries:
SQL> SET autotrace ON;
SQL> SELECT /*+ index(TABLE_NAME INDEX_TIME_STAMP) */ * FROM TABLE_NAME WHERE
TIME_STAMP>=To_Date('10/01/2014 00:04:20','dd/mm/yyyy hh24:mi:ss') AND TIME_STAMP<=To_Date('15/01/2014 17:04:20','dd/mm/yyyy hh24:mi:ss')
ORDER BY CARD_NO ASC;
Your output will look like below:
ROW DATA 1........
ROW DATA 2........
ROW DATA 3........
3 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1246977483
----------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 18 | 4122 | 4 (25)| 00:00:01 |
| 1 | SORT ORDER BY | | 18 | 4122 | 4 (25)| 00:00:01 |
| 2 | TABLE ACCESS BY INDEX ROWID| TABLE_NAME | 18 | 4122 | 3 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | INDEX_TIME_STAMP | 18 | | 2 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("TIME_STAMP">=TIMESTAMP' 2014-01-10 00:04:20' AND "TIME_STAMP"<=TIMESTAMP'
2014-01-15 17:04:20')
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
3 consistent gets
0 physical reads
0 redo size
4397 bytes sent via SQL*Net to client
524 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
10 rows processed
SQL> exit
7) Now the same above query output, can be achieved in ORACLE related UI interfaces like below
(above all execution happened in UNIX Black color command line interface), execution below 2 queries in UI interface like SQL Tools:
explain plan for
SELECT /*+ index(TABLE_NAME INDEX_TIME_STAMP) */ * FROM TABLE_NAME WHERE
TIME_STAMP>=To_Date('10/01/2014 00:04:20','dd/mm/yyyy hh24:mi:ss') AND TIME_STAMP<=To_Date('15/01/2014 17:04:20','dd/mm/yyyy hh24:mi:ss')
ORDER BY CARD_NO ASC;
SELECT PLAN_ID,CPU_COST,IO_COST,TIME,BYTES,COST,DEPTH,OPTIMIZER,OBJECT_TYPE,OBJECT_NAME,OBJECT_OWNER,OPTIONS,OPERATION,
TIMESTAMP FROM PLAN_TABLE;
==========================================END==============================================
Subscribe to:
Posts (Atom)