Tuesday, January 28, 2014

JMAP, HISTO, Thread Dump, High CPU Utilization

Dear Reader,

In a production environment when we reach 100% or more than 100% CPU usage, developers and Production Support guys
have a nightmare to fix and handle the client calls. Java profiling help us to detect such CPU usage but not an 
option in production systems. Se we take thread dump, histo and use TDA or some other tools to pin point the faulty 
code implementation. This is really a panic scenario when you are handling production environment and issue comes at odd times.
This is written only for Unix environment (Ubuntu and Solaris) where these Unix command runs.

Fortunately, Java comes with some great debugging tools, We need to assemble those tools with Linux commands.

I am going to explain below items here:
    1) Introduction about Java threads and its relation to Linux LWP (Light Weight Process).
    2) Step-by-step process to take thread dump and analyze CPU utilization.
    3) "jmap" - Memory Map (dump), Command to get this.
    4) "jmap" - Histo, Command to get this.
    5) Command to see list of open files in Unix.
    6) Command to achieve the same (Resolving High CPU issue) in Sun Solaris Systems.


1) Introduction: A java program starts when JVM calls the main method, this creates a thread called the main thread and any 
    thread you create using java code will be derived from the main thread. The same exact behavior occurs at the Linux OS 
    level, the main thread for java means a Process for the OS and every thread you create using java the OS will create a 
    Light-weight-process or LWP. In short: Java main thread = Linux process and Java supporting threads = Linux LWP.
    LWP we give an alias name as Native ID.

    The solution requies:
        Ask Linux which LWP is eating the CPU.
        Ask Java for a Thread Dump.
        Map this LWP to a Java thread.
        Get the part of code causing the issue.    
2) Step-by-step process:    
        Get the PID: the very first step is to know what is the Java process ID, we will use Linux commands 
        as below. Use ONE OF THE BELOW commands (we use our grep "DAPPNAME", you can use anything like "grep java"):
        jps -v | cut -c -106 | grep DAPPNAME
        jps -mvl | cut -c -106 | grep DAPPNAME
        ps -eaf | cut -c -106 | grep DAPPNAME
        ps -ef | cut -c -106 | grep DAPPNAME
        ps -eaf | grep java
        Below are the sample output when you execute the command:
        dmodi@MyDirectory:~$ jps -mlv | cut -c -106 | grep DAPPNAME
        8243 org.quickserver.net.server.QuickServer -load config/DmodiServer.xml -DAPPNAME=CLIENT    
        13712 org.quickserver.net.server.QuickServer -load ./conf/DmodiDNXServer.xml -DAPPNAME=SERVER
        12229 org.quickserver.net.server.QuickServer -load ./config/DmodiPOSServer.xml -DAPPNAME=SERVER2
        Explanation: "jps" - Java Virtual Machine Process Status Tool, a command in Unix. "106" shows 106 
        characters we want to display in console.
        The next step is to get the CPU usage per each LWP related to the main process, we can use below commands:
        //Replace PROCESS_ID with numeric process_id you get after using "top" command in Unix, which is having more than 100% 
        CPU usage.
        ps -eLo pid,lwp,nlwp,ruser,pcpu,stime,etime,args | grep PROCESS_ID | cut -c -106 > ThreadsList.txt
        The newly created file ThreadsList.txt will contain below things in similar way (The headers will not be 
        shown as below):        
        8243  8243 3  dmodi  0.0 May13  1-19:20:18 java -Dprogram.name=run.sh -Xms64m -Xmx100m -Dsun.rmi.dgc
        8243  8244 3  dmodi  0.0 May13  1-19:20:18 java -Dprogram.name=run.sh -Xms64m -Xmx100m -Dsun.rmi.dgc
        8243  8245 3  dmodi 99.9 May13  1-19:20:18 java -Dprogram.name=run.sh -Xms64m -Xmx100m -Dsun.rmi.dgc
         To see headers too, just execute the below command (but it will display all processes, Process_Id is not entered):
         ps -eLo pid,lwp,nlwp,ruser,pcpu,stime,etime,args > ThreadsList.txt

         Explanation: PID is process Id.
            LWP: is Light weight processes Lists (java Threads for given ProcessId) for the above PID. These values are in Decimal.
            NLWP: is number of LWP created for the above PID.            
            We can see LWP (Thread) - 8245 is eating CPU. We need to convert this value into HEX value which will be "2035".
        Now take the thread dump and kill the process Id: 8243. 
        See below command:
        //Taking thread dump
        jstack -l 8243 > ThreadDump_15_May_2014_13_PM.txt
//Killing process kill -9 8243 kill -3 8243 (Can also be used in Ubuntu. If used for Solaris, It will generate Thread dump file along with killing).
Open the thread dump file ThreadDump_15_May_2014_13_PM.txt and search the hexa value "2035". Also if you are using TDA (Thread dump analyzer tool) to see this dump file, you can see Native-ID column. You can see the decimal thread Id (LWP): 8245 too. Click the link in TDA for this ThreadId, it will display the faulty code in TDA console.. 3) "jmap" - Memory Map (dump): Prints shared object memory maps or heap memory details of a given JVM process. dmodiUnixUser@productName79:~$ jmap -dump:file=deepak.bin 8243 Dumping heap to /home/dmodiUnixUser/deepak.bin ... Heap dump file created dmodiUnixUser@productName79:~$ ls deepak.bin This newly created file will be big in size (of 5-10 MB around). You can't see this content using "less" or "cat" command. You need tool to see this. We don't use this generally, so not mentioning here. 4) "jmap" - Histo: See below command: dmodiUnixUser@productName79:~$ jmap -histo:live 8243 > deepak.txt Contents of this file "deepak.txt" will have similar like above: num #instances #bytes class name ---------------------------------------------- 1: 14452 2229096 <constMethodKlass> 2: 14452 1740720 <methodKlass> 3: 1004 1406296 <constantPoolKlass> 4: 1336 1270504 [B 5: 25057 1060840 <symbolKlass> 6: 835 809368 <constantPoolCacheKlass> 7: 1004 787096 <instanceKlassKlass> 5) List of open files in Linux: lsof - list open files dmodiUnixUser@productName79:~$ lsof | grep home/dmodi/productName/dist/ COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME java 16460 dmodiUnixUser mem REG 9,2 25680 8127062 home/dmodi/productName/dist/sample1.jar java 16460 dmodiUnixUser mem REG 9,2 66770 8127061 home/dmodi/productName/dist/sample2.jar dmodiUnixUser@productName79:~$ lsof | grep PROCESS_ID > help.txt dmodiUnixUser@productName79:~$ less help.txt 6) The Above mentioned few commands may not work for Sun Solaris system. Hence to track high CPU consuming Process and ThreadId, "prstat" is used. The syntax is "prstat -L -p ". Example: prstat -L -p 22991 >> 22991.txt This will generate file name 22991.txt having all the Threads and CPU Usage details for ProcessId 22991. The headers will be like this : PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/LWPID 22991 root 7667M 4953M cpu18 0 0 14:47:00 99.6% java/119286 22991 root 7667M 4953M sleep 59 0 0:02:14 0.0% java/46 22991 root 7667M 4953M sleep 59 0 0:00:15 0.0% java/15 22991 root 7667M 4953M sleep 59 0 0:00:15 0.0% java/14 Here "PID corresponds to Solaris Java process ID". "CPU corresponds to the CPU utilization % of a particular Java Thread". "PROCESS/LWPID corresponds to Light Weight Process ID e.g. your native Java ThreadID belonging to ProcessID 22991". HERE prstat says, ThreadId #119286 is the top CPU contributor with 99.6% utilization and hence the faulty code must be fixed. Hence take thread dump immediately using command "kill -3 22991". This command will generate a Thread Dump from Java process HotSpot VM format. Convert Thread ID #119286 which is in decimal format to HEXA, corresponds value is 0X1D1F6 (this HEXA format, see 0X prefix). HEXA value is "1D1F6". Now search this HEXA value in Thread Dump file, you will get the exactly faulty code stacktrace. ----------------------------------END-----------------------------------------

No comments:

Post a Comment