Friday, November 13, 2015

Garbage Collection and Types of Garbage Collectors in Java

Topics covered:

Garbage Collection in Java
Types of Garbage Collectors
Garbage Collection Process
Promotion Failure
Java Application Monitoring and Profiling


Garbage Collection in Java:
Garbage collection is Java's automatic memory management feature and one of the finest achievements of the language. 
It lets developers create new objects without explicitly allocating and de-allocating memory, because the garbage 
collector automatically reclaims the memory of unreachable objects for reuse. This enables faster development and removes 
whole classes of memory bugs, such as dangling pointers and double frees, although objects that stay reachable by mistake 
can still leak. The JVM runs the garbage collector on its own internal threads, and the application cannot control exactly 
when a collection happens: a call such as System.gc() is only a request. Please note that garbage collection applies to 
HEAP memory. Stack memory is not garbage collected, because stack frames are freed when a method returns, and in HotSpot 
class metadata in the Method Area is reclaimed only when classes are unloaded.

When a typical Java application is running, it is creating new objects, such as Strings and Files, but after a certain time, 
those objects are not used anymore. For example, take a look at the following code:

for(File f : files) {
    String s = f.getName();
}

In the above code, a String is assigned to s on each iteration of the for loop, and f.getName() typically allocates a 
new String object to hold the name. This means that every iteration allocates a little bit of memory. Once an iteration 
has finished, nothing refers to the String created in the previous iteration anymore, so that object is now considered 
"garbage". Over time, garbage piles up and memory is occupied by objects that are no longer used. If this went on 
unchecked, the Java Virtual Machine would eventually run out of space for new objects. The garbage collector looks for 
objects that are no longer reachable and gets rid of them, freeing the memory so that new objects can use it.
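The idea that an object becomes garbage once nothing refers to it can be observed with a weak reference. The following is 
only a minimal sketch: the class name ReachabilityDemo and the helper isAlive are made up for this illustration, and 
because System.gc() is just a request, the second line of output is not guaranteed on every JVM.

```java
import java.lang.ref.WeakReference;

public class ReachabilityDemo {

    /** Returns true while the referent of the weak reference has not been collected. */
    public static boolean isAlive(WeakReference<?> ref) {
        return ref.get() != null;
    }

    public static void main(String[] args) throws InterruptedException {
        // While a strong reference exists, the object cannot be collected.
        Object payload = new byte[1024 * 1024];
        WeakReference<Object> tracker = new WeakReference<>(payload);
        System.out.println("alive while strongly referenced: " + isAlive(tracker));

        // Drop the only strong reference; the array is now garbage.
        payload = null;

        // System.gc() is only a hint, so retry a few times and do not
        // rely on the object being reclaimed at any particular moment.
        for (int i = 0; i < 10 && isAlive(tracker); i++) {
            System.gc();
            Thread.sleep(50);
        }
        System.out.println("alive after dropping the reference: " + isAlive(tracker));
    }
}
```

A weak reference does not keep its referent alive, which is why it can be used here to watch the object without 
preventing its collection.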

Types of Garbage Collectors:
The HotSpot JVM (as of Java 8) offers 5 types of garbage collectors:
1. Serial Collector (-XX:+UseSerialGC)
2. Parallel or Throughput Collector (-XX:+UseParallelGC)
3. Concurrent Mark and Sweep (CMS) Collector (-XX:+UseConcMarkSweepGC, optionally tuned with -XX:ParallelCMSThreads=2)
4. G1 Collector (-XX:+UseG1GC)
5. Incremental Collector (-Xincgc)

NOTE: The Serial and Parallel collectors clean the Young-generation Heap by copying live objects, which is why they are 
also called “Copy Collectors”. The CMS and Incremental collectors work on the Old Generation Heap, while G1 manages both 
generations itself.

Each of these types has its own advantages and disadvantages. Most importantly, programmers can choose which garbage 
collector the JVM uses by passing the corresponding flag as a JVM argument.
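To check which collector a given JVM argument actually selected, the standard java.lang.management API can be queried at 
runtime. This is a small sketch rather than a definitive tool: the class name ActiveCollectors and the method 
collectorNames are invented for the example, and the names it prints depend on the JVM vendor and version.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class ActiveCollectors {

    /** Names of the collectors registered by the running JVM. */
    public static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // Typically one entry for the young generation and one for the old generation.
        for (String name : collectorNames()) {
            System.out.println(name);
        }
    }
}
```

Running it as "java -XX:+UseSerialGC ActiveCollectors" and again with "-XX:+UseG1GC" should show different collector 
names on JVMs that support both flags.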

1. Serial Garbage Collector
The serial garbage collector uses a single thread for garbage collection and freezes all the application threads 
(stop-the-world) while it runs. It is designed for single-threaded environments and small heaps, and is best suited 
for simple command-line programs.

2. Parallel or Throughput Garbage Collector
The parallel garbage collector is also called the throughput collector. In Java 8 and earlier it is the default garbage 
collector on server-class machines. Unlike the serial garbage collector, it uses multiple threads for garbage collection, 
which shortens collection time on multiple-CPU machines. Like the serial garbage collector, it freezes all the 
application threads while performing garbage collection.

3. CMS Garbage Collector
The Concurrent Mark Sweep (CMS) garbage collector uses multiple threads to scan the heap memory, mark the live instances, 
and then sweep the unmarked ones, doing most of this work while the application keeps running. The CMS garbage collector 
holds all the application threads in the following two scenarios only:
    1. While marking the objects in the tenured generation space that are directly reachable from the roots (initial mark).
    2. While catching up on changes the application made to the heap in parallel with the collection (re-mark).
In comparison with the parallel garbage collector, the CMS collector uses more CPU to achieve shorter pauses and better 
application responsiveness, usually at some cost in raw throughput. If we can spare more CPU for shorter pauses, then the 
CMS garbage collector is the preferred choice over the parallel collector. CMS does not compact memory during its normal 
concurrent cycles. Compaction happens only when it falls back to a “stop-the-world” mark-sweep-compact collection. That 
fallback collector is single-threaded: the entire JVM is paused and the collector uses only one CPU until it completes. 
More details are below:

This collector tries to allow application processing to continue as much as possible during the collection. 
The collection is split into six phases, of which four are concurrent and two are stop-the-world:
    1. The initial-mark phase (stop-the-world, marks the objects directly reachable from the roots so that most of the 
       rest of the collection can run concurrently with the application threads);
    2. The mark phase (concurrent, marks the live objects by traversing the object graph from the roots);
    3. The pre-cleaning phase (concurrent);
    4. The re-mark phase (stop-the-world, captures any changes to live objects since the collection started);
    5. The sweep phase (concurrent, recycles memory by clearing unreferenced objects);
    6. The reset phase (concurrent).

If objects are created faster than the concurrent collector can reclaim memory, so that the old generation fills up 
before a concurrent cycle finishes, CMS falls back to the traditional stop-the-world mark-sweep collector.

4. G1 Garbage Collector
The G1 (Garbage First) garbage collector is designed for large heaps. It divides the heap memory into regions and 
collects them using multiple threads in parallel. G1 also compacts the free heap space on the go, by copying live objects 
out of the regions it reclaims. It prioritizes the regions that contain the most garbage, hence the name.
  
5. Incremental Collector
The incremental collector uses a "train" algorithm to collect small portions of the old generation at a time. This collector 
has higher overheads than the Mark-Sweep Collector, but because only a small number of objects is collected each time, the 
(stop-the-world) garbage collection pause is minimized at the cost of total garbage collection taking longer. The "train" 
algorithm does not guarantee a maximum pause time, but pause times are typically less than ten milliseconds. Note that the 
train collector exists only in older JVMs. In later HotSpot releases it was dropped, and -Xincgc selects CMS in 
incremental mode instead.


Garbage Collection Process:

Analysis of object life cycles in many object-oriented programs shows that most objects tend to have very short lifetimes, 
with fewer objects having intermediate length lives and some objects being very long-lived. 

Garbage collection of short-lived objects can be achieved efficiently using a Copying collector, whereas a Mark-and-sweep 
collector is more useful for the Full Heap because it reclaims every unreachable object without needing a second area to 
copy into. In the most basic terms, a Copying collector copies all live objects from area1 to area2, which then leaves 
area1 free to reuse for new objects or the next Copy collection. A Mark-and-sweep collector finds all objects that can be 
reached from the JVM roots by traversing all object nodes (instance variables and array elements), marks all reached 
objects as "alive", and then sweeps away all remaining objects (the dead objects). Copy collection time is roughly 
proportional to the number of live objects, whereas Mark-and-sweep collection time is roughly proportional to the size of 
the heap.
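The mark and sweep steps described above can be illustrated on a toy object graph. This is a simplified model, not how 
the JVM implements its collectors: the class ToyMarkSweep, its Node type and the list standing in for the heap are all 
invented for the illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ToyMarkSweep {

    /** A toy heap object: a name plus its outgoing references. */
    public static final class Node {
        public final String name;
        public final List<Node> refs = new ArrayList<>();
        public Node(String name) { this.name = name; }
    }

    /** Mark phase: collect every node reachable from the roots. */
    public static Set<Node> mark(List<Node> roots) {
        Set<Node> marked = new HashSet<>();
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node n = pending.pop();
            if (marked.add(n)) {          // first visit only
                pending.addAll(n.refs);
            }
        }
        return marked;
    }

    /** Sweep phase: walk the whole heap and drop unmarked nodes. Returns how many were reclaimed. */
    public static int sweep(List<Node> heap, Set<Node> marked) {
        int before = heap.size();
        heap.removeIf(n -> !marked.contains(n));
        return before - heap.size();
    }

    public static void main(String[] args) {
        Node a = new Node("a"), b = new Node("b"), c = new Node("c"), d = new Node("d");
        a.refs.add(b);   // a -> b, reachable from the root a
        c.refs.add(d);   // c and d reference each other,
        d.refs.add(c);   // but no root leads to them
        List<Node> heap = new ArrayList<>(Arrays.asList(a, b, c, d));

        int reclaimed = sweep(heap, mark(Arrays.asList(a)));
        System.out.println("reclaimed " + reclaimed + ", survivors " + heap.size());
        // prints "reclaimed 2, survivors 2"
    }
}
```

Note that the sweep walks the entire heap list while the mark only touches reachable nodes, which mirrors the cost 
difference mentioned above. The unreachable cycle between c and d is reclaimed because reachability, not reference 
counts, decides what is alive.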

So the heap is split into the young generation and the old generation so that a Copying collection algorithm can be used in the 
young generation and a Mark-and-Sweep collection algorithm can be used in the old generation.

Objects are created in the young generation. Most live and die in that heap space and are efficiently collected without 
forcing a full Mark-and-sweep collection. Objects that survive enough young-generation collections are moved (promoted) 
to the old generation, and if the old generation gets full enough, a Mark-and-sweep collection must run.
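The split of the heap into generations is visible from inside a running program through the memory pool MXBeans. The 
sketch below only lists the heap pools. The class name HeapPools and the method heapPoolNames are made up for the 
example, and the pool names differ between collectors and JVM versions.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.util.ArrayList;
import java.util.List;

public class HeapPools {

    /** Names of the memory pools that make up the heap of the running JVM. */
    public static List<String> heapPoolNames() {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            // Skip non-heap pools such as class metadata and the code cache.
            if (pool.getType() == MemoryType.HEAP) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        for (String name : heapPoolNames()) {
            System.out.println(name);
        }
    }
}
```

With a generational collector, the output usually contains separate pools for the young generation (often an Eden and 
a Survivor space) and for the old generation.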

Promotion Failure:
Sometimes the GC log reports a "promotion failed" error. A promotion failure usually means that, when the collector tried 
to promote an object from the young generation into the old generation, the old generation was so fragmented that there 
was not enough contiguous memory available to store the object. Because the CMS garbage collector doesn’t de-fragment the 
old generation during its concurrent cycles, it has to resort to its fallback, a single-threaded Full GC, which takes much 
longer. In general, the recommended approach to handling promotion failures is to increase the size of the heap. Try with 
a bigger heap and see if the problem persists.

Java Application Monitoring and Profiling:
Monitoring: Extracting high-level statistics from a running application. Java comes with built-in tools for this, such as 
JConsole and the jvmstat tools (for example jstat).
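Similar high-level statistics can also be read from inside the application itself through the java.lang.management API. 
The following is a rough sketch: the class name GcStats and the method totalCollections are hypothetical, the allocation 
loop only exists to produce some garbage, and the numbers printed will differ from run to run.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class GcStats {

    /** Sum of the collection counts of all collectors; undefined counts (-1) are skipped. */
    public static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Create short-lived arrays; this will likely trigger young-generation collections.
        long allocated = 0;
        for (int i = 0; i < 200_000; i++) {
            byte[] junk = new byte[1024];
            allocated += junk.length;
        }
        System.out.println("bytes requested: " + allocated);

        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        System.out.println("heap used: " + heap.getUsed() + " of committed " + heap.getCommitted());

        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": count=" + gc.getCollectionCount()
                    + ", time(ms)=" + gc.getCollectionTime());
        }
    }
}
```

These are the same counters that tools like JConsole display, so the program can be used as a quick cross-check.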

Profiling: 
Instrumenting an application to provide detailed performance statistics. Java comes with built-in options for this, 
such as the “-Xprof” and “hprof” profilers. The Java Heap Analysis Tool (jhat) can also be used to examine heap dumps. A 
profiler based on JFluid technology has been incorporated into the popular NetBeans development tool.
