Wednesday, March 7, 2012

How HashMap works in Java

Dear reader, 
I have mentioned below topics in this blog article:
1) How HashMap works in Java.
2) Difference between HashMap and HashTable. 
3) At the end few programs having implementation of equals() and hashCode() method
   and impact of these methods on HashMap storage.

How HashMap works in Java is a very common question. Almost everybody who worked 
in Java knows what hashMap is, where to use hashMap or difference between 
hashTable and HashMap, then why this interview question becomes so special? Because of the breadth and depth 
this question offers. It has become very popular java interview question in almost any senior or mid-senior 
level java interviews.

Questions start with simple statement:
"Have you used HashMap before" or "What is HashMap? Why do we use it“
Almost everybody answers this with yes and then interviewee keep talking about common facts about HashMap like 
HashMap accept null while hashTable doesn't, HashMap is not synchronized, hashMap is fast and so on along with 
basics like its stores key and value pairs etc.
This shows that person has used HashMap and quite familiar with the functionality HashMap offers but interview 
takes a sharp turn from here and next set of follow up questions gets more detailed about fundamentals involved 
in HashMap. Interview here you and come back with questions like

"Do you Know how HashMap works in Java” or "How does get () method of HashMap works in Java"
And then you get answers like I don't bother its standard Java API, you better look code on java; I can find 
it out in Google at any time etc.
But some interviewee definitely answer this and will say "HashMap works on principle of hashing, we have put() 
and get() method for storing and retrieving data from HashMap. When we pass an object to put () method to store 
it on HashMap, HashMap implementation calls hashcode() method HashMap key object and by applying that hashcode on 
its own hashing function it identifies a bucket location for storing value object , important part here is HashMap 
stores both key+value in bucket which is essential 
to understand the retrieving logic. if people fails to recognize this and say it only stores Value in the bucket they 
will fail to explain the retrieving logic of any object stored in HashMap . This answer is very much acceptable and 
does make sense that interviewee has fair bit of knowledge how hashing works and how HashMap works in Java.
But this is just start of story and going forward when depth increases a little bit and when you put interviewee on 
scenarios every java developers faced day by day basis. So next question would be more likely about collision 
detection and collision resolution in Java HashMap e.g 

"What will happen if two different objects have same hashCode?”
Now from here confusion starts some time interviewer will say that since hashCode is equal objects are equal and 
HashMap will throw exception or not store it again etc. then you might want to remind them about equals and hashCode() 
contract that two unequal object in Java very much can have equal hashCode. Some will give up at this point and some 
will move ahead and say "Since hashCode () is same, bucket location would be same and collision occurs in HashMap, 
Since HashMap uses a linked list to store in bucket, value object will be stored in next node of linked list." 
Great this answer make sense to me though there could be some other collision resolution methods available this is 
simplest and HashMap does follow this. See the program at the end of this article.

But story does not end here and final questions interviewer ask like:
"How will you retrieve if two different objects have same hashcode?”
Hmmmmmmmmmmm
Interviewee will say we will call get() method and then HashMap uses keys hashCode to find out bucket location and 
retrieves object but then you need to remind him that there are two objects are stored in same bucket , so they will 
say about traversal in linked list until we find the value object , then you ask how do you identify value object 
because you don't value object to compare ,So until they know that HashMap stores both Key and Value in linked list 
node they won't be able to resolve this issue and will try and fail.

But those bunch of people who remember this key information will say that after finding bucket location , we will call 
keys.equals() method to identify correct node in linked list and return associated value object for that key in Java 
HashMap. Perfect this is the correct answer.

In many cases interviewee fails at this stage because they get confused between hashcode() and equals() and keys and 
values object in hashMap which is pretty obvious because they are dealing with the hashcode() in all previous questions 
and equals() come in picture only in case of retrieving value object from HashMap.
Some good developer point out here that using immutable, final object with proper equals() and hashcode() 
implementation would act as perfect Java HashMap keys and improve performance of Java hashMap by reducing collision. 
Immutability also allows caching there hashcode of different keys which makes overall retrieval process very fast and 
suggest that String and various wrapper classes e.g Integer provided by Java Collection API are very good HashMap keys.

Now if you clear all this java hashmap interview question you will be surprised by this very interesting question 
"What happens On HashMap in Java if the size of the Hashmap exceeds a given threshold defined by load factor ?". 
Until you know how hashmap works exactly you won't be able to answer this question.
if the size of the map exceeds a given threshold defined by load-factor e.g. if load factor is .75 it will act to 
re-size the map once it filled 75%. Java Hashmap does that by creating another new bucket array of size twice of 
previous size of hashmap, and then start putting every old element into that new bucket array and this process is 
called rehashing because it also applies hash function to find new bucket location. 

If you manage to answer this question on hashmap in java you will be greeted by "do you see any problem with resizing 
of hashmap in Java", you might not be able to pick the context and then he will try to give you hint about multiple 
thread accessing the java hashmap and potentially looking for race condition on HashMap in Java. 

So the answer is Yes there is potential race condition exists while resizing hashmap in Java, if two thread at the same 
time found that now Java Hashmap needs resizing and they both try to resizing. on the process of resizing of hashmap in 
Java, the element in bucket which is stored in linked list get reversed in order during there migration to new bucket 
because java hashmap doesn't append the new element at tail instead it append new element at head to avoid tail traversing. 
If race condition happens then you will end up with an infinite loop. though this point you can potentially argue that 
what the hell makes you think to use HashMap in multi-threaded environment to interviewer :). 
Never use it in multithreaded env. For MultiThreaded use ConcurrentHashMap.

I like this question because of its depth and number of concept it touches indirectly, if you look at questions asked 
during interview this HashMap questions has verified:
    Concept of hashing
    Collision resolution in HashMap
    Use of equals () and hashCode () method and there importance?
    Benefit of immutable object?
    race condition on hashmap in Java
    Resizing of Java HashMap

Just to summarize here are the answers which does makes sense for above questions:
How HashMAp works in Java
HashMap works on principle of hashing, we have put() and get() method for storing and retrieving object form hashMap.
When we pass an both key and value to put() method to store on HashMap, it uses key object hashcode() method to calculate 
hashcode and they by applying hashing on that hashcode it identifies bucket location for storing value object.
While retrieving it uses key object's equals method to find out correct key value pair and return value object associated 
with that key. HashMap uses linked list in case of collision and object will be stored in next node of linked list.
Also hashMap stores both key+value tuple in every node of linked list.

What will happen if two different HashMap key objects have same hashcode?
They will be stored in same bucket but no next node of linked list. And keys equals() method will be used to identify 
correct key value pair in HashMap.

In terms of usage HashMap is very versatile and can be mostly used hashMap as cache in electronic trading application. 
Since finance domain used Java heavily and due to performance reason we need caching a lot, HashMap comes very handy there.

For new insertions, If key is same i.e. equals by equals method in Java than hashcode would be same and value will be 
replaced but if key is not same i.e. not equal but hashcode is same (Which is possible in java) than bucked location 
would be same and collision would happen and second object will be stored on second node of linked list structure 
on Map. 

So key.equals() is used on both put() and get() method call if object already exits in bucked location on hashmap 
and that's why tuples contains both key and value in each node. In Java Object helpfully provides hashCode() and equals() 
so we know that any object will be usable as a key in a hashtable.

So the known risks when using HashMap data structure in multithreaded env are: HashMap internal index corruption 
and infinite looping, which can bring your JVM to its knees.

What JDK Map data structure is more suitable to handle concurrent read & write operations in a Java EE environment?
The answer is to use the ConcurrentHashMap, introduced in JDK 1.5, which provides Thread safe operations but no 
blocking get() operation; which is key for proper performance. This is the typical data structure used these 
days in modern Java EE container implementations.

JDK 1.5 introduce some good concurrent collections which is highly efficient for high volume, low latency system.

The synchronized collections classes, Hashtable and Vector, and the synchronized wrapper classes, 
Collections.synchronizedMap and Collections.synchronizedList, provide a basic conditionally thread-safe implementation 
of Map and List.
However, several factors make them unsuitable for use in highly concurrent applications -- their single collection-wide 
lock is an impediment to scalability and it often becomes necessary to lock a collection for a considerable time during 
iteration to prevent ConcurrentModificationExceptions.

The ConcurrentHashMap and CopyOnWriteArrayList implementations provide much higher concurrency while preserving thread 
safety, with some minor compromises in their promises to callers. ConcurrentHashMap and CopyOnWriteArrayList are not 
necessarily useful everywhere you might use HashMap or ArrayList, but are designed to optimize specific common situations. 
Many concurrent applications will benefit from their use.

So what is the difference between hashtable and ConcurrentHashMap , both can be used in multithreaded environment but 
once the size of hashtable becomes considerable large performance degrade because for iteration it has to be locked 
for longer duration.

Since ConcurrentHashMap indroduced concept of segmentation , how large it becomes only certain part of it get locked 
to provide thread safety so many other readers can still access map without waiting for iteration to complete.

In Summary ConcurrentHashMap only locked certain portion of Map while Hashtable lock full map while doing iteration.

HashMap can be synchronized by:
 Map m = Collections.synchronizeMap(hashMap);
 
Difference between HashMap and HashTable? 
1. The HashMap class is roughly equivalent to Hashtable, except that it is non synchronized and permits nulls. 
   (HashMap allows null values as key and value whereas Hashtable doesn't allow nulls).
2. HashMap does not guarantee that the order of the map will remain constant over time.
3. HashMap is non synchronized whereas Hashtable is synchronized.
4. Iterator in the HashMap is  fail-fast  while the enumerator for the Hashtable is not and throw 
ConcurrentModificationException if any other Thread modifies the map structurally  by adding or removing any 
element except Iterator's own remove()  method. But this is not a guaranteed behavior and will be done by 
JVM on best effort.
5) HashTable extends Dictionary interface which is quite old while hashmap extends Map interface.
6) hashtalbe doesn't have counterpart like ConcurrentHashMap.

Note on Some Important Terms:
1)Synchronized means only one thread can modify a hashtable at one point of time. Basically, it means that 
any thread before performing an update on a hashtable will have to acquire a lock on the object while others 
will wait for lock to be released.

2)Fail-safe is relevant from the context of iterators. If an iterator has been created on a collection object 
and some other thread tries to modify the collection object "structurally", a concurrent modification exception
will be thrown. It is possible for other threads though to invoke "set" method since it doesn't modify the 
collection "structurally". However, if prior to calling "set", the collection has been modified structurally, 
"IllegalArgumentException" will be thrown.

3)Structurally modification means deleting or inserting element which could effectively change the structure 
of map.


=================Other important points about use of Map============
1) If the Class's object which is getting stored in Map, overrites equals() and hashCode() method and
   hashCode() returns a fixed value like "45" or "10" and equals() return "true" for all cases as explained 
   in below coding then Map objects will be considered as Duplicate, so only one Pair will be shown as output.
   Check the MainMap.java program below:

//Person.java (overriding equals() and hashCode() method
public final class Person {
    final String name;
    public Person(String name){
        this.name=name;
    }
    @Override
    public boolean equals(Object obj) {
        //return super.equals(obj);
        return true;
    }
    @Override
    public int hashCode() {
        return 45;
    }
}

===========
//MainMap.java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Set;

public class MainMap {
    public static void main(String[] args) {
        Person a=new Person("John");
        Person b=new Person("Deepak");
        Person c=new Person("John");
        Person d=new Person("John");

        System.out.println("a.hashCode():"+a.hashCode());
        System.out.println("b.hashCode():"+b.hashCode());
        System.out.println("c.hashCode():"+c.hashCode());
        System.out.println("d.hashCode():"+d.hashCode());

        HashMap map=new HashMap();
        map.put(a,"John");
        map.put(b, "Deepak");
        map.put(c,"John");
        System.out.println("Map contents: "+map);

        System.out.println("Trying to get object \"d\" : "+map.get(d));

        Set s=new HashSet();
        s.add(a);
        s.add(b);
        s.add(c);
        System.out.println("Set containing Person objects :"+s);
        //System.out.println(s.get(d)); //get method is not allowed in Set.
        System.out.println("Does Set contains Person d object \"d\": "+s.contains(d));  //false

        s.clear();
        Integer intObj1=new Integer("0");
        Integer intObj2=new Integer("0");
        s.add(intObj1);
        s.add(intObj2);
        System.out.println("Set containing Integer objects :"+s);

        s.clear();
        String stringObj1=new String("0");
        String stringObj2=new String("0");
        s.add(stringObj1);
        s.add(stringObj2);
        System.out.println("Set containing String objects :"+s);


        s.clear();
        StringBuffer stringBuffObj1=new StringBuffer("0");
        StringBuffer stringBuffObj2=new StringBuffer("0");
        s.add(stringBuffObj1);
        s.add(stringBuffObj2);
        System.out.println("Set containing StringBuffer objects :"+s);
        
        s.clear();
        Thread threadObj1=new Thread("0");
        Thread threadObj2=new Thread("0");
        s.add(threadObj1);
        s.add(threadObj2);
        System.out.println("Set containing Thread objects :"+s);
    }
}
==========
//Output:
a.hashCode():45
b.hashCode():45
c.hashCode():45
d.hashCode():45
Map contents: {Person@2d=John}   //Map shows only one entry.
Trying to get object "d" : John  //Value is printed, even though we have not copied this value to Map.
Set containing Person objects :[Person@2d]   //Set shows only one entry.
Does Set contains Persond object "d": true
Set containing Integer objects :[0]
Set containing String objects :[0]
Set containing StringBuffer objects :[0, 0]
Set containing Thread objects :[Thread[0,5,main], Thread[0,5,main]]


******************
However if you change the equals() method like below:
    @Override
    public boolean equals(Object obj) {
        return super.equals(obj);
        //return true;
    }

//Output:
a.hashCode():45   //Our defined hashCode
b.hashCode():45
c.hashCode():45
d.hashCode():45
Map contents: {Person@2d=John, Person@2d=Deepak, Person@2d=John} //Map shows 3 entries
Trying to get object "d" : null   //Null is printed.
Set containing Person objects :[Person@2d, Person@2d, Person@2d] //Set shows 3 entries
Does Set contains Persond object "d": false
Set containing Integer objects :[0]
Set containing String objects :[0]
Set containing StringBuffer objects :[0, 0]
Set containing Thread objects :[Thread[0,5,main], Thread[0,5,main]]

******************
Also if you change the equals() and hashCode() method like below:
    @Override
    public boolean equals(Object obj) {
        return super.equals(obj);
        //return true;
    }
    @Override
    public int hashCode() {
        return name.hashCode();
    }
//Output:
a.hashCode():2314539   //System defined hashCode.
b.hashCode():2043177526
c.hashCode():2314539
d.hashCode():2314539
Map contents: {Person@79c86a36=Deepak, Person@23512b=John, Person@23512b=John}  //Map shows 3 entries
Trying to get object "d" : null   //Null is printed.
Set containing Person objects :[Person@79c86a36, Person@23512b, Person@23512b]  //Set shows 3 entries
Does Set contains Persond object "d": false
Set containing Integer objects :[0]
Set containing String objects :[0]
Set containing StringBuffer objects :[0, 0]
Set containing Thread objects :[Thread[0,5,main], Thread[0,5,main]]

******************
Also if you change the equals() and hashCode() method like below:
    @Override
    public boolean equals(Object obj) {        
        return true;
    }
    @Override
    public int hashCode() {
        return name.hashCode();
    }    

//Output:
a.hashCode():2314539
b.hashCode():2043177526
c.hashCode():2314539
d.hashCode():2314539
Map contents: {Person@79c86a36=Deepak, Person@23512b=John}   //Map shows 2 entries with different name.
Trying to get object "d" : John        //Value is printed, even though we have not copied this value to Map.
Set containing Person objects :[Person@79c86a36, Person@23512b] //Set shows 3 entries
Does Set contains Persond object "d": true
Set containing Integer objects :[0]
Set containing String objects :[0]
Set containing StringBuffer objects :[0, 0]
Set containing Thread objects :[Thread[0,5,main], Thread[0,5,main]]

======================END=================

10 comments:

  1. Copy paste with example program :-)

    ReplyDelete
    Replies
    1. Anonymous,
      Try to give your name when writing. The theme and idea of content may be taken from sites but edited heavily. It is completely re-structured and all programs added are tested by me. By the way thanks.

      Delete
  2. Very Helpfull Notes...Thanks for sharing !

    ReplyDelete
  3. Very well explained .. Thanks for sharing

    ReplyDelete
  4. Thank you providing excellent explanation on hash map and hash table.

    ReplyDelete
  5. its exact copy paste ..

    ReplyDelete
    Replies
    1. Are baba, I already told some portions may be copied.. am I trying to get money from this site? Comm'n Anonymous grow up..

      Delete