Dear reader,
I have mentioned below topics in this blog article:
1) How HashMap works in Java.
2) Difference between HashMap and HashTable.
3) At the end few programs having implementation of equals() and hashCode() method
and impact of these methods on HashMap storage.
How HashMap works in Java is a very common question. Almost everybody who worked
in Java knows what hashMap is, where to use hashMap or difference between
hashTable and HashMap, then why this interview question becomes so special? Because of the breadth and depth
this question offers. It has become very popular java interview question in almost any senior or mid-senior
level java interviews.
Questions start with simple statement:
"Have you used HashMap before" or "What is HashMap? Why do we use it“
Almost everybody answers this with yes and then interviewee keep talking about common facts about HashMap like
HashMap accept null while hashTable doesn't, HashMap is not synchronized, hashMap is fast and so on along with
basics like its stores key and value pairs etc.
This shows that person has used HashMap and quite familiar with the functionality HashMap offers but interview
takes a sharp turn from here and next set of follow up questions gets more detailed about fundamentals involved
in HashMap. Interview here you and come back with questions like
"Do you Know how HashMap works in Java” or "How does get () method of HashMap works in Java"
And then you get answers like I don't bother its standard Java API, you better look code on java; I can find
it out in Google at any time etc.
But some interviewee definitely answer this and will say "HashMap works on principle of hashing, we have put()
and get() method for storing and retrieving data from HashMap. When we pass an object to put () method to store
it on HashMap, HashMap implementation calls hashcode() method HashMap key object and by applying that hashcode on
its own hashing function it identifies a bucket location for storing value object , important part here is HashMap
stores both key+value in bucket which is essential
to understand the retrieving logic. if people fails to recognize this and say it only stores Value in the bucket they
will fail to explain the retrieving logic of any object stored in HashMap . This answer is very much acceptable and
does make sense that interviewee has fair bit of knowledge how hashing works and how HashMap works in Java.
But this is just start of story and going forward when depth increases a little bit and when you put interviewee on
scenarios every java developers faced day by day basis. So next question would be more likely about collision
detection and collision resolution in Java HashMap e.g
"What will happen if two different objects have same hashCode?”
Now from here confusion starts some time interviewer will say that since hashCode is equal objects are equal and
HashMap will throw exception or not store it again etc. then you might want to remind them about equals and hashCode()
contract that two unequal object in Java very much can have equal hashCode. Some will give up at this point and some
will move ahead and say "Since hashCode () is same, bucket location would be same and collision occurs in HashMap,
Since HashMap uses a linked list to store in bucket, value object will be stored in next node of linked list."
Great this answer make sense to me though there could be some other collision resolution methods available this is
simplest and HashMap does follow this. See the program at the end of this article.
But story does not end here and final questions interviewer ask like:
"How will you retrieve if two different objects have same hashcode?”
Hmmmmmmmmmmm
Interviewee will say we will call get() method and then HashMap uses keys hashCode to find out bucket location and
retrieves object but then you need to remind him that there are two objects are stored in same bucket , so they will
say about traversal in linked list until we find the value object , then you ask how do you identify value object
because you don't value object to compare ,So until they know that HashMap stores both Key and Value in linked list
node they won't be able to resolve this issue and will try and fail.
But those bunch of people who remember this key information will say that after finding bucket location , we will call
keys.equals() method to identify correct node in linked list and return associated value object for that key in Java
HashMap. Perfect this is the correct answer.
In many cases interviewee fails at this stage because they get confused between hashcode() and equals() and keys and
values object in hashMap which is pretty obvious because they are dealing with the hashcode() in all previous questions
and equals() come in picture only in case of retrieving value object from HashMap.
Some good developer point out here that using immutable, final object with proper equals() and hashcode()
implementation would act as perfect Java HashMap keys and improve performance of Java hashMap by reducing collision.
Immutability also allows caching there hashcode of different keys which makes overall retrieval process very fast and
suggest that String and various wrapper classes e.g Integer provided by Java Collection API are very good HashMap keys.
Now if you clear all this java hashmap interview question you will be surprised by this very interesting question
"What happens On HashMap in Java if the size of the Hashmap exceeds a given threshold defined by load factor ?".
Until you know how hashmap works exactly you won't be able to answer this question.
if the size of the map exceeds a given threshold defined by load-factor e.g. if load factor is .75 it will act to
re-size the map once it filled 75%. Java Hashmap does that by creating another new bucket array of size twice of
previous size of hashmap, and then start putting every old element into that new bucket array and this process is
called rehashing because it also applies hash function to find new bucket location.
If you manage to answer this question on hashmap in java you will be greeted by "do you see any problem with resizing
of hashmap in Java", you might not be able to pick the context and then he will try to give you hint about multiple
thread accessing the java hashmap and potentially looking for race condition on HashMap in Java.
So the answer is Yes there is potential race condition exists while resizing hashmap in Java, if two thread at the same
time found that now Java Hashmap needs resizing and they both try to resizing. on the process of resizing of hashmap in
Java, the element in bucket which is stored in linked list get reversed in order during there migration to new bucket
because java hashmap doesn't append the new element at tail instead it append new element at head to avoid tail traversing.
If race condition happens then you will end up with an infinite loop. though this point you can potentially argue that
what the hell makes you think to use HashMap in multi-threaded environment to interviewer :).
Never use it in multithreaded env. For MultiThreaded use ConcurrentHashMap.
I like this question because of its depth and number of concept it touches indirectly, if you look at questions asked
during interview this HashMap questions has verified:
Concept of hashing
Collision resolution in HashMap
Use of equals () and hashCode () method and there importance?
Benefit of immutable object?
race condition on hashmap in Java
Resizing of Java HashMap
Just to summarize here are the answers which does makes sense for above questions:
How HashMAp works in Java
HashMap works on principle of hashing, we have put() and get() method for storing and retrieving object form hashMap.
When we pass an both key and value to put() method to store on HashMap, it uses key object hashcode() method to calculate
hashcode and they by applying hashing on that hashcode it identifies bucket location for storing value object.
While retrieving it uses key object's equals method to find out correct key value pair and return value object associated
with that key. HashMap uses linked list in case of collision and object will be stored in next node of linked list.
Also hashMap stores both key+value tuple in every node of linked list.
What will happen if two different HashMap key objects have same hashcode?
They will be stored in same bucket but no next node of linked list. And keys equals() method will be used to identify
correct key value pair in HashMap.
In terms of usage HashMap is very versatile and can be mostly used hashMap as cache in electronic trading application.
Since finance domain used Java heavily and due to performance reason we need caching a lot, HashMap comes very handy there.
For new insertions, If key is same i.e. equals by equals method in Java than hashcode would be same and value will be
replaced but if key is not same i.e. not equal but hashcode is same (Which is possible in java) than bucked location
would be same and collision would happen and second object will be stored on second node of linked list structure
on Map.
So key.equals() is used on both put() and get() method call if object already exits in bucked location on hashmap
and that's why tuples contains both key and value in each node. In Java Object helpfully provides hashCode() and equals()
so we know that any object will be usable as a key in a hashtable.
So the known risks when using HashMap data structure in multithreaded env are: HashMap internal index corruption
and infinite looping, which can bring your JVM to its knees.
What JDK Map data structure is more suitable to handle concurrent read & write operations in a Java EE environment?
The answer is to use the ConcurrentHashMap, introduced in JDK 1.5, which provides Thread safe operations but no
blocking get() operation; which is key for proper performance. This is the typical data structure used these
days in modern Java EE container implementations.
JDK 1.5 introduce some good concurrent collections which is highly efficient for high volume, low latency system.
The synchronized collections classes, Hashtable and Vector, and the synchronized wrapper classes,
Collections.synchronizedMap and Collections.synchronizedList, provide a basic conditionally thread-safe implementation
of Map and List.
However, several factors make them unsuitable for use in highly concurrent applications -- their single collection-wide
lock is an impediment to scalability and it often becomes necessary to lock a collection for a considerable time during
iteration to prevent ConcurrentModificationExceptions.
The ConcurrentHashMap and CopyOnWriteArrayList implementations provide much higher concurrency while preserving thread
safety, with some minor compromises in their promises to callers. ConcurrentHashMap and CopyOnWriteArrayList are not
necessarily useful everywhere you might use HashMap or ArrayList, but are designed to optimize specific common situations.
Many concurrent applications will benefit from their use.
So what is the difference between hashtable and ConcurrentHashMap , both can be used in multithreaded environment but
once the size of hashtable becomes considerable large performance degrade because for iteration it has to be locked
for longer duration.
Since ConcurrentHashMap indroduced concept of segmentation , how large it becomes only certain part of it get locked
to provide thread safety so many other readers can still access map without waiting for iteration to complete.
In Summary ConcurrentHashMap only locked certain portion of Map while Hashtable lock full map while doing iteration.
HashMap can be synchronized by:
Map m = Collections.synchronizeMap(hashMap);
Difference between HashMap and HashTable?
1. The HashMap class is roughly equivalent to Hashtable, except that it is non synchronized and permits nulls.
(HashMap allows null values as key and value whereas Hashtable doesn't allow nulls).
2. HashMap does not guarantee that the order of the map will remain constant over time.
3. HashMap is non synchronized whereas Hashtable is synchronized.
4. Iterator in the HashMap is fail-fast while the enumerator for the Hashtable is not and throw
ConcurrentModificationException if any other Thread modifies the map structurally by adding or removing any
element except Iterator's own remove() method. But this is not a guaranteed behavior and will be done by
JVM on best effort.
5) HashTable extends Dictionary interface which is quite old while hashmap extends Map interface.
6) hashtalbe doesn't have counterpart like ConcurrentHashMap.
Note on Some Important Terms:
1)Synchronized means only one thread can modify a hashtable at one point of time. Basically, it means that
any thread before performing an update on a hashtable will have to acquire a lock on the object while others
will wait for lock to be released.
2)Fail-safe is relevant from the context of iterators. If an iterator has been created on a collection object
and some other thread tries to modify the collection object "structurally", a concurrent modification exception
will be thrown. It is possible for other threads though to invoke "set" method since it doesn't modify the
collection "structurally". However, if prior to calling "set", the collection has been modified structurally,
"IllegalArgumentException" will be thrown.
3)Structurally modification means deleting or inserting element which could effectively change the structure
of map.
=================Other important points about use of Map============
1) If the Class's object which is getting stored in Map, overrites equals() and hashCode() method and
hashCode() returns a fixed value like "45" or "10" and equals() return "true" for all cases as explained
in below coding then Map objects will be considered as Duplicate, so only one Pair will be shown as output.
Check the MainMap.java program below:
//Person.java (overriding equals() and hashCode() method
public final class Person {
final String name;
public Person(String name){
this.name=name;
}
@Override
public boolean equals(Object obj) {
//return super.equals(obj);
return true;
}
@Override
public int hashCode() {
return 45;
}
}
===========
//MainMap.java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Set;
public class MainMap {
public static void main(String[] args) {
Person a=new Person("John");
Person b=new Person("Deepak");
Person c=new Person("John");
Person d=new Person("John");
System.out.println("a.hashCode():"+a.hashCode());
System.out.println("b.hashCode():"+b.hashCode());
System.out.println("c.hashCode():"+c.hashCode());
System.out.println("d.hashCode():"+d.hashCode());
HashMap map=new HashMap();
map.put(a,"John");
map.put(b, "Deepak");
map.put(c,"John");
System.out.println("Map contents: "+map);
System.out.println("Trying to get object \"d\" : "+map.get(d));
Set s=new HashSet();
s.add(a);
s.add(b);
s.add(c);
System.out.println("Set containing Person objects :"+s);
//System.out.println(s.get(d)); //get method is not allowed in Set.
System.out.println("Does Set contains Person d object \"d\": "+s.contains(d)); //false
s.clear();
Integer intObj1=new Integer("0");
Integer intObj2=new Integer("0");
s.add(intObj1);
s.add(intObj2);
System.out.println("Set containing Integer objects :"+s);
s.clear();
String stringObj1=new String("0");
String stringObj2=new String("0");
s.add(stringObj1);
s.add(stringObj2);
System.out.println("Set containing String objects :"+s);
s.clear();
StringBuffer stringBuffObj1=new StringBuffer("0");
StringBuffer stringBuffObj2=new StringBuffer("0");
s.add(stringBuffObj1);
s.add(stringBuffObj2);
System.out.println("Set containing StringBuffer objects :"+s);
s.clear();
Thread threadObj1=new Thread("0");
Thread threadObj2=new Thread("0");
s.add(threadObj1);
s.add(threadObj2);
System.out.println("Set containing Thread objects :"+s);
}
}
==========
//Output:
a.hashCode():45
b.hashCode():45
c.hashCode():45
d.hashCode():45
Map contents: {Person@2d=John} //Map shows only one entry.
Trying to get object "d" : John //Value is printed, even though we have not copied this value to Map.
Set containing Person objects :[Person@2d] //Set shows only one entry.
Does Set contains Persond object "d": true
Set containing Integer objects :[0]
Set containing String objects :[0]
Set containing StringBuffer objects :[0, 0]
Set containing Thread objects :[Thread[0,5,main], Thread[0,5,main]]
******************
However if you change the equals() method like below:
@Override
public boolean equals(Object obj) {
return super.equals(obj);
//return true;
}
//Output:
a.hashCode():45 //Our defined hashCode
b.hashCode():45
c.hashCode():45
d.hashCode():45
Map contents: {Person@2d=John, Person@2d=Deepak, Person@2d=John} //Map shows 3 entries
Trying to get object "d" : null //Null is printed.
Set containing Person objects :[Person@2d, Person@2d, Person@2d] //Set shows 3 entries
Does Set contains Persond object "d": false
Set containing Integer objects :[0]
Set containing String objects :[0]
Set containing StringBuffer objects :[0, 0]
Set containing Thread objects :[Thread[0,5,main], Thread[0,5,main]]
******************
Also if you change the equals() and hashCode() method like below:
@Override
public boolean equals(Object obj) {
return super.equals(obj);
//return true;
}
@Override
public int hashCode() {
return name.hashCode();
}
//Output:
a.hashCode():2314539 //System defined hashCode.
b.hashCode():2043177526
c.hashCode():2314539
d.hashCode():2314539
Map contents: {Person@79c86a36=Deepak, Person@23512b=John, Person@23512b=John} //Map shows 3 entries
Trying to get object "d" : null //Null is printed.
Set containing Person objects :[Person@79c86a36, Person@23512b, Person@23512b] //Set shows 3 entries
Does Set contains Persond object "d": false
Set containing Integer objects :[0]
Set containing String objects :[0]
Set containing StringBuffer objects :[0, 0]
Set containing Thread objects :[Thread[0,5,main], Thread[0,5,main]]
******************
Also if you change the equals() and hashCode() method like below:
@Override
public boolean equals(Object obj) {
return true;
}
@Override
public int hashCode() {
return name.hashCode();
}
//Output:
a.hashCode():2314539
b.hashCode():2043177526
c.hashCode():2314539
d.hashCode():2314539
Map contents: {Person@79c86a36=Deepak, Person@23512b=John} //Map shows 2 entries with different name.
Trying to get object "d" : John //Value is printed, even though we have not copied this value to Map.
Set containing Person objects :[Person@79c86a36, Person@23512b] //Set shows 3 entries
Does Set contains Persond object "d": true
Set containing Integer objects :[0]
Set containing String objects :[0]
Set containing StringBuffer objects :[0, 0]
Set containing Thread objects :[Thread[0,5,main], Thread[0,5,main]]
======================END=================
Wednesday, March 7, 2012
How HashMap works in Java
Subscribe to:
Post Comments (Atom)
very good post!!
ReplyDeletenice one
ReplyDeletethanks for sharing
Copy paste with example program :-)
ReplyDeleteAnonymous,
DeleteTry to give your name when writing. The theme and idea of content may be taken from sites but edited heavily. It is completely re-structured and all programs added are tested by me. By the way thanks.
Very Helpfull Notes...Thanks for sharing !
ReplyDeleteVery well explained .. Thanks for sharing
ReplyDeletenice
ReplyDeleteThank you providing excellent explanation on hash map and hash table.
ReplyDeleteits exact copy paste ..
ReplyDeleteAre baba, I already told some portions may be copied.. am I trying to get money from this site? Comm'n Anonymous grow up..
Delete