This article summarizes HashMap based on JDK 7. By reading the source code, we will work through HashMap's implementation principles, data structure, and main methods.
Before diving into the source, let's start with the class Javadoc, which gives a first impression of what HashMap is.
/**
 * Hash table based implementation of the Map interface. This
 * implementation provides all of the optional map operations, and permits
 * null values and the null key. (The HashMap class is roughly equivalent
 * to Hashtable, except that it is unsynchronized and permits nulls.)
 * This class makes no guarantees as to the order of the map; in particular,
 * it does not guarantee that the order will remain constant over time.
 */
From this paragraph we learn that HashMap permits a null key and null values. It is very similar to Hashtable; the differences are that HashMap is not thread-safe (unsynchronized) and that it allows nulls.
/**
 * An instance of HashMap has two parameters that affect its
 * performance: initial capacity and load factor. The
 * capacity is the number of buckets in the hash table, and the initial
 * capacity is simply the capacity at the time the hash table is created. The
 * load factor is a measure of how full the hash table is allowed to
 * get before its capacity is automatically increased. When the number of
 * entries in the hash table exceeds the product of the load factor and the
 * current capacity, the hash table is rehashed (that is, internal data
 * structures are rebuilt) so that the hash table has approximately twice the
 * number of buckets.
 */
From this paragraph we learn that HashMap has two important parameters: the initial capacity and the load factor. The initial capacity is the size of the underlying array; once the number of entries exceeds the threshold (the product of the load factor and the current capacity), the HashMap automatically resizes.
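As a small illustration of these two parameters (the class name CapacityDemo and the values are placeholders), a map created with capacity 16 and load factor 0.75 has a threshold of 12; per the addEntry code shown later in this article, the internal table is rebuilt while the 13th entry is being added:

import java.util.HashMap;
import java.util.Map;

public class CapacityDemo {
    public static void main(String[] args) {
        // capacity = 16, load factor = 0.75  =>  threshold = 16 * 0.75 = 12
        Map<Integer, String> map = new HashMap<Integer, String>(16, 0.75f);
        for (int i = 1; i <= 13; i++) {
            // per the addEntry code shown later (if (size++ >= threshold)),
            // adding the 13th entry finds size == 12 == threshold and the
            // table is rebuilt with 32 buckets; this cannot be observed from
            // outside, but the map keeps working transparently
            map.put(i, "value-" + i);
        }
        System.out.println("size = " + map.size()); // 13
    }
}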
/**
 * The iterators returned by all of this class's "collection view methods"
 * are fail-fast: if the map is structurally modified at any time after
 * the iterator is created, in any way except through the iterator's own
 * remove method, the iterator will throw a
 * {@link ConcurrentModificationException}. Thus, in the face of concurrent
 * modification, the iterator fails quickly and cleanly, rather than risking
 * arbitrary, non-deterministic behavior at an undetermined time in the
 * future.
 *
 * Note that the fail-fast behavior of an iterator cannot be guaranteed
 * as it is, generally speaking, impossible to make any hard guarantees in the
 * presence of unsynchronized concurrent modification. Fail-fast iterators
 * throw ConcurrentModificationException on a best-effort basis.
 * Therefore, it would be wrong to write a program that depended on this
 * exception for its correctness: the fail-fast behavior of iterators
 * should be used only to detect bugs.
 */
From these two paragraphs we learn that HashMap iterators are fail-fast: once an iterator has been created, any structural modification of the HashMap (any operation that adds or removes a mapping) made through anything other than the iterator's own remove() method causes the iterator to throw ConcurrentModificationException. We should never write code that relies on this exception for correctness; the fail-fast behavior should only be used to detect bugs.
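A quick sketch of the fail-fast behavior (the class name FailFastDemo and the keys are illustrative only): modifying the map directly while iterating triggers the exception, while removing through the iterator does not.

import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class FailFastDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<String, Integer>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);

        // Structural modification made outside the iterator: the next call to
        // next() inside the for-each loop detects the changed modCount and throws.
        try {
            for (String key : map.keySet()) {
                map.put("d", 4);   // adding a new key is a structural modification
            }
        } catch (ConcurrentModificationException e) {
            System.out.println("fail-fast triggered: " + e);
        }

        // Removing through the iterator's own remove() is allowed.
        Iterator<String> it = map.keySet().iterator();
        while (it.hasNext()) {
            if ("b".equals(it.next())) {
                it.remove();
            }
        }
        System.out.println(map);   // "b" is gone, no exception thrown
    }
}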
The underlying structure of HashMap is an array, and each slot of the array holds a linked list; the initial capacity is simply the size of this array.
HashMap extends the abstract class AbstractMap and implements the Map interface. The Map interface defines the key-value mapping contract, and AbstractMap provides a skeletal implementation of the Map interface to minimize the effort required to implement it.
public class HashMap<K,V> extends AbstractMap<K,V> implements Map<K,V>, Cloneable, Serializable {
Before looking at the constructors, let's look at some of HashMap's member fields.
static final int DEFAULT_INITIAL_CAPACITY = 16;   // default initial capacity: 16
static final int MAXIMUM_CAPACITY = 1 << 30;      // maximum capacity: 2^30
static final float DEFAULT_LOAD_FACTOR = 0.75f;   // default load factor: 0.75
transient Entry[] table;                          // the underlying array of Entry objects
transient int size;                               // the number of key-value mappings currently stored
int threshold;                                    // resize threshold: once the number of entries reaches it, the capacity is doubled
final float loadFactor;                           // the load factor of this map
transient int modCount;                           // counts structural modifications (used by the fail-fast iterators)

Constructing a HashMap by specifying both initialCapacity and loadFactor:
public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " + initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " + loadFactor);

    // Find a power of 2 >= initialCapacity
    int capacity = 1;
    while (capacity < initialCapacity)   // here we can see that the size of the underlying array must be a power of two
        capacity <<= 1;

    this.loadFactor = loadFactor;
    threshold = (int)(capacity * loadFactor);
    table = new Entry[capacity];
    init();
}
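The rounding loop above can be tried in isolation (a standalone sketch, not part of HashMap itself): whatever capacity is requested gets rounded up to the next power of two, e.g. 20 becomes 32.

public class PowerOfTwoDemo {
    // same logic as the constructor's loop: the smallest power of 2 >= initialCapacity
    static int roundUpToPowerOfTwo(int initialCapacity) {
        int capacity = 1;
        while (capacity < initialCapacity) {
            capacity <<= 1;
        }
        return capacity;
    }

    public static void main(String[] args) {
        System.out.println(roundUpToPowerOfTwo(16));  // 16
        System.out.println(roundUpToPowerOfTwo(20));  // 32
        System.out.println(roundUpToPowerOfTwo(100)); // 128
    }
}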
The no-argument (default) constructor:
public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR;                              // use the default load factor of 0.75
    threshold = (int)(DEFAULT_INITIAL_CAPACITY * DEFAULT_LOAD_FACTOR);  // threshold = 16 * 0.75 = 12
    table = new Entry[DEFAULT_INITIAL_CAPACITY];                        // default initial capacity of 16
    init();
}
Constructor that specifies only an initial capacity:
public HashMap(int initialCapacity) {
    this(initialCapacity, DEFAULT_LOAD_FACTOR);   // use the given capacity and the default load factor
}
Constructor that takes another Map as its argument and builds a HashMap containing the same mappings:
public HashMap(Map<? extends K, ? extends V> m) {
    this(Math.max((int) (m.size() / DEFAULT_LOAD_FACTOR) + 1,
                  DEFAULT_INITIAL_CAPACITY), DEFAULT_LOAD_FACTOR);   // the initial capacity is never smaller than 16
    putAllForCreate(m);
}
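Typical usage of this copy constructor (a sketch with placeholder contents; TreeMap is just an arbitrary source map):

import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class CopyConstructorDemo {
    public static void main(String[] args) {
        Map<String, Integer> source = new TreeMap<String, Integer>();
        source.put("one", 1);
        source.put("two", 2);

        // the new HashMap starts with the same mappings as the source map;
        // its capacity is derived from source.size() but never drops below 16
        Map<String, Integer> copy = new HashMap<String, Integer>(source);
        System.out.println(copy.get("two")); // 2
    }
}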
Here we can see that the initial capacity and the load factor are two very important parameters. The capacity is the number of buckets in the hash table (i.e., the size of the array); the load factor measures how full the table is allowed to get, in other words how intensively the table's space is used: the larger the value, the fuller the table may become.
Why is the default load factor 0.75?
HashMap resolves hash collisions with separate chaining (each bucket holds a linked list). In a chained hash table, the average time to look up an element is O(1 + a), where a is the average chain length and stays a small constant as long as the load factor is bounded. The larger the load factor, the better the space utilization, but lookups become slower; the smaller the load factor, the sparser the table, which wastes a lot of space. The default of 0.75 is a trade-off between time and space cost.

Before looking at the other methods, let's see how HashMap maps a key to a position in the array:

// bitwise-AND the key's hash value h with (array length - 1)
static int indexFor(int h, int length) {
    return h & (length - 1);
}
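Because the table length is always a power of two, length - 1 is a mask of all low-order ones, so h & (length - 1) keeps only the low bits of the hash and, for non-negative h, gives the same result as the more expensive h % length. A quick sketch (class name and sample hashes are arbitrary):

public class IndexForDemo {
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int length = 16;                  // a power of two, so length - 1 = 0b1111
        int[] hashes = {15, 16, 37, 103};
        for (int h : hashes) {
            // for non-negative h and power-of-two length, the mask and the
            // modulo give the same bucket index
            System.out.println(h + " -> " + indexFor(h, length)
                    + " (h % length = " + (h % length) + ")");
        }
    }
}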
The get(Object key) method:

public V get(Object key) {
    if (key == null)                      // if the key is null, delegate to getForNullKey()
        return getForNullKey();
    int hash = hash(key.hashCode());      // the key's hashCode is hashed again, so the key is effectively hashed twice
    for (Entry<K,V> e = table[indexFor(hash, table.length)];   // use the hash to locate the bucket
         e != null;
         e = e.next) {
        Object k;
        // walk the chain looking for the key; equal hashes do not guarantee the same key,
        // so the key itself must also be compared (by == or equals)
        if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
            return e.value;
    }
    return null;                          // no matching value found, return null
}

/**
 * A return value of {@code null} does not necessarily
 * indicate that the map contains no mapping for the key; it's also
 * possible that the map explicitly maps the key to {@code null}.
 * The {@link #containsKey containsKey} operation may be used to
 * distinguish these two cases.
 */
From this comment we learn that a null return from get does not necessarily mean there is no mapping for the key; the value itself may be null (HashMap allows both null keys and null values). In that case we must use containsKey(Object key) to tell the cases apart: if it returns true, the key is mapped to null; if false, the map contains no mapping for that key.
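A sketch of that distinction (keys and values are placeholders):

import java.util.HashMap;
import java.util.Map;

public class NullValueDemo {
    public static void main(String[] args) {
        Map<String, String> map = new HashMap<String, String>();
        map.put("present", null);      // the key exists, but is explicitly mapped to null

        System.out.println(map.get("present"));          // null
        System.out.println(map.get("absent"));           // null as well

        // containsKey distinguishes "mapped to null" from "no mapping at all"
        System.out.println(map.containsKey("present"));  // true
        System.out.println(map.containsKey("absent"));   // false
    }
}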
The getForNullKey() method:
// when the key is null, the corresponding array index is 0
private V getForNullKey() {
    for (Entry<K,V> e = table[0]; e != null; e = e.next) {
        if (e.key == null)
            return e.value;
    }
    return null;
}
The put(K key, V value) method:

public V put(K key, V value) {
    if (key == null)                       // if the key is null, delegate to putForNullKey(value)
        return putForNullKey(value);
    int hash = hash(key.hashCode());
    int i = indexFor(hash, table.length);  // locate the bucket
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;          // the key is already mapped: overwrite the old value and return it
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    modCount++;                            // adding a new mapping is a structural modification, so modCount is incremented
    addEntry(hash, key, value, i);         // no existing mapping: create a new Entry and add it to the bucket
    return null;
}
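A sketch of the overwrite behavior described above (names are placeholders): putting an existing key replaces its value and returns the old one, while putting a new key returns null.

import java.util.HashMap;
import java.util.Map;

public class PutReturnDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<String, Integer>();

        System.out.println(map.put("k", 1));   // null: "k" had no previous mapping
        System.out.println(map.put("k", 2));   // 1: the old value is returned and replaced
        System.out.println(map.get("k"));      // 2
    }
}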
The putForNullKey(V value) method:
// called when the key is null; as with getForNullKey(), the bucket index is 0
private V putForNullKey(V value) {
    for (Entry<K,V> e = table[0]; e != null; e = e.next) {
        if (e.key == null) {
            // if an entry with a null key already exists in table[0], overwrite its value.
            // This shows that HashMap can hold at most one mapping with a null key.
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    modCount++;
    addEntry(0, null, value, 0);
    return null;
}
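A sketch showing that at most one null-key mapping can exist (class name and values are placeholders): a second put(null, ...) simply overwrites the first.

import java.util.HashMap;
import java.util.Map;

public class NullKeyDemo {
    public static void main(String[] args) {
        Map<String, String> map = new HashMap<String, String>();

        map.put(null, "first");
        map.put(null, "second");              // overwrites the existing null-key entry

        System.out.println(map.get(null));    // second
        System.out.println(map.size());       // 1: only one null-key mapping exists
    }
}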
The addEntry(int hash, K key, V value, int bucketIndex) method:
void addEntry(int hash, K key, V value, int bucketIndex) {
    Entry<K,V> e = table[bucketIndex];
    table[bucketIndex] = new Entry<>(hash, key, value, e);   // insert at the head of the bucket's linked list (head insertion)
    if (size++ >= threshold)        // note that HashMap checks the threshold after the entry has already been added;
                                    // ConcurrentHashMap, by contrast, performs this check before adding
        resize(2 * table.length);   // when resizing, the capacity is doubled
}
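Head insertion itself is a standard singly-linked-list operation; a minimal standalone sketch (the Node class and names are hypothetical, not HashMap's own Entry):

public class HeadInsertDemo {
    // a minimal node, standing in for HashMap's Entry
    static class Node {
        final String key;
        Node next;
        Node(String key, Node next) { this.key = key; this.next = next; }
    }

    public static void main(String[] args) {
        Node head = null;
        for (String key : new String[] {"a", "b", "c"}) {
            // the new node points at the old head and becomes the new head,
            // mirroring table[bucketIndex] = new Entry<>(hash, key, value, e)
            head = new Node(key, head);
        }
        // the list now reads c -> b -> a: later insertions come first
        for (Node n = head; n != null; n = n.next) {
            System.out.print(n.key + " ");
        }
    }
}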
The resize(int newCapacity) method:
void resize(int newCapacity) {
    Entry[] oldTable = table;
    int oldCapacity = oldTable.length;
    if (oldCapacity == MAXIMUM_CAPACITY) {
        // if the old capacity has already reached the maximum allowed value,
        // set the threshold to Integer.MAX_VALUE and never resize again
        threshold = Integer.MAX_VALUE;
        return;
    }
    Entry[] newTable = new Entry[newCapacity];   // a brand-new array is allocated, then transfer(newTable)
                                                 // moves the existing entries into it
    transfer(newTable);
    table = newTable;
    threshold = (int)(newCapacity * loadFactor);
}
The transfer(Entry[] newTable) method:
void transfer(Entry[] newTable) {
    Entry[] src = table;
    int newCapacity = newTable.length;
    for (int j = 0; j < src.length; j++) {
        Entry<K,V> e = src[j];
        if (e != null) {
            src[j] = null;
            do {
                Entry<K,V> next = e.next;
                int i = indexFor(e.hash, newCapacity);   // recompute the bucket index against the new capacity
                e.next = newTable[i];                    // head-insert the entry into the new table
                newTable[i] = e;
                e = next;
            } while (e != null);
        }
    }
}

This method shows that every entry is re-inserted into the new array one by one, so resizing has a significant performance cost; if you can estimate the number of entries up front and create the HashMap with an appropriate initial capacity, performance improves noticeably. During the transfer, entries that used to share a bucket may end up in different buckets, because the capacity changes and therefore the value of h & (length - 1) changes with it.
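A sketch of that presizing advice (the expected size of 10_000 and the class name are just examples): choosing an initial capacity large enough that expectedSize stays below capacity * loadFactor avoids every intermediate resize and transfer while the map is filled.

import java.util.HashMap;
import java.util.Map;

public class PresizeDemo {
    public static void main(String[] args) {
        int expectedSize = 10_000;

        // capacity chosen so that expectedSize stays below capacity * 0.75;
        // the constructor rounds 13334 up to the next power of two (16384),
        // giving a threshold of 12288, so no resize happens while filling the map
        Map<Integer, Integer> presized =
                new HashMap<Integer, Integer>((int) (expectedSize / 0.75f) + 1);

        for (int i = 0; i < expectedSize; i++) {
            presized.put(i, i);
        }
        System.out.println(presized.size()); // 10000
    }
}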