HashMap多线程不安全原因分析

时间：01-31来源：作者：点击数：59

众所周知，HashMap在多线程环境下是线程不安全的，

在jdk1.7中，主要有两个方面线程不安全，一是多线程扩容因为头插法容易造成死循环。二是put的时候容易造成数据覆盖。

在jdk1.8中，使用尾插法避免了resize时死循环，但是put的时候，多线程环境下仍然会出现数据覆盖的问题。

接下来逐个分析问题点：

jdk1.7中扩容死循环的问题

HashMap在jdk1.7扩容时在多线程环境下会发生死循环问题，研究一下多线程环境下为什么会发生死循环，首先看一下扩容代码，扩容代码发生在put方法中：

public V put(K key, V value) {
　　　　if (table == EMPTY_TABLE) {
　　　　　　inflateTable(threshold);
　　　　}
　　　　
　　　　if (key == null)
　　　　　　return putForNullKey(value);
　　　　　　　　
　　　　int hash = hash(key);
　　　　int i = indexFor(hash, table.length);
　　　　for (Entry<K,V> e = table[i]; e != null; e = e.next) {
　　　　　　Object k;
　　　　　　if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
　　　　　　　　V oldValue = e.value;
　　　　　　　　e.value = value;
　　　　　　　　e.recordAccess(this);
　　　　　　　　return oldValue;
　　　　　　}
　　　　}
 
　　　　modCount++;
　　　　//存值
　　　　addEntry(hash, key, value, i);
　　　　return null;
　　}

void addEntry(int hash, K key, V value, int bucketIndex) {
　　　　if ((size >= threshold) && (null != table[bucketIndex])) {
　　　　　　//扩容，将原来数组重新计算hash值，并且放到新的位置中
　　　　　　resize(2 * table.length);
　　　　　　hash = (null != key) ? hash(key) : 0;
　　　　　　bucketIndex = indexFor(hash, table.length);
　　　　}
 
　　　　createEntry(hash, key, value, bucketIndex);
}


void resize(int newCapacity) {
　　　　Entry[] oldTable = table;
　　　　int oldCapacity = oldTable.length;
　　　　if (oldCapacity == MAXIMUM_CAPACITY) {
　　　　　　threshold = Integer.MAX_VALUE;
　　　　　　return;
　　　　}
 
　　　　Entry[] newTable = new Entry[newCapacity];
        //将原来数组的值放入新数组中
　　　　transfer(newTable, initHashSeedAsNeeded(newCapacity));
　　　　
　　　　table = newTable;
　　　　threshold = (int)Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
}

发生死循环在transfer()方法中

void transfer(Entry[] newTable, boolean rehash) {
　　　　int newCapacity = newTable.length;
　　　　for (Entry<K,V> e : table) {
　　　　　　while(null != e) {
                //1.将链表的下个节点放入新的对象next中
　　　　　　　　Entry<K,V> next = e.next;
                //2.重新计算hash值
　　　　　　　　if (rehash) {
　　　　　　　　　　e.hash = null == e.key ? 0 : hash(e.key);
　　　　　　　　}
　　　　　　　　 //3.重新计算当前e对象的hash值，并计算存储位置
　　　　　　　　int i = indexFor(e.hash, newCapacity);
               //4.将新数组的i的位置的元素放到当前元素的后面，头插法
　　　　　　　　e.next = newTable[i];
               //5.将新数组的i位置设置为当前e
　　　　　　　　newTable[i] = e;
               //6.继续遍历next对象
　　　　　　　　e = next;
　　　　　　}
　　　　}
　　}

单线程环境下是这样扩容的：

(1) 原来的数组，假如2,4,6的hash值相同，都在old这个位置的链表上。

（2）现在新来一个元素（这个元素放在其他位置，这里就不画了），导致当前hashmap对象需要扩容。首先移动2

（3）移动4

(4) 移动6

使用头插法，将元素依次移动到新的链表位置。

在多线程环境下，发生多线程死循环的位置是当某个线程还未扩容完，另外一个线程竞争到锁，执行扩容，之后原来线程继续执行扩容时，对象的指针发生改变，导致后面的死循环。这里举一个例子，两个线程A和B，对hashmap对象进行扩容，发生死循环的过程

(1) 仍然使用上面的例子

（2）线程A竞争到锁，进行扩容

(2)当线程A执行到Entry<K,V> next = e.next；这步时，线程B竞争到了HashMap对象的锁，完成扩容。此时2已经移动到了i位置上

（3）线程A竞争到锁，继续执行没完成的任务，这里涉及到JVM的知识，我们再看一下transfer代码：

void transfer(Entry[] newTable, boolean rehash) {
　　　　int newCapacity = newTable.length;
　　　　for (Entry<K,V> e : table) {
　　　　　　while(null != e) {
                //1.将链表的下个节点放入新的对象next中，next为栈帧中的局部变量
　　　　　　　　Entry<K,V> next = e.next;
                //2.重新计算hash值
　　　　　　　　if (rehash) {
　　　　　　　　　　e.hash = null == e.key ? 0 : hash(e.key);
　　　　　　　　}
　　　　　　　　 //3.重新计算当前e对象的hash值，并计算存储位置
　　　　　　　　int i = indexFor(e.hash, newCapacity);
               //4.将新数组的i的位置的元素放到当前元素的后面，头插法
　　　　　　　　e.next = newTable[i];
               //5.将新数组的i位置设置为当前e
　　　　　　　　newTable[i] = e;
               //6.继续遍历next对象
　　　　　　　　e = next;
　　　　　　}
　　　　}
　　}

有两个重点需要了解一下：

JVM虚拟机中，包含虚拟机栈，为私有，栈中包含栈帧，每增一个栈帧都代表执行一次方法。堆中包含对象，是公共的，栈帧中局部变量的指针指向堆中对象。因此在线程A中，已经执行过

Entry<K,V> next = e.next,next是局部变量，为线程A私有，因此next值不变，为4。

transfer方法的for循环遍历，每个e变量都是指向table的某个位置，当前e的指针已经被线程B改变，改成了指向4的位置。因此e等于4

void transfer(Entry[] newTable, boolean rehash) {
　　　　···
　　　　for (Entry<K,V> e : table) {
        
　　　　　　···
　　　　}
        //上面的for循环等价于
        for(int j = 0; j < table.size(); j++){
            Entry<K,V> e = table[j];//e指向table[j]的某个位置
        }
　　}

接着说线程A的扩容，由于next和e都指向4，因此在while循环中，e的位置已经由old变为了i位置

（4）后面无论是线程A还是线程B扩容，e的指针都已经发生改变，执行这段代码的时候，next永远不为空，会循环放2和4，产生死循环

e.next = newTable[i];
//5.将新数组的i位置设置为当前e
newTable[i] = e;

（5）发生死循环

HashMap在jdk1.7中多线程会发生死循环的原因分析完成了。jdk1.8中HashMap去除了transfer方法，优化了链表插入方式，改为尾插法，避免了死循环。

put值覆盖的问题

jdk1.8中，HashMap在扩容的时候使用尾插法避免了死循环问题，但是put值覆盖的问题没有解决，接下来分析一下put为什么会造成值覆盖的问题。

原因resize扩容一样，当线程A执行完"if ((p = tab[i = (n - 1) & hash]) == null)"之后被线程B抢占资源继续执行，比如线程A定义的i=10，之后线程B执行完之后，i=10的位置已经有数据了，那么线程A继续执行的时候会将i=10的位置的数据覆盖，造成不安全的问题。接下来用代码演示一下put方法值覆盖的现象：

代码：

public class HashMapThreadDemo {
    static HashMap<Integer, String> map = new HashMap<>();//存值
    static ThreadLocal<Integer> count = new ThreadLocal<>();//计数器
    static CountDownLatch countDownLatch = new CountDownLatch(2);//等待两个线程执行完主线程再继续执行
    static Integer i = 0;//作为key自增

    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Thread(new HashMapThread(), "线程A");
        Thread t2 = new Thread(new HashMapThread(), "线程B");
        t1.start();
        t2.start();
        countDownLatch.await();
        System.out.println("执行完成!map:" + map);
    }

    static class HashMapThread extends Thread {
        @Override
        public void run() {
            if (count.get() == null) {
                count.set(0);
            }
            //本地运行50次
            while (count.get() < 50) {
                String value = Thread.currentThread().getName() + "_" + i;
                map.put(i, value);
                System.out.println(Thread.currentThread().getName() + "  key:" + i + "  value:" + value + "  次数count:" + count.get());
                count.set(count.get() + 1);
                i++;
            }
            countDownLatch.countDown();
        }
    }
}