Why does a HashMap rehash the hashcode supplied by the key object?



Question

I am reading the code of the HashMap class provided by the Java 1.6 API and can't fully understand the need for the following operation (found in the body of the put and get methods):

int hash = hash(key.hashCode());

The body of the method hash() is as follows:

private static int hash(int h) {
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}

This effectively recalculates the hash by applying bit operations to the supplied hashcode. I don't understand why this is necessary, even though the API documentation explains it as follows:


This is critical because HashMap uses power-of-two length hash tables, that otherwise encounter collisions for hashCodes that do not differ in lower bits.
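The claim in that comment can be checked directly. The sketch below is a hypothetical demo class (SupplementalHashDemo is not JDK code) that copies the quoted hash(int h) and picks a slot by masking with length - 1, the way a power-of-two HashMap does. It feeds in 16 hash codes of the form i << 16, which differ only in bits 16 to 19, and counts how many of the 16 slots they occupy with and without the extra mixing.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo class: copies the JDK 1.6 supplemental hash quoted above.
class SupplementalHashDemo {

    // Same bit operations as the hash(int h) method quoted in the question.
    static int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    // Slot index for a power-of-two table length: keeps only the low bits of h.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // Counts the distinct slots used by the hash codes i << 16 (i = 0..15) in a 16-slot table.
    static int distinctBuckets(boolean rehash) {
        int length = 16;
        Set<Integer> used = new HashSet<>();
        for (int i = 0; i < 16; i++) {
            int h = i << 16; // these codes differ only in bits 16..19
            used.add(indexFor(rehash ? hash(h) : h, length));
        }
        return used.size();
    }

    public static void main(String[] args) {
        System.out.println("without hash(): " + distinctBuckets(false) + " bucket(s)"); // 1
        System.out.println("with hash():    " + distinctBuckets(true) + " bucket(s)");  // 16
    }
}
```

Without the extra step all 16 keys collide in slot 0, because their low four bits are identical. The shifts and XORs fold bits 16 to 19 down into the low nibble, so in this particular case every key gets its own slot.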

I do understand that the key-value pairs are stored in an array of data structures, and that the index of an item in this array is determined by its hash. What I fail to understand is how this function adds any value to the hash distribution.

Answer

As Helper wrote, it is there just in case the existing hash function for the key objects is faulty and does not do a good enough job of mixing the lower bits. According to the source quoted by pgras,

 /**
  * Returns index for hash code h.
  */
 static int indexFor(int h, int length) {
     return h & (length-1);
 }

The hash is ANDed with length - 1, where length is a power of two, so length - 1 is guaranteed to be a run of 1 bits in binary (for example, 15 = 0b1111). Because of this AND, only the lower bits of h are used; the rest of h is ignored. Imagine that, for whatever reason, the original hash only returns numbers divisible by 2. If you used it directly, the odd-numbered positions of the hashmap would never be used, roughly doubling the number of collisions. In a truly pathological case, a bad hash function can make a hashmap behave more like a list than like an O(1) container.
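The masking step and the "only even hash codes" scenario can be reproduced in a few lines. This is a sketch with a hypothetical class name (IndexForDemo). It reuses the quoted indexFor logic and assumes Java 8 or later for Math.floorMod, which is used only to show that, for a power-of-two length, the mask gives the same result as a non-negative modulo.

```java
import java.util.Arrays;

// Hypothetical demo class for the power-of-two masking used by indexFor.
class IndexForDemo {

    // Same logic as the indexFor method quoted above: keeps only the low bits of h.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // Marks which of 16 slots are hit by the even hash codes 0, 2, 4, ..., 62.
    static boolean[] slotsHitByEvenHashes() {
        int length = 16;
        boolean[] hit = new boolean[length];
        for (int h = 0; h < 64; h += 2) {
            hit[indexFor(h, length)] = true;
        }
        return hit;
    }

    public static void main(String[] args) {
        // For a power-of-two length the mask matches floorMod, even for negative h.
        for (int h : new int[] {5, 37, -1, Integer.MIN_VALUE}) {
            System.out.println(h + " -> " + indexFor(h, 16)
                    + " (floorMod: " + Math.floorMod(h, 16) + ")");
        }
        // Alternates true, false: an even hash code can never reach an odd slot.
        System.out.println(Arrays.toString(slotsHitByEvenHashes()));
    }
}
```

Half of the table stays empty no matter how many even hash codes are inserted, which is exactly the wasted capacity the answer describes.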

Sun engineers must have run tests showing that too many hash functions are not random enough in their lower bits, and that many hashmaps are never large enough to use the higher bits. Under these circumstances, the bit operations in HashMap's hash(int h) can provide a net improvement for most expected use cases (due to lower collision rates), even though extra computation is required.
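One concrete case of weak low bits that anyone can check is java.lang.Float: its hashCode() is documented to return floatToIntBits(value), the raw IEEE 754 bit pattern. Small whole numbers such as 1.0f to 16.0f only set high exponent and mantissa bits, so their low bits are all zero. The sketch below (hypothetical class FloatKeyDemo, reusing the quoted hash(int h)) counts how many of 16 slots those keys reach; it is an illustration, not a reconstruction of whatever tests Sun actually ran.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo class: Float.hashCode() is the raw float bit pattern, so
// small whole-number floats have hash codes whose low bits are all zero.
class FloatKeyDemo {

    // Same bit operations as the JDK 1.6 hash(int h) quoted in the question.
    static int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    // Distinct slots used in a 16-slot table by the keys 1.0f, 2.0f, ..., 16.0f.
    static int distinctBuckets(boolean rehash) {
        int length = 16;
        Set<Integer> used = new HashSet<>();
        for (int n = 1; n <= 16; n++) {
            int h = Float.valueOf((float) n).hashCode();
            int mixed = rehash ? hash(h) : h;
            used.add(mixed & (length - 1)); // power-of-two mask, as in indexFor
        }
        return used.size();
    }

    public static void main(String[] args) {
        System.out.printf("1.0f hashCode = 0x%08X%n", Float.valueOf(1.0f).hashCode()); // 0x3F800000
        System.out.println("without hash(): " + distinctBuckets(false) + " bucket(s)"); // 1
        System.out.println("with hash():    " + distinctBuckets(true) + " bucket(s)");
    }
}
```

Used directly, all sixteen keys land in slot 0 of a small table; after the supplemental hash, bits from higher up are mixed into the index and the keys spread over more than one slot.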
