Unexpected collision with std::hash(与 std::hash 意外冲突)
问题描述
我知道将无限数量的字符串散列到 32b int 必须产生冲突,但我希望散列函数有一些不错的分布.
I know hashing infinite number of string into 32b int must generate collision, but I expect from hashing function some nice distribution.
这两个字符串有相同的哈希值是不是很奇怪?
Isn't it weird that these 2 strings have the same hash?
size_t hash0 = std::hash<std::string>()("generated_id_0");
size_t hash1 = std::hash<std::string>()("generated_id_1");
//hash0 == hash1
我知道我可以使用 boost::hash<std::string>
或其他,但我想知道 std::hash
有什么问题.我用错了吗?我不应该以某种方式播种"它吗?
I know I can use boost::hash<std::string>
or others, but I want to know what is wrong with std::hash
. Am I using it wrong? Shouldn't I somehow "seed" it?
推荐答案
您对 std::hash
的使用没有任何问题.问题在于,与 Visual Studio 2010 捆绑的标准库实现提供的专门化 std::hash<std::string>
仅采用字符串字符的子集来确定哈希值(大概用于性能原因).巧合的是,具有 14 个字符的字符串的最后一个字符不属于该集合,这就是为什么两个字符串产生相同哈希值的原因.
There's nothing wrong with your usage of std::hash
. The problem is that the specialization std::hash<std::string>
provided by the standard library implementation bundled with Visual Studio 2010 only takes a subset of the string's characters to determine the hash value (presumably for performance reasons). Coincidentally the last character of a string with 14 characters is not part of this set, which is why both strings yield the same hash value.
据我所知,这种行为符合标准,要求仅对具有相同参数的哈希函数的多次调用必须始终返回相同的值.然而,散列冲突的可能性应该是最小的.VS2010 实现实现了必选部分,但没有考虑可选部分.
As far as I know this behaviour is in conformance with the standard, which demands only that multiple calls to the hash function with the same argument must always return the same value. However, the probability of a hash collision should be minimal. The VS2010 implementation fulfills the mandatory part, yet fails to account for the optional one.
有关详细信息,请参阅头文件 xfunctional
(从我的副本中的第 869 行开始)和 C++ 标准的 §17.6.3.4(最新公开草案).
For details, see the implementation in the header file xfunctional
(starting at line 869 in my copy) and §17.6.3.4 of the C++ standard (latest public draft).
如果你绝对需要一个更好的字符串哈希函数,你应该自己实现它.实际上没那么难.
If you absolutely need a better hash function for strings, you should implement it yourself. It's actually not that hard.
这篇关于与 std::hash 意外冲突的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持编程学习网!
本文标题为:与 std::hash 意外冲突
基础教程推荐
- 为 C/C++ 中的项目的 makefile 生成依赖项 2022-01-01
- 从 std::cin 读取密码 2021-01-01
- 在 C++ 中循环遍历所有 Lua 全局变量 2021-01-01
- 使用从字符串中提取的参数调用函数 2022-01-01
- 为什么语句不能出现在命名空间范围内? 2021-01-01
- 如何使图像调整大小以在 Qt 中缩放? 2021-01-01
- 如何“在 Finder 中显示"或“在资源管理器中显 2021-01-01
- Windows Media Foundation 录制音频 2021-01-01
- 管理共享内存应该分配多少内存?(助推) 2022-12-07
- 如何在不破坏 vtbl 的情况下做相当于 memset(this, ...) 的操作? 2022-01-01