Every few months a dev claims "I wrote the fastest C++ hash map". Hmm- let's see.
For me, inserts and finds on hashmaps that have been pre-initialized with reserve() are the top things I care about.
Let's check out robin_hood unordered map & set: https://github.com/martinus/robin-hood-hashing
For me, inserts and finds on hashmaps that have been pre-initialized with reserve() are the top things I care about.
Let's check out robin_hood unordered map & set: https://github.com/martinus/robin-hood-hashing
I've only played with it, but it's a high quality looking container, and the author thoroughly benchmarked it against many others. Let's do a simple test with just insert()'s with and without calling reserve().
I've been using my own hash maps on perf critical projects for ~20 years. It uses linear probing and Fibonacci hashing (standard Knuth stuff) and I checked the generated asm in all key methods. Example: https://github.com/BinomialLLC/crunch/blob/master/crnlib/crn_hash_map.h
Time to insert 5,000 random uint64_t/uint64_t key/value pairs:
reserve(5000):
robin_hood: 0.000169 secs
basisu::hash_map: 0.000116 secs
std::unordered_map: 0.000536 secs
No reserve():
robin_hood: 0.000472 secs
basisu::hash_map: 0.000518 secs
std::unordered_map: 0.000542 secs
reserve(5000):
robin_hood: 0.000169 secs
basisu::hash_map: 0.000116 secs
std::unordered_map: 0.000536 secs
No reserve():
robin_hood: 0.000472 secs
basisu::hash_map: 0.000518 secs
std::unordered_map: 0.000542 secs
Time to insert 50,000 random pairs:
reserve(50000):
robin_hood: 0.003517 secs
basisu::hash_map: 0.001477 secs
std::unordered_map: 0.005888 secs
no reserve():
robin_hood: 0.003947 secs
basisu::hash_map: 0.004990 secs
std::unordered_map: 0.007589 secs
reserve(50000):
robin_hood: 0.003517 secs
basisu::hash_map: 0.001477 secs
std::unordered_map: 0.005888 secs
no reserve():
robin_hood: 0.003947 secs
basisu::hash_map: 0.004990 secs
std::unordered_map: 0.007589 secs
500,000 random pairs:
reserve(500000):
robin_hood: 0.051847 secs
basisu::hash_map: 0.028218 secs
std::unordered_map: 0.089402 secs
no reserve():
robin_hood: 0.064207 secs
basisu::hash_map: 0.065813 secs
std::unordered_map: 0.138318 secs
reserve(500000):
robin_hood: 0.051847 secs
basisu::hash_map: 0.028218 secs
std::unordered_map: 0.089402 secs
no reserve():
robin_hood: 0.064207 secs
basisu::hash_map: 0.065813 secs
std::unordered_map: 0.138318 secs
5,000,000 random pairs:
reserve(5000000):
robin_hood: 0.641050 secs
basisu::hash_map: 0.482757 secs
std::unordered_map: 1.197943 secs
no reserve():
robin_hood: 0.817126 secs
basisu::hash_map: 0.982160 secs
std::unordered_map: 2.299709 secs
reserve(5000000):
robin_hood: 0.641050 secs
basisu::hash_map: 0.482757 secs
std::unordered_map: 1.197943 secs
no reserve():
robin_hood: 0.817126 secs
basisu::hash_map: 0.982160 secs
std::unordered_map: 2.299709 secs
This is with MSVC 2019, x64 release build, on a Xeon E5 2690-v2.
robin_hood is robin_hood::unordered_map, basisu::hash_map is similar to the crunch lib's hash_map.
What robin_hood is great at is handling resizing the container. If you use the reserve() method before you build your hash table (even with a guess), its perf advantages evaporate.
What robin_hood is great at is handling resizing the container. If you use the reserve() method before you build your hash table (even with a guess), its perf advantages evaporate.
I've only spent like 15 minutes with this, but I think it's possible to significantly improve robin_hood:unordered_map. Benchmarks must check with and without calling reserve() on all containers. It's in many cases trivial to guess a reasonable reserve size before inserting.
I don't see the word "reserve" in this benchmark: https://martin.ankerl.com/2019/04/01/hashmap-benchmarks-02-01-result-CtorDtorEmptyMap/
Yup, no reserve() in the benchmarks, so this isn't useful to me (but will of course still be useful to many others that can't/won't/don't use reserve())
Note that many algorithms strongly depend on extremely fast/efficient hash map containers, so any improvements made here matter a great deal.