cay - SIMD-based hashmap for Go.
Note that this is an experimental prototype, so don't use in production
Cay's goal is to be a faster and more memory-efficient replacements for the builtin map in Go.
Cay is heavily inspired by the Hashbrown
map in Rust (which in turn is a port of the Swisstable)
As this is my first real endevaour in the world of memory operations and optimizations in Go, I have included my
log of my (mostly failed) experimentation.
Developing on caymap
To start:
- go get -u github.com/minio/asm2plan9s
Developing
To rebuild the ASM files from the c code:
./build.sh
Running
To run the benchmarks:
go test -run=^\$ -cpu 1 -bench '.*' --benchmem
Checking the profile
go tool pprof --http :1234 caymap.test cpu.profile
Debugging
Using perf
Build the caymap tests for linux:
env GOOS=linux GOARCH=amd64 go test -c -o caymap.test
Record the instructions:
./perf record -e dTLB-load-misses -g ./caymap.test -test.run=^\$ -test.cpu 1 -test.benchmem -test.benchtime 10000000x -test.bench '.*simd.*get/same_keys.*'
View
./perf annotate --stdio -l --tui
Using Instruments
To use Instruments, first compile the test binary:
# From the simdmap dir
go test -c -o caymap.test
Then open instruments and use the caymap.test
file as the target. For arguments you can pass in
-test.run=^\$ -test.cpu 1 -test.benchmem -test.benchtime 10000000x -test.bench '.*not_found/dynamic_keys.*'
and that will run each the not found benchmarks 10,000,000 times. Change the regexp for the other benchmarks.
TODO
- Compare memory footprint of caymap and builtin.
HighestBitMask
has a tendency to crash, unless ctrlGroup
is allocated on the heap. My best guess is that it is otherwise not stack aligned (some architectures require 16-byte aligned stacks).
- Returning the element in
Get
is costing 33% of the overall runtime and might be some locality or copying that is expensive.
- Investigate tuning huge pages and transparent huge pages on.
- Investigate page alignment
References
HashBrown and SwissTable
Bit operations
Calling ASM in Go