GoCeannaithe

module
v0.0.0-...-d98123c Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Apr 24, 2024 License: AGPL-3.0

README

GoCeannaithe

(go kyan-eh-heh)
Probabilistic Data Structures for Go projects
  • Bloom Filter
  • Counting Bloom Filter
  • Cuckoo Filter
  • Golomb Compressed Set (midterm goal)
  • Parallel-partitioned Bloom Filter (long term goal)
  • Spatial Bloom Filter (long term goal)
  • Layered Bloom Filter
  • Count-min Sketch

Bloom Filter Usage

Hashing functions available
  • common.HashKeyMurmur3 Great compromise between collision avoidance, performance, and distribution (fastest hash in the package, and used by WithAutoConfiguration). Around 168.2 ns/op - 80 B/op - 10 allocs/op during SetBit.
  • common.HashKeySha256 Cryptographically secure with good collision avoidance and distribution, around 5x slower than Murmur3. Around 876.3 ns/op - 240 B/op - 20 allocs/op during SetBit.
  • common.HashKeySha512 Cryptographically secure with good collision avoidance and distribution, around 12x slower than Murmur3. Around 2034 ns/op - 240 B/op - 20 allocs/op during SetBit.
  • common.HashKeySipHash Slower than Murmur3, hardened against "hash flooding"; somewhat slower than Murmur3. Around 261.5 ns/op - 80 B/op - 10 allocs/op during SetBit.
  • common.HashKeyXXhash Tiny bit slower than Murmur3, may have superior collision avoidance and distribution. Around 174.1 ns/op - 80 B/op - 10 allocs/op during SetBit.
Manually configuring the bloom filter

When manually setting up a Bloom Filter, GoCeannaithe expects you to choose the number of bits in the filter (size), and the number of hashes. While there's room for experimentation, in general there is an optimal solution for a given number of items you wish to store, and the probability of a false-positive you desire. You can have a look at how the various parameters of a Bloom Filter interact, and get the optimal parameters, using the following calculator: https://hur.st/bloomfilter/

import (
    "github.com/dryack/GoCeannaithe/pkg/bloom"
    "fmt"
    "log"
)

var size uint64 = 1000000 // size of `bits` in the bloom filter, not the elements
numHashes := 7 // number of different hashes to perform on each element
	
bf, log := bloom.NewBloomFilter[string]().
    WithHashFunctions(numHashes, common.HashKeyMurmur3[string]).
    WithStorage(bloom.NewBitPackingStorage[string](size, nil))
if err != nil {
log.Fatal(err)
}

err = bf.Storage.SetBit("Test")
if err != nil {
    log.Fatal(err)
}

fmt.Println(bf.Storage.CheckBit("monkey")) // True
fmt.Println(bf.Storage.CheckBit("nope")) // False
Automatically configuring the Bloom Filter

Using WithAutoConfigure will utilize the best Storage Type and parameters for the number of elements you wish to store, and your desired rate of false positives (error rate)

import (
    "github.com/dryack/GoCeannaithe/pkg/bloom"
    "fmt"
    "log"
)
size := 3000 // number of elements to be stored
errorRate := 0.10 // desired maximum error rate (10% in this case) 

bf4, err := bloom.NewBloomFilter[float32]().WithAutoConfigure(size, errorRate)
if err != nil {
    log.Fatal(err)
}
	
err = bf4.Storage.SetBit(3.14)
if err != nil {
   log.Fatal(err)
}
	
fmt.Println(bf4.Storage.CheckBit(3.14)) // True
fmt.Println(bf4.Storage.CheckBit(2.71828)) // False

Directories

Path Synopsis
cmd
pkg

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL