Documentation
Index
- Constants
- type BloomFilter
- func (b *BloomFilter) Advise(size int)
- func (b *BloomFilter) Count() uint64
- func (b *BloomFilter) Detect(value string) (bool, float64)
- func (b *BloomFilter) ErrorRate(rate float64)
- func (b *BloomFilter) EstimatedErrorRate() float64
- func (b *BloomFilter) ExpectedError() float64
- func (b *BloomFilter) Learn(value string)
- func (b *BloomFilter) Name() string
- func (b *BloomFilter) Pack() []byte
- func (b *BloomFilter) ShortString() string
- func (b *BloomFilter) String() string
- func (b *BloomFilter) Unpack(rawbytes []byte) error
- type Cache
- type Database
- type Mapper
- type Source
- type SourceHit
Constants
const (
	// DefaultAdviseSize is the default expected number of elements.
	DefaultAdviseSize = 75000

	// DefaultErrorRate is the default target error rate.
	// Range 0.0 - 1.0, default value is 1%.
	DefaultErrorRate = 0.01
)
Variables
This section is empty.
Functions
This section is empty.
Types
type BloomFilter
type BloomFilter struct {
// contains filtered or unexported fields
}
BloomFilter is a probabilistic data structure that represents set membership: it can say with certainty that an item is NOT in the set, and say with a bounded error rate that an item may be in the set. N.B. For the error bounds to hold, the size passed to Advise should not be smaller than the number of elements actually added.
For example, the question "is X in the set?" has only two possible answers: "no" and "maybe".
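The "no"/"maybe" behaviour comes from the bit array a Bloom filter keeps. The sketch below is a minimal, hypothetical filter, not this package's BloomFilter (whose fields are unexported): the type bloom, the function newBloom, the method MayContain, and the choice of FNV double hashing are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// bloom is a hypothetical minimal Bloom filter, NOT the package's BloomFilter.
type bloom struct {
	bits []uint64 // m bits packed into 64-bit words
	m    uint64   // number of bits
	k    uint64   // number of hash functions
}

func newBloom(m, k uint64) *bloom {
	if m == 0 {
		m = 1 // avoid modulo by zero
	}
	if k == 0 {
		k = 1 // at least one bit per value
	}
	return &bloom{bits: make([]uint64, (m+63)/64), m: m, k: k}
}

// positions derives k bit indexes by double hashing: (h1 + i*h2) mod m.
// Using FNV-1a and FNV-1 for h1 and h2 is an assumption of this sketch.
func (b *bloom) positions(value string) []uint64 {
	ha := fnv.New64a()
	ha.Write([]byte(value))
	h1 := ha.Sum64()
	hb := fnv.New64()
	hb.Write([]byte(value))
	h2 := hb.Sum64() | 1 // force odd so the step is never zero
	pos := make([]uint64, b.k)
	for i := uint64(0); i < b.k; i++ {
		pos[i] = (h1 + i*h2) % b.m
	}
	return pos
}

// Learn sets the k bits belonging to value.
func (b *bloom) Learn(value string) {
	for _, p := range b.positions(value) {
		b.bits[p/64] |= 1 << (p % 64)
	}
}

// MayContain returns false only if value was certainly never learned ("no");
// true means "maybe".
func (b *bloom) MayContain(value string) bool {
	for _, p := range b.positions(value) {
		if b.bits[p/64]&(1<<(p%64)) == 0 {
			return false
		}
	}
	return true
}

func main() {
	b := newBloom(1024, 7)
	b.Learn("id-1")
	fmt.Println(b.MayContain("id-1")) // true: learned values are never missed
	fmt.Println(b.MayContain("id-2")) // usually false; true would be a false positive
}
```

A single unset bit is enough to prove absence, which is why "no" is certain while "yes" can only ever be "maybe".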
func (*BloomFilter) Advise
func (b *BloomFilter) Advise(size int)
Advise tells the Detector the estimated size of the data set, i.e. the expected number of elements that will be learned.
func (*BloomFilter) Count
func (b *BloomFilter) Count() uint64
Count returns the number of items added to the set, if known.
func (*BloomFilter) Detect
func (b *BloomFilter) Detect(value string) (bool, float64)
Detect predicts whether value is in the data set. It returns the prediction as a bool, along with a confidence score from 0.0 to 1.0: a score of 0.0 means most likely not in the set, and a score of 1.0 means most likely in the set.
func (*BloomFilter) ErrorRate
func (b *BloomFilter) ErrorRate(rate float64)
ErrorRate sets the desired error rate for the Detector.
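Advise and ErrorRate together supply the two inputs a Bloom filter needs for sizing: the expected element count n and the target false-positive rate p. The sketch below applies the standard textbook formulas m = ceil(-n ln p / (ln 2)^2) and k = round((m/n) ln 2); whether BloomFilter sizes itself exactly this way is an assumption, and optimalParams is a hypothetical helper.

```go
package main

import (
	"fmt"
	"math"
)

// Values copied from the package constants DefaultAdviseSize and DefaultErrorRate.
const (
	adviseSize = 75000
	errorRate  = 0.01
)

// optimalParams is a hypothetical helper returning the textbook bit count m
// and hash count k for n expected elements at false-positive rate p.
func optimalParams(n int, p float64) (m, k uint64) {
	// m = ceil(-n * ln(p) / (ln 2)^2)
	bits := math.Ceil(-float64(n) * math.Log(p) / (math.Ln2 * math.Ln2))
	// k = round((m / n) * ln 2), with at least one hash function
	hashes := math.Round(bits / float64(n) * math.Ln2)
	if hashes < 1 {
		hashes = 1
	}
	return uint64(bits), uint64(hashes)
}

func main() {
	m, k := optimalParams(adviseSize, errorRate)
	fmt.Printf("m=%d bits (%.1f bits/element), k=%d\n", m, float64(m)/adviseSize, k)
}
```

With the defaults this works out to roughly 9.6 bits per element and k = 7; a 1% target always costs about 9.6 bits per element regardless of n.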
func (*BloomFilter) EstimatedErrorRate
func (b *BloomFilter) EstimatedErrorRate() float64
EstimatedErrorRate returns the estimated error (false-positive) rate of the set, computed as (1 - e^(-k/b))^k, where k is the number of hash functions and b is the number of bits per element.
func (*BloomFilter) ExpectedError
func (b *BloomFilter) ExpectedError() float64
ExpectedError returns the expected error rate of the set.
func (*BloomFilter) Learn
func (b *BloomFilter) Learn(value string)
Learn adds value to the data set as a positive (known) member.
func (*BloomFilter) Pack
func (b *BloomFilter) Pack() []byte
Pack serializes the detector into a byte slice.
func (*BloomFilter) ShortString
func (b *BloomFilter) ShortString() string
func (*BloomFilter) String
func (b *BloomFilter) String() string
func (*BloomFilter) Unpack
func (b *BloomFilter) Unpack(rawbytes []byte) error
Unpack restores the detector from serialized bytes, such as those returned by Pack.
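The package does not document its wire format, so the following is only a sketch of what a Pack/Unpack round trip for a Bloom filter can look like. The filterState type, the pack and unpack functions, and the layout (m, then k, then the bit words, all as little-endian uint64) are hypothetical choices made for illustration.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// filterState is a hypothetical stand-in for a Bloom filter's internal state.
type filterState struct {
	m, k uint64   // bit count and hash count
	bits []uint64 // (m+63)/64 words of bits
}

// pack writes m, k and every bit word as little-endian uint64 values.
func pack(f filterState) []byte {
	out := make([]byte, 8*(2+len(f.bits)))
	binary.LittleEndian.PutUint64(out[0:], f.m)
	binary.LittleEndian.PutUint64(out[8:], f.k)
	for i, w := range f.bits {
		binary.LittleEndian.PutUint64(out[16+8*i:], w)
	}
	return out
}

// unpack reverses pack and rejects input whose length does not match m.
func unpack(raw []byte) (filterState, error) {
	if len(raw) < 16 || len(raw)%8 != 0 {
		return filterState{}, errors.New("unpack: malformed input")
	}
	f := filterState{
		m: binary.LittleEndian.Uint64(raw[0:]),
		k: binary.LittleEndian.Uint64(raw[8:]),
	}
	words := (len(raw) - 16) / 8
	if uint64(words) != (f.m+63)/64 {
		return filterState{}, errors.New("unpack: bit array length does not match m")
	}
	f.bits = make([]uint64, words)
	for i := range f.bits {
		f.bits[i] = binary.LittleEndian.Uint64(raw[16+8*i:])
	}
	return f, nil
}

func main() {
	orig := filterState{m: 128, k: 3, bits: []uint64{0xdeadbeef, 42}}
	got, err := unpack(pack(orig))
	fmt.Println(got.m, got.k, got.bits, err)
}
```

Storing m and k alongside the bits matters: a filter unpacked with different parameters would hash values to the wrong positions.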
type Cache
type Cache struct {
	// MaxEntries is the maximum number of cache entries before
	// an item is evicted. Zero means no limit.
	MaxEntries int
	// contains filtered or unexported fields
}
Cache is an LRU cache. It is not safe for concurrent access.
func NewCache
NewCache creates a new Cache. If maxEntries is zero, the cache has no limit and it's assumed that eviction is done by the caller.
func (*Cache) RemoveOldest
func (c *Cache) RemoveOldest()
RemoveOldest removes the oldest item from the cache.
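An LRU cache with the documented semantics (MaxEntries of zero means no limit, RemoveOldest evicts the least recently used entry, no locking) is commonly built from a doubly linked list plus a map. The sketch below uses the standard container/list package; the lru type, newLRU, Add, and Get are hypothetical and are not taken from this package, whose Cache fields are unexported.

```go
package main

import (
	"container/list"
	"fmt"
)

// lru is a hypothetical minimal LRU cache. Like Cache, it is not safe for
// concurrent access.
type lru struct {
	MaxEntries int                      // 0 means no limit; eviction is left to the caller
	ll         *list.List               // front = most recently used
	items      map[string]*list.Element // key -> list element holding *entry
}

type entry struct {
	key   string
	value interface{}
}

func newLRU(maxEntries int) *lru {
	return &lru{MaxEntries: maxEntries, ll: list.New(), items: make(map[string]*list.Element)}
}

// Add inserts or updates key and marks it most recently used.
func (c *lru) Add(key string, value interface{}) {
	if el, ok := c.items[key]; ok {
		c.ll.MoveToFront(el)
		el.Value.(*entry).value = value
		return
	}
	c.items[key] = c.ll.PushFront(&entry{key, value})
	if c.MaxEntries != 0 && c.ll.Len() > c.MaxEntries {
		c.RemoveOldest()
	}
}

// Get returns the value for key and marks it most recently used.
func (c *lru) Get(key string) (interface{}, bool) {
	if el, ok := c.items[key]; ok {
		c.ll.MoveToFront(el)
		return el.Value.(*entry).value, true
	}
	return nil, false
}

// RemoveOldest evicts the least recently used entry, if there is one.
func (c *lru) RemoveOldest() {
	if el := c.ll.Back(); el != nil {
		c.ll.Remove(el)
		delete(c.items, el.Value.(*entry).key)
	}
}

func main() {
	c := newLRU(2)
	c.Add("a", 1)
	c.Add("b", 2)
	c.Get("a")    // "a" becomes the most recently used entry
	c.Add("c", 3) // exceeds MaxEntries, so "b" is evicted
	_, found := c.Get("b")
	fmt.Println(found, c.ll.Len()) // prints "false 2"
}
```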
type Database
A Database of source identifiers and references to mapping resources between them.
func (*Database) DetermineSource
DetermineSource examines the given sample data and tries to guess which source database it came from. It returns a sorted list of possible Sources along with additional statistics.
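The documentation does not give DetermineSource's signature or ranking rule, so the following is only a sketch of the general idea: test each sample value against every source's membership check (for example a BloomFilter's Detect) and rank sources by how much of the sample they cover. The hit type (a trimmed version of SourceHit), rankSources, setOf, and the choice of sorting by the fraction of the sample covered are all assumptions.

```go
package main

import (
	"fmt"
	"sort"
)

// hit is a hypothetical, trimmed version of SourceHit.
type hit struct {
	SourceName  string
	Hits        uint64  // sample values (counting repeats) found in the source
	UniqueHits  uint64  // distinct sample values found in the source
	Tested      uint64  // sample values tested
	SampleRatio float64 // Hits / |Sample|
}

// rankSources tests every sample value against each source's membership
// check and returns the sources with at least one hit, best first.
func rankSources(sample []string, sources map[string]func(string) bool) []hit {
	var out []hit
	for name, contains := range sources {
		h := hit{SourceName: name}
		seen := make(map[string]bool)
		for _, v := range sample {
			h.Tested++
			if contains(v) {
				h.Hits++
				if !seen[v] {
					seen[v] = true
					h.UniqueHits++
				}
			}
		}
		if h.Tested > 0 {
			h.SampleRatio = float64(h.Hits) / float64(h.Tested)
		}
		if h.Hits > 0 {
			out = append(out, h)
		}
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].SampleRatio != out[j].SampleRatio {
			return out[i].SampleRatio > out[j].SampleRatio
		}
		return out[i].SourceName < out[j].SourceName // deterministic tie-break
	})
	return out
}

// setOf builds an exact membership check; a Bloom filter could stand in here.
func setOf(ids ...string) func(string) bool {
	m := make(map[string]bool)
	for _, id := range ids {
		m[id] = true
	}
	return func(s string) bool { return m[s] }
}

func main() {
	sources := map[string]func(string) bool{
		"alpha": setOf("1", "2", "3"),
		"beta":  setOf("3", "9"),
	}
	for _, h := range rankSources([]string{"1", "2", "2", "3"}, sources) {
		fmt.Printf("%s: %d hits (%d unique), %.2f of sample\n", h.SourceName, h.Hits, h.UniqueHits, h.SampleRatio)
	}
}
```

Because a Bloom filter can report false positives, a real implementation would also want to carry each source's expected error alongside its hit counts.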
type Mapper
type Mapper interface {
	// Get retrieves ids that map to the given id.
	Get(leftID string) (rightIDs []string, found bool)
}
Mapper represents a one-way mapping between identifier sources.
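Any type with a matching Get method satisfies Mapper. As a sketch, an in-memory mapping can be a plain Go map; the mapMapper type and the identifiers used below are hypothetical, and the interface is copied from the documentation above so the example compiles on its own.

```go
package main

import "fmt"

// Mapper is copied from the package documentation.
type Mapper interface {
	// Get retrieves ids that map to the given id.
	Get(leftID string) (rightIDs []string, found bool)
}

// mapMapper is a hypothetical in-memory Mapper from left IDs to right IDs.
type mapMapper map[string][]string

// Get looks up leftID; found is false when there is no mapping.
func (m mapMapper) Get(leftID string) ([]string, bool) {
	ids, ok := m[leftID]
	return ids, ok
}

// Compile-time check that mapMapper implements Mapper.
var _ Mapper = mapMapper(nil)

func main() {
	var m Mapper = mapMapper{"A1": {"B7", "B9"}}
	ids, found := m.Get("A1")
	fmt.Println(ids, found) // prints "[B7 B9] true"
}
```

Since the mapping is one-way, going from right IDs back to left IDs would need a second Mapper built from the inverted pairs.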
type Source
type Source struct {
	ID             int64
	Name           string
	Description    string
	IdentifierType string
	URL            string
	LinkoutURL     string
	Citation       string
	Subsets        map[string]*BloomFilter
	LastUpdate     time.Time
}
A Source of identifiers.
type SourceHit
type SourceHit struct {
	// SourceName of the database hit.
	SourceName string
	// Subset of the database, if defined.
	Subset string
	// Hits is the number of samples that hit the database.
	Hits uint64
	// UniqueHits is the number of unique sample values that hit the database.
	UniqueHits uint64
	// Tested is the number of sample values tested.
	Tested uint64
	// SubsetRatio is the fraction of the subset covered by the sample,
	// i.e. Hits / |Subset|.
	SubsetRatio float64 // 0.0 - 1.0
	// SampleRatio is the fraction of the sample covered by the subset,
	// i.e. Hits / |Sample|.
	SampleRatio float64 // 0.0 - 1.0
	// ExpectedError is the expected error rate of hits for the source tested.
	ExpectedError float64 // 0.0 - 1.0
	// Examples lists some sample values that were in the hit set.
	Examples []string
}
SourceHit describes a search hit and some statistics.