keywordmap

package
v1.0.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jan 9, 2024 License: MIT, MIT Imports: 1 Imported by: 0

Documentation

Overview

Package keywordmap provides a trie-based implemetation of a map from strings to indices. Its intended use is mapping strings to indices into a list of keywords. Strings are considered as sequences of bytes.

For typical keyword sets, membership tests using keywordmap are about 1.5 – 2 times faster than tests using map[string]int (as benchmarked on an M1 MacBook Air).

Internally, keywordmap constructs a compact trie backed by an array of unsigned integers. The default (and recommended) Trie type uses 16-bit integers. This is sufficient for realistically sized sets of keywords. You may use GenericTrie[uint32] for larger tries. However, this package is not optimized for dealing with large sets of keywords.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func AddToTrie

func AddToTrie[T ByteIndexable, I constraints.Unsigned](trie *GenericTrie[I], word T, wordIndex int) bool

AddToTrie adds word to the trie and associates it with the index wordIndex. It returns true if the word was successfully added to the trie, or false otherwise. A word can fail to be added to the trie if the trie becomes too big. In the case where AddToTrie returns false, part of the word may have been added to the trie.

It is usually better to construct tries using MakeTrie. AddToTrie is useful if there are gaps in the sequence of indices associated with each keyword.

func GetBackingSlice

func GetBackingSlice[I constraints.Unsigned](trie GenericTrie[I]) []I

GetBackingSlice returns a copy of the trie's backing slice. This value (or another slice with the same contents) can be passed to MakeTrieFromBackingSlice. The return value serves no other purpose and has no defined interpretation.

func KeywordIndex

func KeywordIndex[T ByteIndexable, I constraints.Unsigned](trie GenericTrie[I], word T) int

KeywordIndex returns the index of word in the list of keywords passed to MakeTrie/MakeGenericTrie, or -1 if it is not present.

Types

type ByteIndexable

type ByteIndexable interface {
	string | []byte
}

ByteIndexable is a string or byte slice

type GenericTrie

type GenericTrie[I constraints.Unsigned] struct {
	// contains filtered or unexported fields
}

GenericTrie represents a set of strings (such as the set of a programming language's keywords). It should be constructed only via MakeGenericTrie (or MakeTrie when I=uint16). The I type parameter specifies the types of the elements of the backing array.

func MakeEmptyTrie

func MakeEmptyTrie[I constraints.Unsigned]() GenericTrie[I]

MakeEmptyTrie returns an empty trie.

func MakeGenericTrie

func MakeGenericTrie[I constraints.Unsigned, T ByteIndexable](keywords []T) (GenericTrie[I], bool)

MakeGenericTrie constructs a trie from a set of keywords. Each keyword is considered as a sequence of bytes. If your keywords have multiple possible encodings, you will need to add each encoding to the trie. The second return value is true if a suitable trie could be constructed, or false otherwise. In the latter case, the returned trie is empty. A trie can fail to be constructed if the set of keywords is too large.

func MakeTrieFromBackingSlice

func MakeTrieFromBackingSlice[I constraints.Unsigned](slice []I) GenericTrie[I]

MakeTrieFromBackingSlice can be used for minimal-cost construction of a trie. Follow these steps:

  • Construct your desired trie in some scratch code using MakeTrie/MakeGenericTrie.
  • Use GetBackingSlice to get the contents of the backing slice.
  • Copy the contents of the backing slice into your code as a constant.
  • Use this constant as the argument to MakeTrieFromBackingSlice.

It should rarely (if ever) be necessary to initialize a trie using this function, as MakeTrie/MakeGenericTrie are not at all expensive.

type Trie

type Trie = GenericTrie[uint16]

Trie is the recommended instantiation of GenericTrie. A backing array of uint16 suffices for sets of keywords with no more than a few hundred members.

func MakeTrie

func MakeTrie[T ByteIndexable](keywords []T) (Trie, bool)

MakeTrie calls MakeGenericTrie with the I type parameter set to uint16 (the recommended default).

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL