trie

package module
v0.0.3 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Apr 20, 2024 License: MIT Imports: 2 Imported by: 1

README

Trie - compact and efficient radix tree (Patricia trie) implementation in go

Efficient implementation with zero allocation for read operations (Get and Search) and 1 or 2 allocations per Add

Go Reference Go Report Card Coverage Status

Current implementation (due to lack of generics) uses interface{} to store values. But it's type defined as an alias, and you can easily copy source file and replace alias with any other nil'able type (pointer or other interface) and get a definitely typed implementation:

type ValueType = interface{}

Installation

go get github.com/porfirion/trie

Usage

Trie can be used in different ways:

  1. Primarily I created it for searching emojis 😄 in text (in Telegram messages). There are about 3,3k emojis in current standard (https://www.unicode.org/emoji/charts-13.0/emoji-counts.html) and checking them one by one is very costly. For this purpose I added export package: you can generate source code for trie with all available emojis and compile it in your program.

  2. You can use it as map, where key is a slice of arbitrary bytes (map[[]byte]interface{} which is not possible in language because slices are not comparable and can't be used as keys).

  3. You can use this trie to check for any string prefixes (possibly storing some payload for each prefix). See example below.

  4. You can build some router using this Trie. For this purpose I added SubTrie(mask []byte) method that returns sub trie with all entries, prefixed by specified mask, and GetAll(mask []byte) that returns all entries containing specified mask. See example below.

Also, this implementation supports zero-length prefix ([]byte{} or nil). Value associated with this prefix can be used as fallback when no other entries found. Or it can serve as universal prefix for all entries.

Limitation: nil can't be stored as value, because node containing nil value considered empty.

Examples

Search prefixes:

package main

import (
    "fmt"
    "strings"
    "github.com/porfirion/trie"
)

// Also can be created with
//    prefixes := &trie.Trie{}
//    prefixes.PutString("one", 1)
//    prefixes.PutString("two", 2)
//    prefixes.PutString("three", 3)
//    prefixes.PutString("", 0)
//
var prefixes = trie.BuildFromMap(map[string]interface{}{
    "one":   1,
    "two":   2,
    "three": 3,
    "":      0,
})

func Example() {
    var inputs = []string{
        "twoApple",
        "oneBanana",
        "Carrot",
    }

    for _, inp := range inputs {
        if val, prefixLen, ok := prefixes.SearchPrefixInString(inp); ok {
            fmt.Println(strings.Repeat(inp[prefixLen:], val.(int)))
        }
    }

    // Output:
    //AppleApple
    //Banana
}

Use as router:

package main

import (
    "fmt"
    "github.com/porfirion/trie"
)

var routes = trie.BuildFromMap(map[string]interface{}{
    "":                "root", // as universal prefix
    "/api/user":       "user",
    "/api/user/list":  "usersList",
    "/api/group":      "group",
    "/api/group/list": "groupsList",
    "/api/articles/":  "articles",
})

func Example_routing() {
    var inputs = []string{
        "/api/user/list",
        "/api/user/li",
        "/api/group",
        "/api/unknown",
    }

    for _, inp := range inputs {
        exact, ok := routes.GetByString(inp)
        route := routes.GetAllByString(inp)
        if ok {
            fmt.Printf("%-17s:\thandler %-10s\t(route %v)\n", inp, exact, route)
        } else {
            fmt.Printf("%-17s:\thandler not found\t(route %v)\n", inp, route)
        }
    }

    // Output:
    // /api/user/list   :	handler usersList 	(route [root user usersList])
    // /api/user/li     :	handler not found	(route [root user])
    // /api/group       :	handler group     	(route [root group])
    // /api/unknown     :	handler not found	(route [root])
}

Notes

I didn't implement Delete/Remove operation as I assumed only checking for predefined list of prefixes. I you find it useful and really need it - please, open issue, I'll add it.

Documentation

Overview

Package trie contains radix tree implementation in golang (without any dependencies).

Copyright 2021, Mikhail Vitsen (@porfirion) https://github.com/porfirion/trie

Trie maps `[]byte` prefixes to `interface{}` values. It can be used for efficient search of substrings or bulk prefix checking.

Trie is created by

t := &Trie{}
t.Put("foo", "bar")
t.Put("buzz", 123)

or by convenience constructors:

t := BuildFromMap(map[string]interface{}{
  "foo": "bar",
  "buzz": 123,
})

or

t := BuildFromList([]struct{Key string; Value interface{}}{
  {Key: "foo", Value: "bar"},
  {Key: "buzz", Value: 123},
})

Two common operations are:

value, ok := t.Get(key) // returns associated value
value, prefixLen, ok := t.SearchPrefixIn(input) // searches the longest prefix of input, that stored in trie

but also handy method

prefix, ok := t.TakePrefix(input) // returns longest prefix without associated value
Example (Prefixes)
package main

import (
	"fmt"
	"strings"
)

// Also can be created with
//
//	prefixes := &trie.Trie{}
//	prefixes.PutString("one", 1)
//	prefixes.PutString("two", 2)
//	prefixes.PutString("three", 3)
//	prefixes.PutString("", 0)
var prefixes = BuildFromMap(map[string]int{
	"one":   1,
	"two":   2,
	"three": 3,
	"":      0,
})

func main() {
	var inputs = []string{
		"twoApple",
		"oneBanana",
		"Carrot",
	}

	for _, inp := range inputs {
		if val, prefixLen, ok := prefixes.SearchPrefixInString(inp); ok {
			fmt.Println(strings.Repeat(inp[prefixLen:], *val))
		}
	}

}
Output:

AppleApple
Banana
Example (Routing)
package main

import (
	"fmt"
)

func spSliceToStr(a []*string) []string {
	b := make([]string, len(a))
	for i := range a {
		b[i] = *a[i]
	}
	return b
}

var routes = BuildFromMap(map[string]string{
	"":                "root", // as universal prefix
	"/api/user":       "user",
	"/api/user/list":  "usersList",
	"/api/group":      "group",
	"/api/group/list": "groupsList",
	"/api/articles/":  "articles",
})

func main() {
	var inputs = []string{
		"/api/user/list",
		"/api/user/li",
		"/api/group",
		"/api/unknown",
	}

	for _, inp := range inputs {
		exact, ok := routes.GetByString(inp)
		route := routes.GetAllByString(inp)
		if ok {
			fmt.Printf("%-17s:\thandler %-10s\t(route %v)\n", inp, *exact, spSliceToStr(route))
		} else {
			fmt.Printf("%-17s:\thandler not found\t(route %v)\n", inp, spSliceToStr(route))
		}
	}

}
Output:

/api/user/list   :	handler usersList 	(route [root user usersList])
/api/user/li     :	handler not found	(route [root user])
/api/group       :	handler group     	(route [root group])
/api/unknown     :	handler not found	(route [root])

Index

Examples

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Trie

type Trie[T any] struct {
	Prefix   []byte
	Value    *T
	Children *[256]*Trie[T]
}

Trie implements sparse radix (Patricia) trie. Makes zero allocation on Get and SearchPrefixIn operations and two allocations per Put

Create it just as &Trie{} and add required data. Also there are some convenience constructors (for example for initialization from map[prefix]value)

When generics come ^^ it would be a

type Trie[type ValueType] struct {...}

func BuildFromList

func BuildFromList[T any](inputs []struct {
	Key   []byte
	Value T
}) *Trie[T]

BuildFromList can be used to create Trie with arbitrary bytes slice as key (not valid strings, etc)

func BuildFromMap

func BuildFromMap[T any](inputs map[string]T) *Trie[T]

BuildFromMap may be useful for var declaration

func (*Trie[T]) Count

func (t *Trie[T]) Count() int

Count returns amount of values (non nil) stored in all nodes of trie.

func (*Trie[T]) Get

func (t *Trie[T]) Get(key []byte) (*T, bool)

Get searches for exactly matching key in trie

func (*Trie[T]) GetAll

func (t *Trie[T]) GetAll(mask []byte) []*T

GetAll returns all Values whose prefixes are subsets of mask

tr := {"": v0, "/user/": v1, "/user/list": v2, "/group/": v3}

tr.GetAll("/user/list", false)
-> [v0, v1, v2]

func (*Trie[T]) GetAllByString

func (t *Trie[T]) GetAllByString(str string) []*T

GetAllByString is a convenience method for GetAll

func (*Trie[T]) GetAllKeyValues

func (t *Trie[T]) GetAllKeyValues(mask []byte) []struct {
	Key   string
	Value T
}

GetAllKeyValues returns all key-value pairs whose prefixes are subsets of mask

tr := {"": v0, "/user/": v1, "/user/list": v2, "/group/": v3}

tr.GetAllKeyValues("/user/list", false)
-> [{"", v0}, {"/user/", v1}, {"/user/list", v2}]

func (*Trie[T]) GetAllKeys

func (t *Trie[T]) GetAllKeys(mask []byte) []string

GetAllKeys returns all keys whose prefixes are subsets of mask

tr := {"": v0, "/user/": v1, "/user/list": v2, "/group/": v3}

tr.GetAll("/user/list", false)
-> ["", "/user/", "/user/list"]

func (*Trie[T]) GetByString

func (t *Trie[T]) GetByString(key string) (*T, bool)

GetByString is a convenience method for Get

func (*Trie[T]) Iterate

func (t *Trie[T]) Iterate(callback func(prefix []byte, value *T))

Iterate calls callback for each value stored in trie

Not thread safe.

Also prefix's underlying array would change on every call - so you can't rely on it after callback finishes (e.g. you should not pass it to another goroutine without copying)

It seems like the only possible iteration order is by key (prefix):

0x1, 0x1 0x1, 0x1 0x2, 0x1 0x3, 0x2, 0x2 0x1, 0x2 0x2, etc...

But it's not guarantied. You shouldn't rely on it!

Can be used in combination with SubTrie:

tr.SubTrie(mask).Iterate(func...)
Example
example := &T{Prefix: []byte{0xF0, 0x9F, 0x91}, Value: sp("short"), Children: &[256]*T{
	0x10: {Prefix: []byte{0x10}, Value: sp("modified")},
	0xA8: {Prefix: []byte{0xA8}, Value: sp("nokey"), Children: &[256]*T{
		0xE2: {Prefix: []byte{0xE2, 0x80, 0x8D}, Value: sp("withsep"), Children: &[256]*T{
			0xF0: {Prefix: []byte{0xF0, 0x9F, 0x94, 0xA7}, Value: sp("withkey")},
		}},
	}},
}}
example.Iterate(func(prefix []byte, value *string) {
	fmt.Printf("[%v] %+v\n", prefix, *value)
})
Output:

[[240 159 145]] short
[[240 159 145 16]] modified
[[240 159 145 168]] nokey
[[240 159 145 168 226 128 141]] withsep
[[240 159 145 168 226 128 141 240 159 148 167]] withkey

func (*Trie[T]) Put

func (t *Trie[T]) Put(newPrefix []byte, val T) (oldValue *T)

Put adds new entry into trie or replaces existing with specified prefix. Prefix can have zero length - associated value would be added into root Trie

WARNING! nil shouldn't be stored as value: you wouldn't be able to find it nor by SearchPrefixIn, nor by Get, nor by Iterate If you don't need any value (you need only prefixes) - you can use struct{}{}. See TakePrefix

There were some variants of control of replace. Handling it in place results in dependency on ValueType (interfaces and pointers are handled differently). Also there are some incomparable types, that cause panic when compared :(. Store separate function like OnReplaceCallback - requires passing it to children. And if you update it in parent - it wouldn't be updated in children automatically. Another problem - node doesn't know it's whole prefix, since it doesn't know it's parent! With current realization caller of Put knows whole prefix and we shouldn't collect it inside. IMHO, much easier.

func (*Trie[T]) PutString

func (t *Trie[T]) PutString(prefix string, value T)

PutString is a convenience method for Put()

func (*Trie[T]) SearchPrefixIn

func (t *Trie[T]) SearchPrefixIn(input []byte) (value *T, prefixLen int, ok bool)

SearchPrefixIn searches the longest matching prefix in input bytes. If input has prefix that matches any stored key returns associated value, prefix length, true OR nil, 0, false otherwise

func (*Trie[T]) SearchPrefixInString

func (t *Trie[T]) SearchPrefixInString(str string) (value *T, prefixLen int, ok bool)

SearchPrefixInString is a convenience method for SearchPrefixIn

Example
tr := BuildFromMap(map[string]func(s string) string{
	"lower": func(s string) string { return strings.ToLower(s) },
	"upper": func(s string) string { return strings.ToUpper(s) },
	"snake": func(s string) string {
		var res = make([]rune, 0, len(s))
		for i, r := range s {
			if unicode.IsUpper(r) && i > 0 {
				res = append(res, '_', unicode.ToLower(r))
			} else {
				res = append(res, unicode.ToLower(r))
			}
		}
		return string(res)
	},
	"camel": func(s string) string {
		var res = make([]rune, 0, len(s))
		toUpper := false
		for _, r := range s {
			if r == '_' {
				toUpper = true
				continue
			} else if toUpper {
				res = append(res, unicode.ToUpper(r))
				toUpper = false
			} else {
				res = append(res, unicode.ToLower(r))
			}
		}
		return string(res)
	},
})
inputs := []string{
	"upperapple",
	"lowerBANANA",
	"cameltest_me",
	"snakeAnotherExample",
	"noprefixString",
}
for _, inp := range inputs {
	if raw, prefixLen, ok := tr.SearchPrefixInString(inp); ok {
		format := *raw
		fmt.Println(format(inp[prefixLen:]))
	} else {
		fmt.Printf("no prefix found in \"%s\"\n", inp)
	}
}
Output:

APPLE
banana
testMe
another_example
no prefix found in "noprefixString"

func (Trie[T]) String

func (t Trie[T]) String() string
Example
example := &T{Prefix: []byte{0xF0, 0x9F, 0x91}, Value: sp("short"), Children: &[256]*T{
	0x10: {Prefix: []byte{0x10}, Value: sp("modified")},
	0xA8: {Prefix: []byte{0xA8}, Value: sp("nokey"), Children: &[256]*T{
		0xE2: {Prefix: []byte{0xE2, 0x80, 0x8D}, Value: sp("withsep"), Children: &[256]*T{
			0xF0: {Prefix: []byte{0xF0, 0x9F, 0x94, 0xA7}, Value: sp("withkey")},
		}},
	}},
}}
fmt.Println(example)
Output:

[F0 9F 91] short
├─10─ [10] modified
├─A8─ [A8] nokey
│     ├─E2─ [E2 80 8D] withsep
│     │     ├─F0─ [F0 9F 94 A7] withkey

func (*Trie[T]) SubTrie

func (t *Trie[T]) SubTrie(mask []byte, keepPrefix bool) (subTrie *Trie[T], ok bool)

SubTrie returns new Trie with all values whose prefixes include mask

keepPrefix indicates whether the new trie should keep original prefixes, or should contain only those parts, that are out of mask:

tr := {"": v0, "/user/": v1, "/user/list": v2, "/group/": v3}.

tr.SubTrie("/user", false)
-> {"/": v1, "/list": v2}

tr.SubTrie("/user", true)
-> {"/user/": v1, "/user/list": v2}

func (*Trie[T]) TakePrefix

func (t *Trie[T]) TakePrefix(str string) (prefix string, ok bool)

TakePrefix returns only found prefix without corresponding value. Just for convenience.

type TrieKey

type TrieKey struct {
	Trie[dummy]
}

func BuildPrefixesOnly

func BuildPrefixesOnly(strs ...string) *TrieKey

BuildPrefixesOnly used to create just searching prefixes without any data

Example
prefixes := BuildPrefixesOnly("tiny_", "small_", "normal_", "large_")

var myString = "large_banana"

if prefix, ok := prefixes.TakePrefix(myString); ok {
	fmt.Printf("prefix \"%s\" found\n", prefix)
} else {
	fmt.Println("no prefix found")
}
Output:

prefix "large_" found

func (*TrieKey) Keys

func (t *TrieKey) Keys(prefix string) []string

func (*TrieKey) Put

func (t *TrieKey) Put(newPrefix []byte)

func (*TrieKey) PutString

func (t *TrieKey) PutString(prefix string)

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL