search

package
v0.0.0-...-702f6d9 Latest
Warning

This package is not in the latest version of its module.

Published: May 8, 2024 License: AGPL-3.0 Imports: 13 Imported by: 0

README

Jargon

Embeddings

Problem: We have arbitrary-dimension data, such as club descriptions or searches for events. Given one piece of this data (a search query, a club description), we want to find other pieces that are similar to it: for example, two club descriptions where both clubs are a cappella groups, or two search queries that are both effectively looking for professional fraternities.

Solution: Transform the arbitrary-dimension data into fixed-dimension data, say, a vector of n floating-point numbers. Do the transformation in such a way that similar pieces of arbitrary-dimension data also get similar fixed-dimension representations, i.e., vectors that are close together (think Euclidean distance).

How do we do this transformation? Train a machine-learning model on large amounts of text, and then use the model to produce the vectors.

So what's an embedding? Formally, the embedding of a particular object is the vector created by feeding that object through the machine-learning model. The embedding is a representation of the object in a fixed-dimension space.
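To make "close together" concrete, here is a minimal sketch that measures the Euclidean distance between toy vectors. The three-dimensional vectors and the euclideanDistance helper are made up for illustration; real embeddings have far more dimensions and come from the model, not from hand-written literals.

```go
package main

import (
	"fmt"
	"math"
)

// euclideanDistance returns the straight-line distance between two vectors
// of equal length. It panics if the lengths differ.
func euclideanDistance(a, b []float32) float64 {
	if len(a) != len(b) {
		panic("vectors must have the same dimension")
	}
	var sum float64
	for i := range a {
		d := float64(a[i]) - float64(b[i])
		sum += d * d
	}
	return math.Sqrt(sum)
}

func main() {
	// Hypothetical embeddings, invented for this example.
	acapellaA := []float32{1, 0, 0}
	acapellaB := []float32{1, 1, 0}
	fraternity := []float32{0, 4, 3}

	// Similar objects should end up with a small distance...
	fmt.Println(euclideanDistance(acapellaA, acapellaB))
	// ...and dissimilar objects with a larger one.
	fmt.Println(euclideanDistance(acapellaA, fraternity))
}
```

The smaller the distance, the more similar the two underlying objects are taken to be.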

This is arguably the most complex/unintuitive part of understanding search, so here are some extra resources:

OpenAI API

Problem: We need a machine learning model to create the embeddings.

Solution: Use OpenAI's API to create the embeddings for us: we send text over a REST API and get back a vector that represents that text's embedding.
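As a rough sketch of that exchange, the snippet below builds the JSON body for an embeddings request and decodes a trimmed-down sample response. The struct definitions mirror the ones listed in this package's documentation; the helper names buildEmbeddingRequest and parseEmbeddingResponse, the model name, and the sample response are assumptions for illustration. No network call is made; in a real client the body would be POSTed to OpenAI's embeddings endpoint with an API key.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// These types mirror the ones in this package's documentation.
type Embedding struct {
	Embedding []float32 `json:"embedding"`
}

type CreateEmbeddingRequestBody struct {
	Input []string `json:"input"`
	Model string   `json:"model"`
}

type CreateEmbeddingResponseBody struct {
	Data []Embedding `json:"data"`
}

// buildEmbeddingRequest (hypothetical helper) marshals the JSON request body.
func buildEmbeddingRequest(model string, texts []string) ([]byte, error) {
	return json.Marshal(CreateEmbeddingRequestBody{Input: texts, Model: model})
}

// parseEmbeddingResponse (hypothetical helper) extracts one vector per input
// from a response body, ignoring any fields the structs do not declare.
func parseEmbeddingResponse(raw []byte) ([][]float32, error) {
	var body CreateEmbeddingResponseBody
	if err := json.Unmarshal(raw, &body); err != nil {
		return nil, err
	}
	vectors := make([][]float32, len(body.Data))
	for i, e := range body.Data {
		vectors[i] = e.Embedding
	}
	return vectors, nil
}

func main() {
	// The model name is an assumption; use whatever the settings specify.
	req, err := buildEmbeddingRequest("text-embedding-3-small", []string{"a cappella group"})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(req))

	// A made-up, trimmed-down response for demonstration purposes.
	sample := []byte(`{"data":[{"embedding":[0.1,0.2,0.3]}]}`)
	vectors, err := parseEmbeddingResponse(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(vectors), "vector(s) with", len(vectors[0]), "dimensions")
}
```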

PineconeDB

Problem: We've created a bunch of embeddings for our club descriptions (or event descriptions, etc.). We now need a place to store them and a way to search through them (using the embedding of a search query).

Solution: PineconeDB is a vector database that lets us upload our embeddings and then query them: given a vector, it finds the stored vectors most similar to it.
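To illustrate what "upload, then query by vector" means, here is a toy in-memory stand-in for a vector database. It is not how PineconeDB works internally: the toyIndex type is hypothetical, the search is brute force, and cosine similarity is an assumed metric (the real metric depends on how the index is configured).

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// toyIndex maps namespace -> id -> vector. Hypothetical stand-in for a vector DB.
type toyIndex map[string]map[string][]float32

// upsert stores v under id in the given namespace, replacing any previous value.
func (idx toyIndex) upsert(namespace, id string, v []float32) {
	if idx[namespace] == nil {
		idx[namespace] = make(map[string][]float32)
	}
	idx[namespace][id] = v
}

// cosine returns the cosine similarity of two vectors of equal length,
// or 0 if either vector has zero magnitude.
func cosine(a, b []float32) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// query returns the ids of up to topK vectors in namespace that are most
// similar to v, best match first. Ties are broken by id for determinism.
func (idx toyIndex) query(namespace string, v []float32, topK int) []string {
	ids := make([]string, 0, len(idx[namespace]))
	for id := range idx[namespace] {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(i, j int) bool {
		si := cosine(idx[namespace][ids[i]], v)
		sj := cosine(idx[namespace][ids[j]], v)
		if si != sj {
			return si > sj
		}
		return ids[i] < ids[j]
	})
	if topK < 0 {
		topK = 0
	}
	if len(ids) > topK {
		ids = ids[:topK]
	}
	return ids
}

func main() {
	idx := toyIndex{}
	idx.upsert("clubs", "a", []float32{1, 0})
	idx.upsert("clubs", "b", []float32{0.9, 0.1})
	idx.upsert("clubs", "c", []float32{0, 1})
	idx.upsert("events", "e", []float32{1, 0}) // different namespace, never returned below

	fmt.Println(idx.query("clubs", []float32{1, 0}, 2))
}
```

Only vectors in the queried namespace are considered, which matches the README's description of namespaces.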

How to create searchable objects for fun and fame and profit

package search

// in backend/src/search/searchable.go
type Searchable interface {
 SearchId() string
 Namespace() string
 EmbeddingString() string
}

// in backend/src/search/pinecone.go
type PineconeClientInterface interface {
 Upsert(item Searchable) *errors.Error
 Delete(item Searchable) *errors.Error
 Search(item Searchable, topK int) ([]string, *errors.Error)
}
  1. Implement the Searchable interface on whatever model you want to make searchable. Searchable requires 3 methods:
    • SearchId(): This should return a unique id that can be used to store a model entry's embedding (if you want to store it at all) in PineconeDB. In practice, this should be the entry's UUID.
    • Namespace(): Namespaces are to PineconeDB what tables are to PostgreSQL. Searching in one namespace will only retrieve vectors in that namespace. In practice, this should be unique to the model type (e.g., Club, Event, etc.).
    • EmbeddingString(): This should return the string you want to feed into the OpenAI API and create an embedding for. In practice, create a string with the fields you think will affect the embedding all appended together, and/or try repeating a field multiple times in the string to see if that gives a better search experience.
  2. Use a PineconeClientInterface and call Upsert with your searchable object to send it to the database, and Delete with your searchable object to remove it from the database. Upsert whenever a model entry is created or updated, and delete whenever a model entry is deleted. In practice, a PineconeClientInterface should be passed in to the various services in backend/server.go, similar to how *gorm.DB and *validator.Validator instances are passed in.
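
The two steps above can be sketched as follows. Everything except the Searchable interface is hypothetical: the Club model, the ClubService, the recordingClient fake, and the narrowed searchClient interface are invented for this example. The client methods take slices, following the signatures in the package documentation, and return the built-in error instead of *errors.Error so that the sketch stays self-contained.

```go
package main

import (
	"fmt"
	"strings"
)

// Searchable is copied from the package documentation.
type Searchable interface {
	SearchId() string
	Namespace() string
	EmbeddingString() string
}

// Club is a hypothetical model; in practice ID would be the entry's UUID.
type Club struct {
	ID          string
	Name        string
	Description string
}

func (c Club) SearchId() string  { return c.ID }
func (c Club) Namespace() string { return "clubs" }

// EmbeddingString repeats the name once to give it extra weight, one of the
// experiments the README suggests trying.
func (c Club) EmbeddingString() string {
	return strings.Join([]string{c.Name, c.Name, c.Description}, " ")
}

// searchClient is a narrowed, assumed version of the package's client interface.
type searchClient interface {
	Upsert(items []Searchable) error
	Delete(items []Searchable) error
}

// recordingClient is a fake that logs calls instead of contacting Pinecone.
type recordingClient struct{ log []string }

func (r *recordingClient) Upsert(items []Searchable) error {
	for _, it := range items {
		r.log = append(r.log, "upsert "+it.Namespace()+"/"+it.SearchId())
	}
	return nil
}

func (r *recordingClient) Delete(items []Searchable) error {
	for _, it := range items {
		r.log = append(r.log, "delete "+it.Namespace()+"/"+it.SearchId())
	}
	return nil
}

// ClubService is a hypothetical service that receives the client by injection.
type ClubService struct{ search searchClient }

func (s *ClubService) CreateClub(c Club) error {
	// ... persist the club to the database here ...
	return s.search.Upsert([]Searchable{c})
}

func (s *ClubService) DeleteClub(c Club) error {
	// ... remove the club from the database here ...
	return s.search.Delete([]Searchable{c})
}

func main() {
	client := &recordingClient{}
	service := &ClubService{search: client}
	club := Club{ID: "club-1", Name: "Pitch Please", Description: "A student a cappella group."}

	if err := service.CreateClub(club); err != nil {
		panic(err)
	}
	if err := service.DeleteClub(club); err != nil {
		panic(err)
	}
	fmt.Println(club.EmbeddingString())
	fmt.Println(client.log)
}
```

Injecting the client through an interface also makes it easy to swap in a fake like this one in tests.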

How to search for fun and fame and profit

TODO: (probably create a Searchable object that only uses Namespace and EmbeddingString, and pass it to the Pinecone client's Search)
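
One possible shape for that idea, offered as a sketch rather than the project's actual approach: a query type that implements Searchable with an empty SearchId (it is never stored) and hands the raw query text to EmbeddingString. The ClubQuery type, the searcher interface, the stubSearcher, and searchClubs are all hypothetical, and the built-in error stands in for *errors.Error.

```go
package main

import "fmt"

// Searchable is copied from the package documentation.
type Searchable interface {
	SearchId() string
	Namespace() string
	EmbeddingString() string
}

// ClubQuery is a hypothetical search query against the "clubs" namespace.
type ClubQuery struct{ Text string }

func (q ClubQuery) SearchId() string        { return "" } // queries are never stored
func (q ClubQuery) Namespace() string       { return "clubs" }
func (q ClubQuery) EmbeddingString() string { return q.Text }

// searcher is an assumed, narrowed view of the search client.
type searcher interface {
	Search(item Searchable) ([]string, error)
}

// stubSearcher returns canned ids and remembers what it was asked.
type stubSearcher struct {
	lastNamespace string
	lastText      string
}

func (s *stubSearcher) Search(item Searchable) ([]string, error) {
	s.lastNamespace = item.Namespace()
	s.lastText = item.EmbeddingString()
	return []string{"club-1", "club-7"}, nil
}

// searchClubs wraps the query text in a ClubQuery and runs the search.
func searchClubs(s searcher, text string) ([]string, error) {
	return s.Search(ClubQuery{Text: text})
}

func main() {
	stub := &stubSearcher{}
	ids, err := searchClubs(stub, "professional fraternities")
	if err != nil {
		panic(err)
	}
	fmt.Println(ids, stub.lastNamespace, stub.lastText)
}
```

The returned ids would then be used to load the matching entries from the primary database.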

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type AIClientInterface

type AIClientInterface interface {
	CreateEmbedding(items []Searchable) ([]Embedding, *errors.Error)
	CreateModeration(items []Searchable) ([]ModerationResult, *errors.Error)
}

func NewOpenAIClient

func NewOpenAIClient(settings config.OpenAISettings) AIClientInterface

type CreateEmbeddingRequestBody

type CreateEmbeddingRequestBody struct {
	Input []string `json:"input"`
	Model string   `json:"model"`
}

type CreateEmbeddingResponseBody

type CreateEmbeddingResponseBody struct {
	Data []Embedding `json:"data"`
}

type CreateModerationRequestBody

type CreateModerationRequestBody struct {
	Input []string `json:"input"`
	Model string   `json:"model"`
}

type CreateModerationResponseBody

type CreateModerationResponseBody struct {
	Results []ModerationResult `json:"results"`
}

type Embedding

type Embedding struct {
	Embedding []float32 `json:"embedding"`
}

type Match

type Match struct {
	Id     string    `json:"id"`
	Score  float32   `json:"score"`
	Values []float32 `json:"values"`
}

type ModerationResult

type ModerationResult struct {
	Flagged bool `json:"flagged"`
}

type OpenAIClient

type OpenAIClient struct {
	Settings config.OpenAISettings
}

func (*OpenAIClient) CreateEmbedding

func (c *OpenAIClient) CreateEmbedding(items []Searchable) ([]Embedding, *errors.Error)

func (*OpenAIClient) CreateModeration

func (c *OpenAIClient) CreateModeration(items []Searchable) ([]ModerationResult, *errors.Error)

type PineconeClient

type PineconeClient struct {
	PineconeSettings config.PineconeSettings
	IndexName        *mattress.Secret[string]
	// contains filtered or unexported fields
}

func NewPineconeClient

func NewPineconeClient(aiClient AIClientInterface, pineconeSettings config.PineconeSettings) *PineconeClient

Connects to an existing Pinecone index, using the host and keys provided in settings.

func (*PineconeClient) Delete

func (c *PineconeClient) Delete(items []Searchable) *errors.Error

Deletes the given list of searchables from the Pinecone index.

func (*PineconeClient) Search

func (c *PineconeClient) Search(item Searchable) ([]string, *errors.Error)

Runs a search on the Pinecone index given a searchable item, and returns the topK most similar elements' ids.

func (*PineconeClient) Seed

func (c *PineconeClient) Seed(db *gorm.DB) *errors.Error

Seeds the Pinecone index with the clubs currently in the database.

func (*PineconeClient) Upsert

func (c *PineconeClient) Upsert(items []Searchable) *errors.Error

Inserts the given list of searchables into the Pinecone index.

type PineconeDeleteRequestBody

type PineconeDeleteRequestBody struct {
	IDs       []string `json:"ids"`
	Namespace string   `json:"namespace"`
	DeleteAll bool     `json:"deleteAll"`
}

type PineconeSearchRequestBody

type PineconeSearchRequestBody struct {
	IncludeValues   bool      `json:"includeValues"`
	IncludeMetadata bool      `json:"includeMetadata"`
	TopK            int       `json:"topK"`
	Vector          []float32 `json:"vector"`
	Namespace       string    `json:"namespace"`
}

type PineconeSearchResponseBody

type PineconeSearchResponseBody struct {
	Matches   []Match `json:"matches"`
	Namespace string  `json:"namespace"`
}
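
As an illustration of how a response with this shape might be turned into the []string of ids that Search returns, here is a small decoding sketch. The types mirror the documented ones; the matchIDs helper and the sample JSON are assumptions made for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// These types mirror the ones in this package's documentation.
type Match struct {
	Id     string    `json:"id"`
	Score  float32   `json:"score"`
	Values []float32 `json:"values"`
}

type PineconeSearchResponseBody struct {
	Matches   []Match `json:"matches"`
	Namespace string  `json:"namespace"`
}

// matchIDs (hypothetical helper) decodes a search response and returns the
// ids of the matches in the order the response lists them.
func matchIDs(raw []byte) ([]string, error) {
	var body PineconeSearchResponseBody
	if err := json.Unmarshal(raw, &body); err != nil {
		return nil, err
	}
	ids := make([]string, len(body.Matches))
	for i, m := range body.Matches {
		ids[i] = m.Id
	}
	return ids, nil
}

func main() {
	// A made-up sample response for demonstration purposes.
	sample := []byte(`{
		"matches": [
			{"id": "club-1", "score": 0.92, "values": []},
			{"id": "club-7", "score": 0.81, "values": []}
		],
		"namespace": "clubs"
	}`)

	ids, err := matchIDs(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(ids)
}
```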

type PineconeUpsertRequestBody

type PineconeUpsertRequestBody struct {
	Vectors   []Vector `json:"vectors"`
	Namespace string   `json:"namespace"`
}
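
Because this body carries a single Namespace, a batch that mixes namespaces would presumably need one body per namespace. The sketch below shows one way to do that grouping; whether the package does it this way is not stated in the documentation. The embedded struct and buildUpsertBodies are hypothetical, and Vector is copied from the type documented further down.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// These types mirror the ones in this package's documentation.
type Vector struct {
	ID     string    `json:"id"`
	Values []float32 `json:"values"`
}

type PineconeUpsertRequestBody struct {
	Vectors   []Vector `json:"vectors"`
	Namespace string   `json:"namespace"`
}

// embedded is a hypothetical pairing of a searchable's id and namespace
// with the embedding computed for it.
type embedded struct {
	ID        string
	Namespace string
	Values    []float32
}

// buildUpsertBodies groups items by namespace, producing one request body per
// namespace in the order each namespace is first seen.
func buildUpsertBodies(items []embedded) []PineconeUpsertRequestBody {
	position := make(map[string]int) // namespace -> index into bodies
	var bodies []PineconeUpsertRequestBody
	for _, it := range items {
		pos, ok := position[it.Namespace]
		if !ok {
			pos = len(bodies)
			position[it.Namespace] = pos
			bodies = append(bodies, PineconeUpsertRequestBody{Namespace: it.Namespace})
		}
		bodies[pos].Vectors = append(bodies[pos].Vectors, Vector{ID: it.ID, Values: it.Values})
	}
	return bodies
}

func main() {
	bodies := buildUpsertBodies([]embedded{
		{ID: "club-1", Namespace: "clubs", Values: []float32{0.1, 0.2}},
		{ID: "event-1", Namespace: "events", Values: []float32{0.3, 0.4}},
		{ID: "club-2", Namespace: "clubs", Values: []float32{0.5, 0.6}},
	})
	for _, b := range bodies {
		raw, err := json.Marshal(b)
		if err != nil {
			panic(err)
		}
		fmt.Println(string(raw))
	}
}
```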

type SearchClientInterface

type SearchClientInterface interface {
	Seed(db *gorm.DB) *errors.Error
	Upsert(items []Searchable) *errors.Error
	Delete(items []Searchable) *errors.Error
	Search(item Searchable) ([]string, *errors.Error)
}

type Searchable

type Searchable interface {
	// SearchId returns the id this searchable value should be associated with.
	SearchId() string
	// Namespace returns the namespace this searchable value should be associated with.
	Namespace() string
	// EmbeddingString returns the string that should be used to create an embedding of this searchable value.
	EmbeddingString() string
}

Searchable represents a value that can be searched (i.e., one for which an embedding can be created and uploaded to the vector database).

type Vector

type Vector struct {
	ID     string    `json:"id"`
	Values []float32 `json:"values"`
}
