memory

package
v0.0.0-...-53ff736
Published: Mar 27, 2024 License: Apache-2.0 Imports: 21 Imported by: 1

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type AddFieldOption

type AddFieldOption func(*addFieldOption)

func WithOffsetGap

func WithOffsetGap(offsetGap int) AddFieldOption

func WithPositionIncrementGap

func WithPositionIncrementGap(positionIncrementGap int) AddFieldOption
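WithOffsetGap and WithPositionIncrementGap follow Go's functional-options pattern. A minimal self-contained sketch of how such options are typically defined and folded over defaults (the struct fields and default values here are illustrative assumptions, not the package's actual unexported internals):

```go
package main

import "fmt"

// addFieldOption mirrors the shape of an unexported option struct;
// the field names and defaults are assumptions for illustration.
type addFieldOption struct {
	offsetGap            int
	positionIncrementGap int
}

// AddFieldOption mutates an addFieldOption, matching the documented type.
type AddFieldOption func(*addFieldOption)

// WithOffsetGap sets the gap inserted between field-value offsets.
func WithOffsetGap(offsetGap int) AddFieldOption {
	return func(o *addFieldOption) { o.offsetGap = offsetGap }
}

// WithPositionIncrementGap sets the position gap between field values.
func WithPositionIncrementGap(positionIncrementGap int) AddFieldOption {
	return func(o *addFieldOption) { o.positionIncrementGap = positionIncrementGap }
}

// applyAddFieldOptions shows how AddField would fold options over defaults.
func applyAddFieldOptions(opts ...AddFieldOption) addFieldOption {
	cfg := addFieldOption{offsetGap: 1} // assumed default
	for _, opt := range opts {
		opt(&cfg)
	}
	return cfg
}

func main() {
	cfg := applyAddFieldOptions(WithOffsetGap(10), WithPositionIncrementGap(100))
	fmt.Println(cfg.offsetGap, cfg.positionIncrementGap) // → 10 100
}
```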

type Fields

type Fields struct {
	// contains filtered or unexported fields
}

func (*Fields) Names

func (m *Fields) Names() []string

func (*Fields) Size

func (m *Fields) Size() int

func (*Fields) Terms

func (m *Fields) Terms(field string) (index.Terms, error)

type Index

type Index struct {
	// contains filtered or unexported fields
}

Index is a high-performance single-document main-memory Apache Lucene fulltext search index.

Overview

This type is a replacement/substitute for a large subset of RAMDirectory functionality. It is designed to enable maximum efficiency for on-the-fly matchmaking that combines structured and fuzzy fulltext search in realtime streaming applications such as Nux XQuery based XML message queues, publish-subscribe systems for blogs/newsfeeds, text chat, data acquisition and distribution systems, application-level routers, firewalls, classifiers, etc. Rather than targeting fulltext search of infrequent queries over huge persistent data archives (historic search), this type targets fulltext search of huge numbers of queries over comparatively small transient realtime data (prospective search). For example, as in float score = search(String text, Query query).


Each instance can hold at most one Lucene "document", with a document containing zero or more "fields", each field having a name and a fulltext value. The fulltext value is tokenized (split and transformed) into zero or more index terms (aka words) on addField(), according to the policy implemented by an Analyzer. For example, Lucene analyzers can split on whitespace, normalize to lower case for case insensitivity, ignore common terms with little discriminatory value such as "he", "in", "and" (stop words), reduce the terms to their natural linguistic root form such as "fishing" being reduced to "fish" (stemming), resolve synonyms/inflexions/thesauri (upon indexing and/or querying), etc. For details, see Lucene Analyzer intro.


Arbitrary Lucene queries can be run against this class - see Lucene Query Syntax as well as Query Parser Rules. Note that a Lucene query selects on the field names and associated (indexed) tokenized terms, not on the original fulltext(s) - the latter are not stored but rather thrown away immediately after tokenization.


For some interesting background information on search technology, see Bob Wyman's Prospective Search, Jim Gray's A Call to Arms - Custom subscriptions, and Tim Bray's On Search, the Series.

Example Usage

Analyzer analyzer = new SimpleAnalyzer(version);
MemoryIndex index = new MemoryIndex();
index.addField("content", "Readings about Salmons and other select Alaska fishing Manuals", analyzer);
index.addField("author", "Tales of James", analyzer);
QueryParser parser = new QueryParser(version, "content", analyzer);
float score = index.search(parser.parse("+author:james +salmon~ +fish* manual~"));
if (score > 0.0f) {
    System.out.println("it's a match");
} else {
    System.out.println("no match found");
}
System.out.println("indexData=" + index.toString());

Example XQuery Usage

(: An XQuery that finds all books authored by James that have something to do
with "salmon fishing manuals", sorted by relevance :)
declare namespace lucene = "java:nux.xom.pool.FullTextUtil";
declare variable $query := "+salmon~ +fish* manual~"; (: any arbitrary Lucene query can go here :)

for $book in /books/book[author="James" and lucene:match(abstract, $query) > 0.0]
let $score := lucene:match($book/abstract, $query)
order by $score descending
return $book

Thread safety guarantees

Index is not normally thread-safe for adds or queries. However, queries are thread-safe after freeze() has been called.

Performance Notes

Internally there's a new data structure geared towards efficient indexing and searching, plus the necessary support code to seamlessly plug into the Lucene framework. This type performs very well for very small texts (e.g. 10 chars) as well as for large texts (e.g. 10 MB) and everything in between. Typically, it is about 10-100 times faster than RAMDirectory. Note that RAMDirectory has particularly large efficiency overheads for small to medium sized texts, both in time and space. Indexing a field with N tokens takes O(N) in the best case, and O(N log N) in the worst case. Memory consumption is probably larger than for RAMDirectory. Example throughput of many simple term queries over a single Index: ~500,000 queries/sec on a MacBook Pro, JDK 1.5.0_06, server VM. As always, your mileage may vary. If you're curious about the whereabouts of bottlenecks, run Java 1.5 with the non-perturbing '-server -agentlib:hprof=cpu=samples,depth=10' flags, then study the trace log and correlate its hotspot trailer with its call stack headers (see hprof tracing).

func NewFromDocument

func NewFromDocument(doc *document.Document, analyzer analysis.Analyzer, options ...Option) (*Index, error)

func NewIndex

func NewIndex(options ...Option) (*Index, error)

func (*Index) AddField

func (r *Index) AddField(fieldName string, tokenStream analysis.TokenStream, options ...AddFieldOption) error

AddField iterates over the given token stream and adds the resulting terms to the index; equivalent to adding a tokenized, indexed, termVectorStored, unstored Lucene org.apache.lucene.document.Field. Finally, it closes the token stream. Note that untokenized keywords can be added with this method via keywordTokenStream(Collection), the Lucene KeywordTokenizer, or similar utilities.

func (*Index) AddFieldString

func (r *Index) AddFieldString(fieldName string, text string, analyzer analysis.Analyzer) error

func (*Index) AddIndexAbleField

func (r *Index) AddIndexAbleField(field document.IndexableField, analyzer analysis.Analyzer) error

AddIndexAbleField adds a Lucene IndexableField to the Index using the provided analyzer. Also stores doc values based on IndexableFieldType.docValuesType() if set. Params: field – the field to add; analyzer – the analyzer to use for term analysis. TODO: complete this code.

func (*Index) CreateSearcher

func (r *Index) CreateSearcher() *search.IndexSearcher

func (*Index) Freeze

func (r *Index) Freeze()

Freeze Prepares the Index for querying in a non-lazy way. After calling this you can query the Index from multiple threads, but you cannot subsequently add new data.
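The freeze-then-query lifecycle can be sketched as follows. This is a hedged illustration of the documented contract (adds rejected after Freeze, concurrent reads safe afterwards), not the package's actual implementation:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// frozenIndex illustrates the Freeze contract: adds are rejected after
// Freeze, and read-only queries are safe from multiple goroutines once
// the index is frozen (no further mutation can occur).
type frozenIndex struct {
	frozen bool
	terms  map[string]bool
}

func (x *frozenIndex) AddTerm(term string) error {
	if x.frozen {
		return errors.New("index is frozen; no new data may be added")
	}
	x.terms[term] = true
	return nil
}

func (x *frozenIndex) Freeze() { x.frozen = true }

// Contains is read-only, so concurrent calls after Freeze are safe.
func (x *frozenIndex) Contains(term string) bool { return x.terms[term] }

func main() {
	idx := &frozenIndex{terms: map[string]bool{}}
	idx.AddTerm("salmon")
	idx.Freeze()

	// Queries from multiple goroutines after Freeze.
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = idx.Contains("salmon")
		}()
	}
	wg.Wait()

	// Adds after Freeze fail.
	fmt.Println(idx.AddTerm("tuna"))
}
```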

func (*Index) NewIndexReader

func (r *Index) NewIndexReader(fields *treemap.Map[string, *info]) *IndexReader

func (*Index) Reset

func (r *Index) Reset() error

Reset Resets the MemoryIndex to its initial state and recycles all internal buffers.

func (*Index) Search

func (r *Index) Search(query search.Query) float64

Search Convenience method that efficiently returns the relevance score by matching this index against the given Lucene query expression. Params: query – an arbitrary Lucene query to run against this index. Returns: the relevance score of the matchmaking; a number in the range [0.0 .. 1.0], with 0.0 indicating no match. The higher the number, the better the match.
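The [0.0 .. 1.0] score contract can be illustrated with a toy term-overlap scorer. This is an assumption-laden sketch for intuition only; the real Search delegates to Lucene's Similarity-based relevance scoring:

```go
package main

import (
	"fmt"
	"strings"
)

// score returns the fraction of query terms found among the indexed
// terms: 0.0 means no match, and higher values mean a better match.
// A toy stand-in for Search's Similarity-based relevance score.
func score(indexed map[string]bool, query string) float64 {
	terms := strings.Fields(query)
	if len(terms) == 0 {
		return 0
	}
	hits := 0
	for _, t := range terms {
		if indexed[t] {
			hits++
		}
	}
	return float64(hits) / float64(len(terms))
}

func main() {
	indexed := map[string]bool{"salmon": true, "fishing": true, "manuals": true}
	fmt.Println(score(indexed, "salmon fishing")) // full overlap → 1
	fmt.Println(score(indexed, "tuna"))           // no match → 0
}
```

This mirrors the prospective-search usage shown earlier: a stream of texts is scored against standing queries, and a score above 0.0 signals a match.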

func (*Index) SetSimilarity

func (r *Index) SetSimilarity(similarity index.Similarity) error

SetSimilarity sets the Similarity to be used for calculating field norms.

type IndexReader

type IndexReader struct {
	*index.BaseLeafReader
	// contains filtered or unexported fields
}

IndexReader Search support for Lucene framework integration; implements all methods required by the Lucene Reader contracts.

func (*IndexReader) CheckIntegrity

func (m *IndexReader) CheckIntegrity() error

func (*IndexReader) DoClose

func (m *IndexReader) DoClose() error

func (*IndexReader) DocumentV1

func (m *IndexReader) DocumentV1(docID int, visitor document.StoredFieldVisitor) error

func (*IndexReader) GetBinaryDocValues

func (m *IndexReader) GetBinaryDocValues(field string) (index.BinaryDocValues, error)

func (*IndexReader) GetFieldInfos

func (m *IndexReader) GetFieldInfos() *index.FieldInfos

func (*IndexReader) GetLiveDocs

func (m *IndexReader) GetLiveDocs() util.Bits

func (*IndexReader) GetMetaData

func (m *IndexReader) GetMetaData() *index.LeafMetaData

func (*IndexReader) GetNormValues

func (m *IndexReader) GetNormValues(field string) (index.NumericDocValues, error)

func (*IndexReader) GetNumericDocValues

func (m *IndexReader) GetNumericDocValues(field string) (index.NumericDocValues, error)

func (*IndexReader) GetPointValues

func (m *IndexReader) GetPointValues(field string) (types.PointValues, bool)

func (*IndexReader) GetReaderCacheHelper

func (m *IndexReader) GetReaderCacheHelper() index.CacheHelper

func (*IndexReader) GetSortedDocValues

func (m *IndexReader) GetSortedDocValues(field string) (index.SortedDocValues, error)

func (*IndexReader) GetSortedNumericDocValues

func (m *IndexReader) GetSortedNumericDocValues(field string) (index.SortedNumericDocValues, error)

func (*IndexReader) GetSortedSetDocValues

func (m *IndexReader) GetSortedSetDocValues(field string) (index.SortedSetDocValues, error)

func (*IndexReader) GetTermVectors

func (m *IndexReader) GetTermVectors(docID int) (index.Fields, error)

func (*IndexReader) MaxDoc

func (m *IndexReader) MaxDoc() int

func (*IndexReader) NumDocs

func (m *IndexReader) NumDocs() int

func (*IndexReader) Terms

func (m *IndexReader) Terms(field string) (index.Terms, error)

type Option

type Option func(*option)

func WithMaxReusedBytes

func WithMaxReusedBytes(maxReusedBytes int64) Option

func WithStoreOffsets

func WithStoreOffsets(storeOffsets bool) Option

func WithStorePayloads

func WithStorePayloads(storePayloads bool) Option

type Terms

type Terms struct {
	*index.TermsBase
	// contains filtered or unexported fields
}

func (*Terms) GetDocCount

func (t *Terms) GetDocCount() (int, error)

func (*Terms) GetSumDocFreq

func (t *Terms) GetSumDocFreq() (int64, error)

func (*Terms) GetSumTotalTermFreq

func (t *Terms) GetSumTotalTermFreq() (int64, error)

func (*Terms) HasFreqs

func (t *Terms) HasFreqs() bool

func (*Terms) HasOffsets

func (t *Terms) HasOffsets() bool

func (*Terms) HasPayloads

func (t *Terms) HasPayloads() bool

func (*Terms) HasPositions

func (t *Terms) HasPositions() bool

func (*Terms) Iterator

func (t *Terms) Iterator() (index.TermsEnum, error)

func (*Terms) Size

func (t *Terms) Size() (int, error)
