Documentation ¶
Index ¶
- type AddFieldOption
- type Fields
- type Index
- func (r *Index) AddField(fieldName string, tokenStream analysis.TokenStream, options ...AddFieldOption) error
- func (r *Index) AddFieldString(fieldName string, text string, analyzer analysis.Analyzer) error
- func (r *Index) AddIndexAbleField(field document.IndexableField, analyzer analysis.Analyzer) error
- func (r *Index) CreateSearcher() *search.IndexSearcher
- func (r *Index) Freeze()
- func (r *Index) NewIndexReader(fields *treemap.Map[string, *info]) *IndexReader
- func (r *Index) Reset() error
- func (r *Index) Search(query search.Query) float64
- func (r *Index) SetSimilarity(similarity index.Similarity) error
- type IndexReader
- func (m *IndexReader) CheckIntegrity() error
- func (m *IndexReader) DoClose() error
- func (m *IndexReader) DocumentV1(docID int, visitor document.StoredFieldVisitor) error
- func (m *IndexReader) GetBinaryDocValues(field string) (index.BinaryDocValues, error)
- func (m *IndexReader) GetFieldInfos() *index.FieldInfos
- func (m *IndexReader) GetLiveDocs() util.Bits
- func (m *IndexReader) GetMetaData() *index.LeafMetaData
- func (m *IndexReader) GetNormValues(field string) (index.NumericDocValues, error)
- func (m *IndexReader) GetNumericDocValues(field string) (index.NumericDocValues, error)
- func (m *IndexReader) GetPointValues(field string) (types.PointValues, bool)
- func (m *IndexReader) GetReaderCacheHelper() index.CacheHelper
- func (m *IndexReader) GetSortedDocValues(field string) (index.SortedDocValues, error)
- func (m *IndexReader) GetSortedNumericDocValues(field string) (index.SortedNumericDocValues, error)
- func (m *IndexReader) GetSortedSetDocValues(field string) (index.SortedSetDocValues, error)
- func (m *IndexReader) GetTermVectors(docID int) (index.Fields, error)
- func (m *IndexReader) MaxDoc() int
- func (m *IndexReader) NumDocs() int
- func (m *IndexReader) Terms(field string) (index.Terms, error)
- type Option
- type Terms
- func (t *Terms) GetDocCount() (int, error)
- func (t *Terms) GetSumDocFreq() (int64, error)
- func (t *Terms) GetSumTotalTermFreq() (int64, error)
- func (t *Terms) HasFreqs() bool
- func (t *Terms) HasOffsets() bool
- func (t *Terms) HasPayloads() bool
- func (t *Terms) HasPositions() bool
- func (t *Terms) Iterator() (index.TermsEnum, error)
- func (t *Terms) Size() (int, error)
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type AddFieldOption ¶
type AddFieldOption func(*addFieldOption)
func WithOffsetGap ¶
func WithOffsetGap(offsetGap int) AddFieldOption
func WithPositionIncrementGap ¶
func WithPositionIncrementGap(positionIncrementGap int) AddFieldOption
type Index ¶
type Index struct {
// contains filtered or unexported fields
}
Index is a high-performance, single-document, main-memory Apache Lucene fulltext search index.

Overview ¶

This type is a replacement/substitute for a large subset of RAMDirectory functionality. It is designed to enable maximum efficiency for on-the-fly matchmaking that combines structured and fuzzy fulltext search in realtime streaming applications, such as Nux XQuery based XML message queues, publish-subscribe systems for blogs/newsfeeds, text chat, data acquisition and distribution systems, application-level routers, firewalls, classifiers, etc. Rather than targeting fulltext search of infrequent queries over huge persistent data archives (historic search), this type targets fulltext search of huge numbers of queries over comparatively small transient realtime data (prospective search), as in score := index.Search(query).
Each instance can hold at most one Lucene "document", with a document containing zero or more "fields", each field having a name and a fulltext value. The fulltext value is tokenized (split and transformed) into zero or more index terms (aka words) on AddField, according to the policy implemented by an Analyzer. For example, Lucene analyzers can split on whitespace, normalize to lower case for case insensitivity, ignore common terms with little discriminatory value such as "he", "in", "and" (stop words), reduce terms to their natural linguistic root form such as "fishing" being reduced to "fish" (stemming), resolve synonyms/inflections/thesauri (upon indexing and/or querying), etc. For details, see the Lucene Analyzer intro.
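The analysis policy described above (split, lowercase, drop stop words) can be sketched in a few lines of Go. This is a conceptual stand-in for illustration only, not the package's analysis.Analyzer; the stop word set here is a tiny hypothetical sample:

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// stopWords holds a small sample of common terms with little
// discriminatory value (a real analyzer ships a fuller list).
var stopWords = map[string]bool{"he": true, "in": true, "and": true, "about": true}

// analyze sketches the tokenization policy: split on non-letter runes,
// lowercase for case insensitivity, and drop stop words. Real analyzers
// may also apply stemming, synonym expansion, etc.
func analyze(text string) []string {
	var terms []string
	for _, tok := range strings.FieldsFunc(text, func(r rune) bool {
		return !unicode.IsLetter(r)
	}) {
		tok = strings.ToLower(tok)
		if !stopWords[tok] {
			terms = append(terms, tok)
		}
	}
	return terms
}

func main() {
	fmt.Println(analyze("Readings about Salmons and Alaska fishing"))
}
```

Each surviving term is what actually gets indexed; the original fulltext is discarded after tokenization, as noted below.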
Arbitrary Lucene queries can be run against this type; see the Lucene Query Syntax as well as the Query Parser Rules. Note that a Lucene query selects on the field names and the associated (indexed) tokenized terms, not on the original fulltext(s); the latter are not stored but rather thrown away immediately after tokenization.
For some interesting background information on search technology, see Bob Wyman's Prospective Search, Jim Gray's A Call to Arms - Custom subscriptions, and Tim Bray's On Search, the Series.

Example Usage ¶

	Analyzer analyzer = new SimpleAnalyzer(version);
	MemoryIndex index = new MemoryIndex();
	index.addField("content", "Readings about Salmons and other select Alaska fishing Manuals", analyzer);
	index.addField("author", "Tales of James", analyzer);
	QueryParser parser = new QueryParser(version, "content", analyzer);
	float score = index.search(parser.parse("+author:james +salmon~ +fish* manual~"));
	if (score > 0.0f) {
	    System.out.println("it's a match");
	} else {
	    System.out.println("no match found");
	}
	System.out.println("indexData=" + index.toString());
Example XQuery Usage
	(: An XQuery that finds all books authored by James that have something to do with
	   "salmon fishing manuals", sorted by relevance :)
	declare namespace lucene = "java:nux.xom.pool.FullTextUtil";
	declare variable $query := "+salmon~ +fish* manual~"; (: any arbitrary Lucene query can go here :)

	for $book in /books/book[author="James" and lucene:match(abstract, $query) > 0.0]
	let $score := lucene:match($book/abstract, $query)
	order by $score descending
	return $book
Thread safety guarantees ¶

An Index is not normally thread-safe for adds or queries. However, queries are thread-safe after Freeze() has been called.

Performance Notes ¶

Internally there's a new data structure geared towards efficient indexing and searching, plus the necessary support code to seamlessly plug into the Lucene framework. This type performs very well for very small texts (e.g. 10 chars) as well as for large texts (e.g. 10 MB) and everything in between. Typically, it is about 10-100 times faster than RAMDirectory. Note that RAMDirectory has particularly large efficiency overheads for small to medium sized texts, both in time and space. Indexing a field with N tokens takes O(N) in the best case, and O(N log N) in the worst case. Memory consumption is probably larger than for RAMDirectory.

Example throughput of many simple term queries over a single Index: ~500000 queries/sec on a MacBook Pro, jdk 1.5.0_06, server VM. As always, your mileage may vary. If you're curious about the whereabouts of bottlenecks, run java 1.5 with the non-perturbing '-server -agentlib:hprof=cpu=samples,depth=10' flags, then study the trace log and correlate its hotspot trailer with its call stack headers (see hprof tracing).
func NewFromDocument ¶
func (*Index) AddField ¶
func (r *Index) AddField(fieldName string, tokenStream analysis.TokenStream, options ...AddFieldOption) error
AddField iterates over the given token stream and adds the resulting terms to the index; equivalent to adding a tokenized, indexed, termVectorStored, unstored Lucene org.apache.lucene.document.Field. Finally, it closes the token stream. Note that untokenized keywords can be added with this method via keywordTokenStream(Collection), the Lucene KeywordTokenizer, or similar utilities.
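Conceptually, consuming a token stream means recording, for each term, the positions at which it occurs in the field. The sketch below illustrates this with a plain map standing in for the package's internal index structure; the names here are hypothetical:

```go
package main

import (
	"fmt"
	"strings"
)

// addField sketches what AddField does with a token stream: for each
// incoming term it appends the term's position to that term's posting
// list. The postings map is an illustrative stand-in for the package's
// internal per-field index structure.
func addField(postings map[string][]int, tokens []string) {
	for pos, term := range tokens {
		postings[term] = append(postings[term], pos)
	}
}

func main() {
	postings := make(map[string][]int)
	tokens := strings.Fields("fish and more fish")
	addField(postings, tokens)
	fmt.Println(postings["fish"]) // posting list of positions for "fish"
}
```

Position information recorded this way is what later enables phrase and proximity queries against the index.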
func (*Index) AddFieldString ¶
func (*Index) AddIndexAbleField ¶
AddIndexAbleField adds a Lucene IndexableField to the Index using the provided analyzer. It also stores doc values based on IndexableFieldType.docValuesType() if set.

Params: field – the field to add; analyzer – the analyzer to use for term analysis.

TODO: improve this code
func (*Index) CreateSearcher ¶
func (r *Index) CreateSearcher() *search.IndexSearcher
func (*Index) Freeze ¶
func (r *Index) Freeze()
Freeze Prepares the Index for querying in a non-lazy way. After calling this you can query the Index from multiple threads, but you cannot subsequently add new data.
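The contract Freeze establishes (no further adds, lock-free concurrent reads) can be sketched with a small self-contained type. All names here are illustrative, not the package's internals:

```go
package main

import (
	"fmt"
	"sync"
)

// frozenIndex sketches the Freeze contract: writes are rejected after
// freeze(), and reads need no synchronization once the index is frozen
// because its data is never mutated again.
type frozenIndex struct {
	mu     sync.Mutex
	frozen bool
	terms  map[string]bool
}

func (f *frozenIndex) add(term string) error {
	f.mu.Lock()
	defer f.mu.Unlock()
	if f.frozen {
		return fmt.Errorf("cannot add to a frozen index")
	}
	f.terms[term] = true
	return nil
}

func (f *frozenIndex) freeze() {
	f.mu.Lock()
	f.frozen = true
	f.mu.Unlock()
}

// contains is safe to call from multiple goroutines after freeze(),
// since the terms map is read-only from that point on.
func (f *frozenIndex) contains(term string) bool {
	return f.terms[term]
}

func main() {
	idx := &frozenIndex{terms: make(map[string]bool)}
	idx.add("fish")
	idx.freeze()
	err := idx.add("salmon") // rejected: index is frozen
	fmt.Println(idx.contains("fish"), err != nil)
}
```

Freezing once and then fanning queries out across goroutines is the intended usage pattern for concurrent search.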
func (*Index) NewIndexReader ¶
func (r *Index) NewIndexReader(fields *treemap.Map[string, *info]) *IndexReader
func (*Index) Reset ¶
Reset resets the Index to its initial state and recycles all internal buffers.
func (*Index) Search ¶
func (r *Index) Search(query search.Query) float64

Search is a convenience method that efficiently returns the relevance score by matching this index against the given Lucene query expression.

Params: query – an arbitrary Lucene query to run against this index.

Returns: the relevance score of the matchmaking; a number in the range [0.0 .. 1.0], with 0.0 indicating no match. The higher the number, the better the match.
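To illustrate the [0.0 .. 1.0] score contract (not the package's actual similarity formula, which delegates to Lucene scoring), a toy scorer might report the fraction of query terms present in the single indexed document:

```go
package main

import (
	"fmt"
	"strings"
)

// score is a toy stand-in for Index.Search that only demonstrates the
// range contract: it returns a value in [0.0 .. 1.0], where 0.0 means
// no query term matched and higher values mean a better match.
func score(docTerms map[string]bool, query string) float64 {
	terms := strings.Fields(query)
	if len(terms) == 0 {
		return 0.0
	}
	hits := 0
	for _, t := range terms {
		if docTerms[t] {
			hits++
		}
	}
	return float64(hits) / float64(len(terms))
}

func main() {
	doc := map[string]bool{"salmon": true, "fishing": true, "manual": true}
	fmt.Println(score(doc, "salmon manual")) // every query term present
	fmt.Println(score(doc, "trout"))         // no match
}
```

The real Search applies the configured Similarity (see SetSimilarity) rather than this simple term-overlap ratio.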
func (*Index) SetSimilarity ¶
func (r *Index) SetSimilarity(similarity index.Similarity) error
SetSimilarity sets the Similarity to be used for calculating field norms.
type IndexReader ¶
type IndexReader struct {
	*index.BaseLeafReader
	// contains filtered or unexported fields
}
IndexReader provides search support for Lucene framework integration; it implements all methods required by the Lucene Reader contracts.
func (*IndexReader) CheckIntegrity ¶
func (m *IndexReader) CheckIntegrity() error
func (*IndexReader) DoClose ¶
func (m *IndexReader) DoClose() error
func (*IndexReader) DocumentV1 ¶
func (m *IndexReader) DocumentV1(docID int, visitor document.StoredFieldVisitor) error
func (*IndexReader) GetBinaryDocValues ¶
func (m *IndexReader) GetBinaryDocValues(field string) (index.BinaryDocValues, error)
func (*IndexReader) GetFieldInfos ¶
func (m *IndexReader) GetFieldInfos() *index.FieldInfos
func (*IndexReader) GetLiveDocs ¶
func (m *IndexReader) GetLiveDocs() util.Bits
func (*IndexReader) GetMetaData ¶
func (m *IndexReader) GetMetaData() *index.LeafMetaData
func (*IndexReader) GetNormValues ¶
func (m *IndexReader) GetNormValues(field string) (index.NumericDocValues, error)
func (*IndexReader) GetNumericDocValues ¶
func (m *IndexReader) GetNumericDocValues(field string) (index.NumericDocValues, error)
func (*IndexReader) GetPointValues ¶
func (m *IndexReader) GetPointValues(field string) (types.PointValues, bool)
func (*IndexReader) GetReaderCacheHelper ¶
func (m *IndexReader) GetReaderCacheHelper() index.CacheHelper
func (*IndexReader) GetSortedDocValues ¶
func (m *IndexReader) GetSortedDocValues(field string) (index.SortedDocValues, error)
func (*IndexReader) GetSortedNumericDocValues ¶
func (m *IndexReader) GetSortedNumericDocValues(field string) (index.SortedNumericDocValues, error)
func (*IndexReader) GetSortedSetDocValues ¶
func (m *IndexReader) GetSortedSetDocValues(field string) (index.SortedSetDocValues, error)
func (*IndexReader) GetTermVectors ¶
func (m *IndexReader) GetTermVectors(docID int) (index.Fields, error)
func (*IndexReader) MaxDoc ¶
func (m *IndexReader) MaxDoc() int
func (*IndexReader) NumDocs ¶
func (m *IndexReader) NumDocs() int
type Option ¶
type Option func(*option)