groove

Version: v0.0.0-...-b7c488f (this package is not in the latest version of its module)

Published: Dec 2, 2021 | License: MIT | Imports: 23 | Imported by: 2

README

groove

Query analysis pipeline framework

groove is a library for constructing query analysis pipelines. A groove pipeline comprises a query source (which determines the format the queries are read in), a statistics source (used to compute information retrieval statistics), preprocessing steps, the measurements to make, and the output formats.

The groove library is primarily used by boogie, a front-end DSL for groove. To use groove directly as a Go library, refer to the simple example below, which loads Medline queries, analyses them using Elasticsearch, and writes the results to JSON files.

API Usage

In the example below, we use Elasticsearch to compute several query performance predictors for a set of Medline queries, evaluate the retrieval results with precision and recall, and write the measurement and evaluation results to JSON files (plus a TREC-format results file). The queries could additionally be preprocessed, for example so that each one contains only lowercase alphanumeric characters, by passing preprocess.QueryProcessor values to the Preprocess component; this example omits that step.

// Construct the pipeline. The query source and statistics source are required;
// the remaining arguments are optional components.
p := groove.NewGroovePipeline(
	query.NewTransmuteQuerySource(query.MedlineTransmutePipeline),
	stats.NewElasticsearchStatisticsSource(stats.ElasticsearchHosts("http://localhost:9200"),
		stats.ElasticsearchIndex("medline"),
		stats.ElasticsearchField("abstract"),
		stats.ElasticsearchScroll(true),
		stats.ElasticsearchSearchOptions(stats.SearchOptions{
			Size:    10000,
			RunName: "qpp",
		})),
	groove.MeasurementOutput(output.JsonMeasurementFormatter),
	groove.EvaluationOutput("medline.qrels", output.JsonEvaluationFormatter),
	groove.TrecOutput("medline_qpp.results"))

// This version documents no Measurement or Evaluation component, so the
// measures and evaluators are set through the exported Pipeline fields.
p.Measurements = []analysis.Measurement{preqpp.AvgICTF, preqpp.SumIDF, preqpp.AvgIDF, preqpp.MaxIDF, preqpp.StdDevIDF, postqpp.ClarityScore}
p.Evaluations = []eval.Evaluator{eval.PrecisionEvaluator, eval.RecallEvaluator}

// Execute takes only the result channel; the query directory is assumed to be
// read from QueryPath. A pipeline executes queries in parallel.
p.QueryPath = "./medline"
pipelineChannel := make(chan pipeline.Result)
go p.Execute(pipelineChannel)

for {
	// Continue until completed. The result type constants are assumed to live
	// in package pipeline alongside Result.
	result := <-pipelineChannel
	if result.Type == pipeline.Done {
		break
	}
	switch result.Type {
	case pipeline.Measurement:
		// Write the formatted measurement output.
		if err := os.WriteFile("medline_qpp.json", []byte(result.Measurements[0]), 0644); err != nil {
			log.Fatal(err)
		}
	case pipeline.Evaluation:
		// Write the formatted evaluation output.
		if err := os.WriteFile("medline_qpp_eval.json", []byte(result.Evaluations[0]), 0644); err != nil {
			log.Fatal(err)
		}
	}
}

Citing

If you use this work for scientific publication, please cite:

@inproceedings{scells2018framework,
 author = {Scells, Harrisen and Locke, Daniel and Zuccon, Guido},
 title = {An Information Retrieval Experiment Framework for Domain Specific Applications},
 booktitle = {The 41st International ACM SIGIR Conference on Research \& Development in Information Retrieval},
 series = {SIGIR '18},
 year = {2018},
} 

The Go gopher was created by Renee French and is licensed under the Creative Commons 3.0 Attribution license.

Documentation

Overview

Package groove is a query analysis and processing pipeline framework.

Package pipeline provides a framework for constructing reproducible query experiments.

Constants

const Version = "21.Apr.2021"

Functions

func EvaluationOutput

func EvaluationOutput(qrels string, formatters ...output.EvaluationFormatter) func() interface{}

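EvaluationOutput configures evaluation output: the qrels file that results are evaluated against and the formatters applied to the evaluation results.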
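
EvaluationOutput pairs a qrels file with formatters, and evaluators such as eval.PrecisionEvaluator and eval.RecallEvaluator from the README example score retrieval results against those judgements. As a rough illustration only, the sketch below computes standard set-based precision and recall for a single topic. It is not groove's eval package, and the PrecisionRecall function and its inputs are hypothetical.

```go
package main

import "fmt"

// PrecisionRecall computes set-based precision and recall for one topic:
// retrieved is the list returned by the system, relevant holds the document
// IDs judged relevant in the qrels. A duplicate retrieval counts as a hit once.
func PrecisionRecall(retrieved []string, relevant map[string]struct{}) (precision, recall float64) {
	hits := 0
	counted := make(map[string]bool)
	for _, doc := range retrieved {
		if _, ok := relevant[doc]; ok && !counted[doc] {
			hits++
			counted[doc] = true
		}
	}
	if len(retrieved) > 0 {
		precision = float64(hits) / float64(len(retrieved))
	}
	if len(relevant) > 0 {
		recall = float64(hits) / float64(len(relevant))
	}
	return precision, recall
}

func main() {
	relevant := map[string]struct{}{"d1": {}, "d3": {}, "d5": {}}
	p, r := PrecisionRecall([]string{"d1", "d2", "d3", "d4"}, relevant)
	fmt.Printf("P=%.2f R=%.2f\n", p, r) // P=0.50 R=0.67
}
```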

func MeasurementOutput

func MeasurementOutput(formatter ...output.MeasurementFormatter) func() interface{}

MeasurementOutput adds measurement output formatters to the pipeline.

func Preprocess

func Preprocess(processor ...preprocess.QueryProcessor) func() interface{}

Preprocess adds preprocessors to the pipeline.
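
The README describes preprocessing queries so that they contain only lowercase alphanumeric characters. The exact definition of preprocess.QueryProcessor is not shown on this page, so the self-contained sketch below assumes that a processor is simply a function from query text to query text, and that processors run in the order given. QueryProcessor, AlphaNum, Lowercase and ApplyAll are hypothetical names used for illustration.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// QueryProcessor is a hypothetical stand-in for preprocess.QueryProcessor:
// here it simply maps query text to processed query text.
type QueryProcessor func(string) string

// AlphaNum replaces every rune that is not a letter or digit with a space and
// collapses the resulting runs of whitespace into single spaces.
func AlphaNum(q string) string {
	cleaned := strings.Map(func(r rune) rune {
		if unicode.IsLetter(r) || unicode.IsDigit(r) {
			return r
		}
		return ' '
	}, q)
	return strings.Join(strings.Fields(cleaned), " ")
}

// Lowercase converts the query to lower case.
func Lowercase(q string) string {
	return strings.ToLower(q)
}

// ApplyAll runs the processors in the order given (assumed ordering).
func ApplyAll(q string, processors ...QueryProcessor) string {
	for _, process := range processors {
		q = process(q)
	}
	return q
}

func main() {
	q := `"Heart Attack"[MeSH] AND (aspirin OR ASA)`
	fmt.Println(ApplyAll(q, AlphaNum, Lowercase)) // heart attack mesh and aspirin or asa
}
```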

func TrecOutput

func TrecOutput(path string) func() interface{}

TrecOutput configures TREC-format results output, written to the given path.

Types

type EvaluationOutputFormat

type EvaluationOutputFormat struct {
	EvaluationFormatters []output.EvaluationFormatter
	EvaluationQrels      trecresults.QrelsFile
}

EvaluationOutputFormat specifies how evaluation output should be formatted.

type ModelConfiguration

type ModelConfiguration struct {
	Generate bool
	Train    bool
	Test     bool
}

ModelConfiguration specifies what actions of a model should be taken by the pipeline.

type Pipeline

type Pipeline struct {
	QueryPath             string
	PubDatesFile          string
	QueriesSource         query.QueriesSource
	StatisticsSource      stats.StatisticsSource
	Preprocess            []preprocess.QueryProcessor
	Transformations       preprocess.QueryTransformations
	Measurements          []analysis.Measurement
	MeasurementFormatters []output.MeasurementFormatter
	MeasurementExecutor   analysis.MeasurementExecutor
	Evaluations           []eval.Evaluator
	EvaluationFormatters  EvaluationOutputFormat
	OutputTrec            output.TrecResults
	QueryCache            combinator.QueryCacher
	Model                 learning.Model
	ModelConfiguration    ModelConfiguration
	QueryFormulator       formulation.Formulator
	Headway               *headway.Client

	CLF rank.CLFOptions
}

Pipeline contains all the information for executing a pipeline for query analysis.

func NewGroovePipeline

func NewGroovePipeline(qs query.QueriesSource, ss stats.StatisticsSource, components ...func() interface{}) Pipeline

NewGroovePipeline creates a new groove pipeline. The query source and statistics source are required. Additional components are provided via the optional functional arguments.
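
Each optional component has the type func() interface{}, which suggests a variant of the functional-options pattern where the constructor inspects the value each component returns. The sketch below assumes that NewGroovePipeline type-switches on those values to decide which field to set. Every type and function in it (Pipeline, NewPipeline, TrecOutput, MeasurementOutput) is a hypothetical, string-based simplification, not groove's code.

```go
package main

import "fmt"

// Hypothetical configuration values produced by the components.
type trecOutput struct{ path string }
type measurementOutput struct{ formatters []string }

// Pipeline is a pared-down, hypothetical pipeline configuration.
type Pipeline struct {
	QuerySource string
	TrecPath    string
	Formatters  []string
}

// TrecOutput returns a component that yields its configuration when called.
func TrecOutput(path string) func() interface{} {
	return func() interface{} { return trecOutput{path: path} }
}

// MeasurementOutput returns a component carrying one or more formatter names.
func MeasurementOutput(formatters ...string) func() interface{} {
	return func() interface{} { return measurementOutput{formatters: formatters} }
}

// NewPipeline stores the required source, then type-switches on the value
// produced by each optional component to decide which field to set.
func NewPipeline(source string, components ...func() interface{}) Pipeline {
	p := Pipeline{QuerySource: source}
	for _, component := range components {
		switch v := component().(type) {
		case trecOutput:
			p.TrecPath = v.path
		case measurementOutput:
			p.Formatters = append(p.Formatters, v.formatters...)
		}
	}
	return p
}

func main() {
	p := NewPipeline("medline", TrecOutput("run.results"), MeasurementOutput("json"))
	fmt.Println(p.QuerySource, p.TrecPath, p.Formatters) // medline run.results [json]
}
```

A design benefit of this shape is that new component kinds can be added without changing the constructor's signature; the cost is that a misconfigured component is only detected at run time.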

func (Pipeline) Execute

func (p Pipeline) Execute(c chan pipeline.Result)

Execute runs a groove pipeline for a particular directory of queries, sending results to c.

Directories

Path Synopsis
Package analysis provides measurements and analysis tools for queries.
postqpp
Package postqpp implements post-retrieval query performance predictors, reproducing the Java API from https://github.com/lucene4ir/lucene4ir (where applicable).
preqpp
Package preqpp implements pre-retrieval query performance predictors, reproducing the Java API from https://github.com/lucene4ir/lucene4ir (where applicable).
probability
Package probability provides abstractions for computing how precision and recall are affected by measurements.
cmd
pes
Package combinator contains methods for performing logical operations on queries.
Package eval contains implementations of different evaluation measures for information retrieval.
Package formulation provides a library for automatically formulating queries.
Package rewrite uses query chains to rewrite queries.
Package output provides different formats of output for experiments.
Package preprocess handles preprocessing and transformation of queries.
Package query provides sources for loading queries in different formats.
Package retrieval provides handlers that operate on result lists.
scripts
Package stats provides implementations of statistic sources.
