filters

package
v0.0.0-...-b31d7c6
Published: Sep 28, 2015 License: MIT Imports: 3 Imported by: 0

Documentation

Overview

Package filters provides a data-record filtering mechanism and basic implementations for typical use cases. It is intended as a complement to the formats sister package, useful for automating unique record extraction from a data file.

A loose naming convention applies: a trailing "s" on a filter name means the filter is applied independently to each field of the record. The missing "s" on "require" therefore means that all supplied fields are required simultaneously. The currently supported filters are:

"require"      - drops any record that does NOT match ALL of its field entries. An empty
                 string ("") require field is skipped, so if you want to require records
                 with blank fields, use the special string FilterBlankEntry.

"excludes"     - drops any record matching at least one of its field entries. An empty
                 string ("") exclude field is skipped, so if you want to exclude records
                 with blank fields, use the special string FilterBlankEntry.

                 To exclude multiple keywords from one field, you will either need to
                 use multiple excludes or write a new Filter.

"null_fields"  - remaps fields from a placeholder string into an empty string. For
                 example, many data sources use a placeholder of "-" or "n/a" to
                 indicate a missing element. This filter may also be used to suppress
                 particular values from records.

"split_fields" - splits fields on a delimiter, creating new records for each split. For
                 example, a single record with 3="A,B,C" and a delimiter of "," emits
                 three records with 3="A", 3="B" and 3="C".
                 Note that the delimiter "" is not allowed.

"date_formats" - parses the field value using a strptime format string and reformats
                 it into the standard representation "2006-01-02 15:04:05" in UTC.
                 Not all strptime formats are available; see the package at
                 github.com/pbnjay/strptime for a listing.
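
The "split_fields" example above (3="A,B,C" with a delimiter of ",") can be sketched as follows. This is a minimal standalone illustration, not the package's implementation: the Record alias and the splitField helper are hypothetical names introduced here, and passing the record through unchanged on an empty delimiter is an assumption of the sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// Record mirrors the map type used by the package's Filter interface.
type Record = map[interface{}]string

// splitField is a hypothetical helper (not part of the package). It emits one
// copy of rec for each piece of rec[key] obtained by splitting on delim.
func splitField(rec Record, key interface{}, delim string) []Record {
	val, ok := rec[key]
	if !ok || delim == "" {
		// The package documents "" as a disallowed delimiter; this sketch
		// simply passes the record through unchanged in that case.
		return []Record{rec}
	}
	var out []Record
	for _, piece := range strings.Split(val, delim) {
		// Copy every field so the emitted records do not share state.
		cp := make(Record, len(rec))
		for k, v := range rec {
			cp[k] = v
		}
		cp[key] = piece
		out = append(out, cp)
	}
	return out
}

func main() {
	rec := Record{1: "x", 3: "A,B,C"}
	for _, r := range splitField(rec, 3, ",") {
		fmt.Println(r[1], r[3])
	}
}
```

Each emitted record keeps the untouched fields (here field 1) and differs only in the split field.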

To support new filters, simply implement the Filter interface and call RegisterFilter before using GetFilter or FilterSet.Append.


Variables

var (
	// FilterBlankEntry is a placeholder for blank string matching in RequireFilter and
	// ExcludeFilter. If for some reason your input contains this text and you need a
	// different representation, this may be overridden in user code.
	FilterBlankEntry = "<BLANK>"
)
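
To illustrate how the blank placeholder interacts with the "require" and "excludes" rules described in the overview, here is a rough standalone sketch. The names blankEntry, matches, requireAll and excludeAny are hypothetical; exact string equality and treating a missing field as blank are assumptions of this sketch, since the documentation only says "match".

```go
package main

import "fmt"

// blankEntry stands in for the package's FilterBlankEntry variable.
var blankEntry = "<BLANK>"

// Record mirrors the map type used by the package's Filter interface.
type Record = map[interface{}]string

// matches reports whether a record value satisfies one filter entry.
// Assumption: matching is exact string equality.
func matches(value, want string) bool {
	if want == blankEntry {
		return value == ""
	}
	return value == want
}

// requireAll reports whether rec should be kept: every non-empty entry in
// parts must match. Empty ("") entries are skipped.
func requireAll(rec, parts Record) bool {
	for k, want := range parts {
		if want == "" {
			continue
		}
		if !matches(rec[k], want) {
			return false
		}
	}
	return true
}

// excludeAny reports whether rec should be dropped: at least one non-empty
// entry in parts matches. Empty ("") entries are skipped.
func excludeAny(rec, parts Record) bool {
	for k, want := range parts {
		if want == "" {
			continue
		}
		if matches(rec[k], want) {
			return true
		}
	}
	return false
}

func main() {
	rec := Record{"name": "alice", "email": ""}
	fmt.Println(requireAll(rec, Record{"name": "alice", "email": blankEntry})) // true
	fmt.Println(requireAll(rec, Record{"name": "alice", "email": ""}))         // true: "" entry is skipped
	fmt.Println(excludeAny(rec, Record{"email": blankEntry}))                  // true
	fmt.Println(excludeAny(rec, Record{"email": ""}))                          // false: "" entry is skipped
}
```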

Functions

func RegisterFilter

func RegisterFilter(name string, fg FilterGetter)

RegisterFilter adds a new named Filter for discovery by GetFilter or FilterSet.Append.

Types

type Filter

type Filter interface {
	// Setup defines the part strings used to apply this filter to new records.
	Setup(parts map[interface{}]string) error
	// Apply takes an input record and applies the Filter to create 0 or more records.
	Apply(fields map[interface{}]string) []map[interface{}]string
}

Filter defines an interface that manipulates fields from one record into a new slice of records (most often 1-to-1). These manipulations can have optional parameters provided by Setup to control them.
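
As an illustration of implementing this interface, the sketch below defines a hypothetical UpperFilter that upper-cases selected fields. The Filter interface is copied from the declaration above so the example compiles on its own. How Setup interprets its map is an assumption here: only the keys are used, to select which fields to transform.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Filter is copied from the package documentation so this sketch is standalone.
type Filter interface {
	Setup(parts map[interface{}]string) error
	Apply(fields map[interface{}]string) []map[interface{}]string
}

// UpperFilter is a hypothetical filter that upper-cases the selected fields.
type UpperFilter struct {
	keys []interface{}
}

// Setup records which fields to transform. Assumption: only the keys of
// parts matter; the values are ignored.
func (u *UpperFilter) Setup(parts map[interface{}]string) error {
	if len(parts) == 0 {
		return errors.New("upper: no fields given")
	}
	u.keys = u.keys[:0]
	for k := range parts {
		u.keys = append(u.keys, k)
	}
	return nil
}

// Apply returns exactly one record: a copy of the input with the selected
// fields upper-cased. The input map is left unmodified.
func (u *UpperFilter) Apply(fields map[interface{}]string) []map[interface{}]string {
	out := make(map[interface{}]string, len(fields))
	for k, v := range fields {
		out[k] = v
	}
	for _, k := range u.keys {
		if v, ok := out[k]; ok {
			out[k] = strings.ToUpper(v)
		}
	}
	return []map[interface{}]string{out}
}

// Compile-time check that UpperFilter satisfies Filter.
var _ Filter = (*UpperFilter)(nil)

func main() {
	f := &UpperFilter{}
	if err := f.Setup(map[interface{}]string{"name": "x"}); err != nil {
		panic(err)
	}
	fmt.Println(f.Apply(map[interface{}]string{"name": "alice", "id": "7"}))
}
```

With the real package, a type like this would presumably be made discoverable by passing a FilterGetter that returns a fresh &UpperFilter{} to RegisterFilter under a chosen name.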

func GetFilter

func GetFilter(name string, fields map[interface{}]string) (Filter, error)

GetFilter returns the named filter, initialized using Setup() with the fields parameter.
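
One plausible way to picture the RegisterFilter and GetFilter pairing is a name-to-factory map. The sketch below is an assumption about the general pattern, not the package's actual internals; registry, registerFilter, getFilter and passThrough are hypothetical names, while Filter and FilterGetter are copied from the declarations on this page.

```go
package main

import "fmt"

// Filter and FilterGetter are copied from the package documentation.
type Filter interface {
	Setup(parts map[interface{}]string) error
	Apply(fields map[interface{}]string) []map[interface{}]string
}

type FilterGetter func() Filter

// registry maps filter names to factories (hypothetical internal state).
var registry = map[string]FilterGetter{}

// registerFilter makes a filter discoverable by name.
func registerFilter(name string, fg FilterGetter) {
	registry[name] = fg
}

// getFilter builds a fresh filter by name and initializes it with Setup.
func getFilter(name string, fields map[interface{}]string) (Filter, error) {
	fg, ok := registry[name]
	if !ok {
		return nil, fmt.Errorf("unknown filter %q", name)
	}
	f := fg()
	if err := f.Setup(fields); err != nil {
		return nil, err
	}
	return f, nil
}

// passThrough is a trivial hypothetical filter that returns its input as-is.
type passThrough struct{}

func (passThrough) Setup(map[interface{}]string) error { return nil }

func (passThrough) Apply(fields map[interface{}]string) []map[interface{}]string {
	return []map[interface{}]string{fields}
}

func main() {
	registerFilter("pass", func() Filter { return passThrough{} })

	f, err := getFilter("pass", nil)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(f.Apply(map[interface{}]string{1: "a"})))

	_, err = getFilter("nope", nil)
	fmt.Println(err)
}
```

Using a factory function rather than a shared instance means each lookup gets its own filter, so state stored by Setup is not shared between uses.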

type FilterGetter

type FilterGetter func() Filter

FilterGetter returns an instance of a Filter.

type FilterSet

type FilterSet struct {
	// contains filtered or unexported fields
}

FilterSet defines an ordered set of filters that are applied to incoming data records. These filters can be used to restrict, reformat, and subdivide data into unique records. Filters are applied in the order they are added with Append(), so results are cumulative and early restrictions can bypass more expensive field splits.

func (*FilterSet) Append

func (fs *FilterSet) Append(ftype string, fields map[interface{}]string) error

Append adds a new filter onto the end of the FilterSet chain.

func (*FilterSet) Apply

func (fs *FilterSet) Apply(fields map[interface{}]string) []map[interface{}]string

Apply calls Filter.Apply for each filter in the FilterSet and accumulates the results. Restrictive filters (such as Require/Exclude) should be applied as early as possible, and expansive filters (such as Split and DateFormat) as late as possible, to reduce computation time.
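
The ordering advice above can be made concrete with a small standalone sketch of chained application. This is an assumption about how such a chain might behave, not the package's code: stage is a simplified stand-in for Filter.Apply, and applyChain, requireValue and splitOn are hypothetical helpers.

```go
package main

import (
	"fmt"
	"strings"
)

// Record mirrors the map type used by the package's Filter interface.
type Record = map[interface{}]string

// stage is a simplified stand-in for Filter.Apply.
type stage func(Record) []Record

// applyChain feeds every record produced by one stage into the next and
// stops early once no records remain.
func applyChain(chain []stage, rec Record) []Record {
	current := []Record{rec}
	for _, s := range chain {
		var next []Record
		for _, r := range current {
			next = append(next, s(r)...)
		}
		if len(next) == 0 {
			return nil // record dropped; later stages never run
		}
		current = next
	}
	return current
}

// requireValue keeps a record only if rec[key] equals want (restrictive).
func requireValue(key interface{}, want string) stage {
	return func(r Record) []Record {
		if r[key] != want {
			return nil
		}
		return []Record{r}
	}
}

// splitOn emits one record per piece of rec[key] (expansive).
func splitOn(key interface{}, delim string) stage {
	return func(r Record) []Record {
		var out []Record
		for _, piece := range strings.Split(r[key], delim) {
			cp := make(Record, len(r))
			for k, v := range r {
				cp[k] = v
			}
			cp[key] = piece
			out = append(out, cp)
		}
		return out
	}
}

func main() {
	// Restrictive stage first, expansive stage last.
	chain := []stage{requireValue("kind", "gene"), splitOn("tags", ",")}
	fmt.Println(len(applyChain(chain, Record{"kind": "gene", "tags": "a,b,c"})))  // 3
	fmt.Println(len(applyChain(chain, Record{"kind": "other", "tags": "a,b,c"}))) // 0
}
```

With the restrictive stage first, a rejected record never reaches the split, so no copies are made for it.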
