sampler

package

v0.9.1 Published: Apr 20, 2024 License: Apache-2.0 Imports: 16 Imported by: 0

Repository

github.com/gomlx/gomlx

Documentation ¶

Index ¶

Constants
func MapInputs[T any](strategy *Strategy, inputs []T) map[string]*ValueMask[T]
func NameForNodeDependentDegree(ruleName, dependentName string) string
type Dataset
type EdgeType
type Rule
type Sampler
- func Load(filePath string) (s *Sampler, err error)
- func New() *Sampler
type Strategy
type ValueMask

Constants ¶

const PaddingIndex = 0

PaddingIndex is the value stored in every sample slot that could not be filled. Note that 0 is also a valid node index, so always use the mask returned by the Sampler to check whether a value is padding.
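The ambiguity between padding and node 0 can be illustrated with a minimal, standalone sketch. It does not use the package itself: `padToCount` and `paddingIndex` are hypothetical names, and the layout (a values slice plus a boolean mask) is only an assumption about how a value/mask pair might look.

```go
package main

import "fmt"

// paddingIndex mirrors the value of the package's PaddingIndex constant (0).
const paddingIndex int32 = 0

// padToCount is a hypothetical helper: it copies up to count values from
// candidates into a fixed-size slice and marks which slots hold real samples.
func padToCount(candidates []int32, count int) (values []int32, mask []bool) {
	values = make([]int32, count)
	mask = make([]bool, count)
	for i := range values {
		if i < len(candidates) {
			values[i] = candidates[i]
			mask[i] = true
		} else {
			values[i] = paddingIndex
		}
	}
	return values, mask
}

func main() {
	// Node 0 is a real neighbor here, so the values alone cannot identify padding.
	values, mask := padToCount([]int32{0, 7}, 4)
	fmt.Println(values) // [0 7 0 0]
	fmt.Println(mask)   // [true true false false]
}
```

The first slot and the last two slots all hold 0, and only the mask tells them apart.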

Variables ¶

This section is empty.

Functions ¶

func MapInputs ¶

func MapInputs[T any](strategy *Strategy, inputs []T) map[string]*ValueMask[T]

MapInputs converts the inputs yielded by a sampler.Dataset into a map from each Rule's name to the Value/Mask tensors holding the samples for this example.

Example 1: using the outputs of a sampler.Dataset created by this Strategy directly:

spec, inputs, _, err := ds.Yield()
strategy := spec.(*sampler.Strategy)
graphSample := sampler.MapInputs(strategy, inputs)
seeds, mask := graphSample["Seeds"].Value, graphSample["Seeds"].Mask
...

Example 2: usage in a model that is fed the output of a sampler.Dataset:

func MyModelGraph(ctx *context.Context, spec any, inputs []*Node) []*Node {
	strategy := spec.(*sampler.Strategy)
	graphSample := sampler.MapInputs(strategy, inputs)
	seeds, mask := graphSample["Seeds"].Value, graphSample["Seeds"].Mask
	...
}

func NameForNodeDependentDegree ¶

func NameForNodeDependentDegree(ruleName, dependentName string) string

NameForNodeDependentDegree returns the name of the input field that contains the degree of the given rule node, with respect to the dependent rule node.

Types ¶

type Dataset ¶

type Dataset struct {
	// contains filtered or unexported fields
}

Dataset is created by a configured Strategy. Before using it (by calling Dataset.Yield) it can be configured to shuffle, to run for a given number of epochs, or to loop indefinitely. The batch size is not configurable in the Dataset: it is defined as part of the Strategy's Rules configuration (see Strategy.Nodes to define the seeds).

The Dataset is created to be re-entrant, so it can be used with [data.Parallel].

func (*Dataset) Epochs ¶

func (ds *Dataset) Epochs(n int) *Dataset

Epochs configures the dataset to yield that many epochs. The default is 1.

Note that if there is more than one seed node type, an epoch is considered finished as soon as the first of the seed types is exhausted.

It returns itself to allow cascading configuration calls.

func (*Dataset) Infinite ¶

func (ds *Dataset) Infinite() *Dataset

Infinite configures the dataset to loop over epochs indefinitely. The default is 1 epoch.

func (*Dataset) Name ¶

func (ds *Dataset) Name() string

Name implements train.Dataset.

func (*Dataset) Reset ¶

func (ds *Dataset) Reset()

Reset implements train.Dataset: it restarts a Dataset after it has been exhausted.

func (*Dataset) Shuffle ¶

func (ds *Dataset) Shuffle() *Dataset

Shuffle configures the dataset to shuffle the seed nodes before sampling. They are reshuffled at every new epoch, resulting in random sampling without replacement.
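The idea of reshuffling once per epoch can be sketched in plain Go. This is only an illustration of sampling without replacement, not the package's implementation; `epochBatches` and `newRNG` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"math/rand"
)

// newRNG returns a deterministic random source for the given seed.
func newRNG(seed int64) *rand.Rand {
	return rand.New(rand.NewSource(seed))
}

// epochBatches is a hypothetical helper: it shuffles a copy of seeds and cuts
// it into batches, so each seed appears exactly once in the epoch.
func epochBatches(rng *rand.Rand, seeds []int32, batchSize int) [][]int32 {
	shuffled := append([]int32(nil), seeds...)
	rng.Shuffle(len(shuffled), func(i, j int) {
		shuffled[i], shuffled[j] = shuffled[j], shuffled[i]
	})
	var batches [][]int32
	for start := 0; start < len(shuffled); start += batchSize {
		end := start + batchSize
		if end > len(shuffled) {
			end = len(shuffled)
		}
		batches = append(batches, shuffled[start:end])
	}
	return batches
}

func main() {
	rng := newRNG(42)
	seeds := []int32{10, 11, 12, 13, 14}
	for epoch := 0; epoch < 2; epoch++ {
		// A fresh shuffle per epoch: the order changes, the membership does not.
		fmt.Println("epoch", epoch, epochBatches(rng, seeds, 2))
	}
}
```

In this sketch the last batch of an epoch is simply shorter; the Sampler, as described elsewhere on this page, keeps tensor shapes fixed and pads instead.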

func (*Dataset) WithReplacement ¶

func (ds *Dataset) WithReplacement() *Dataset

WithReplacement configures the dataset to yield with replacement. This automatically implies `Shuffle` and `Infinite`.

func (*Dataset) Yield ¶

func (ds *Dataset) Yield() (spec any, inputs, labels []tensor.Tensor, err error)

Yield implements train.Dataset. The returned spec is a pointer to the Strategy, and can be used to build a map of the names to the sampled tensors.

type EdgeType ¶

type EdgeType struct {
	// SourceNodeType, TargetNodeType of the edges.
	Name, SourceNodeType, TargetNodeType string

	// Starts has one entry for each source node (shifted by 1): it points to the start of the list of
	// target nodes (edges) that this source node is connected to.
	//
	// So for source node `i`, the list of edges starts at `Starts[i-1]` and ends at `Starts[i]`,
	// except if `i == 0`, in which case the start is at 0.
	// The list can be empty (start equal to end) if the source node has no target nodes.
	//
	// The number of sources is given by `len(Starts)`.
	Starts []int32

	// List of target nodes ordered by source nodes.
	// The source node for each edge is given by `Starts` above.
	EdgeTargets []int32
	// contains filtered or unexported fields
}

EdgeType information used by the Sampler.

func (*EdgeType) EdgeTargetsForSourceIdx ¶

func (et *EdgeType) EdgeTargetsForSourceIdx(srcIdx int32) []int32

EdgeTargetsForSourceIdx returns a slice with the target nodes for the given source node. Don't modify the returned slice: it's in use by the Sampler. Make a copy if you need to modify it.
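The offset convention documented for `Starts` and `EdgeTargets` can be sketched as follows. `edgeIndex` and `targetsFor` are hypothetical stand-ins written for this illustration; they follow the description in the field comments, not the package's actual code.

```go
package main

import "fmt"

// edgeIndex is a hypothetical stand-in for the two exported EdgeType fields.
type edgeIndex struct {
	Starts      []int32 // Starts[i] is the end offset of source node i's targets.
	EdgeTargets []int32 // Target nodes, grouped by source node.
}

// targetsFor returns the targets of srcIdx as a sub-slice (no copy),
// following the offset convention described in the Starts comment.
func (e *edgeIndex) targetsFor(srcIdx int32) []int32 {
	var start int32
	if srcIdx > 0 {
		start = e.Starts[srcIdx-1]
	}
	return e.EdgeTargets[start:e.Starts[srcIdx]]
}

func main() {
	// Source 0 -> {5, 6}, source 1 -> {}, source 2 -> {4}.
	e := &edgeIndex{
		Starts:      []int32{2, 2, 3},
		EdgeTargets: []int32{5, 6, 4},
	}
	for src := int32(0); src < int32(len(e.Starts)); src++ {
		fmt.Println(src, e.targetsFor(src))
	}
	// Copy before modifying, since the sub-slice shares the backing array.
	mine := append([]int32(nil), e.targetsFor(0)...)
	mine[0] = 99
	fmt.Println(e.EdgeTargets[0]) // 5
}
```

Returning a sub-slice avoids an allocation per lookup, which is why the caller is asked to copy before modifying.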

func (*EdgeType) NumEdges ¶

func (et *EdgeType) NumEdges() int

NumEdges for this type.

func (*EdgeType) NumSourceNodes ¶

func (et *EdgeType) NumSourceNodes() int

NumSourceNodes for the source node type -- total number of nodes, even if they are not used by the edges.

func (*EdgeType) NumTargetNodes ¶

func (et *EdgeType) NumTargetNodes() int

NumTargetNodes for the target node type -- total number of nodes, even if they are not used by the edges.

type Rule ¶

type Rule struct {
	Sampler  *Sampler
	Strategy *Strategy

	// Name of the [Rule].
	Name string

	// ConvKernelScopeName doesn't affect sampling, but can be used to uniquely identify
	// the scope used for the kernels in a GNN to do convolutions on this rule.
	// If two rules have the same ConvKernelScopeName, they will share weights.
	ConvKernelScopeName string

	// UpdateKernelScopeName doesn't affect sampling, but can be used to uniquely identify
	// the scope used for the kernels in a GNN to do updates on this rule.
	// If two rules have the same UpdateKernelScopeName, they will share weights.
	UpdateKernelScopeName string

	// NodeTypeName of the nodes sampled by this rule.
	NodeTypeName string

	// NumNodes for NodeTypeName. Only used if NodeSet is not provided.
	NumNodes int32

	// SourceRule is the [Rule] this rule uses as source, or nil if
	// this is a "Node" sampling rule (a root/seed sampling).
	SourceRule *Rule

	// Dependents is the list of Rules that depend on this one.
	// That is, other rules that have this Rule as [SourceRule].
	// This is to keep track of the graph, and is not involved in the sampling of this rule.
	Dependents []*Rule

	// EdgeType that connects the [SourceRule] node type to the node type ([NodeTypeName]) of this Rule.
	// This is only set if this is an edge sampling rule. A node sampling rule (for seeds) has this set to nil.
	EdgeType *EdgeType

	// Count is the number of samples to create. It will define the last dimension of the tensor sampled.
	Count int

	// Shape of the sample for this rule.
	Shape shapes.Shape

	// NodeSet is a set of indices that a "Node" rule is allowed to sample from.
	// E.g.: have separate NodeSet for train, test and validation datasets.
	NodeSet []int32
}

Rule defines one rule of the sampling strategy. It's created by Strategy.Nodes, Strategy.NodesFromSet and Rule.FromEdges. Don't modify it directly.

func (*Rule) FromEdges ¶

func (r *Rule) FromEdges(name, edgeTypeName string, count int) *Rule

FromEdges returns a Rule that samples nodes from the edges connecting the results of the current Rule `r`.

func (*Rule) IdentitySubRule ¶

func (r *Rule) IdentitySubRule(name string) *Rule

IdentitySubRule creates a sub-rule that copies over the current rule, adding one rank (but same size). This is useful when trying to split updates into different parts, with the "IdentitySubRule" taking a subset of the dependents.

func (*Rule) IsIdentitySubRule ¶

func (r *Rule) IsIdentitySubRule() bool

IsIdentitySubRule returns whether this is an identity sub-rule with a 1-to-1 mapping.

func (*Rule) IsNode ¶

func (r *Rule) IsNode() bool

IsNode returns whether this is a "Node" rule, which can also be seen as a root rule.

func (*Rule) String ¶

func (r *Rule) String() string

String returns an informative description of the rule.

func (*Rule) WithKernelScopeName ¶

func (r *Rule) WithKernelScopeName(name string) *Rule

WithKernelScopeName will set both ConvKernelScopeName and UpdateKernelScopeName to `name`.

type Sampler ¶

type Sampler struct {
	EdgeTypes        map[string]*EdgeType
	NodeTypesToCount map[string]int32
	Frozen           bool // When true, it can no longer be changed.
}

Sampler can be used to dynamically sample a Graph to be used in GNNs. It implements the train.Dataset interface.

It always samples a fixed number of nodes, padding whenever there are not enough elements to sample from. This way the resulting tensors always have the same shape, as required by XLA.

There are 3 phases when using the Sampler:

(1) Specify the full graph data: define the node types and edge types. For example, for the OGBN-MAG dataset:

s := sampler.New()
s.AddNodeType("papers", mag.NumPapers)
s.AddNodeType("authors", mag.NumAuthors)
s.AddEdgeType("writes", "authors", "papers", mag.EdgesWrites, /* reverse= */ false)
s.AddEdgeType("writtenBy", "authors", "papers", mag.EdgesWrites, /* reverse= */ true)
s.AddEdgeType("cites", "papers", "papers", mag.EdgesCites, /* reverse= */ false)
s.AddEdgeType("citedBy", "papers", "papers", mag.EdgesCites, /* reverse= */ true)

(2) Create and specify a sampling strategy: sampling always generates a tree of elements, with fixed-shape tensors. Padding is used when a rule doesn't have enough candidates to sample from. Example:

trainStrategy := s.NewStrategy()
seeds := trainStrategy.NodesFromSet("Seeds", "papers", batchSize, /* nodeSet= */ TrainSplits)
citedBy := seeds.FromEdges(/* name= */ "citedBy", /* edgeTypeName= */ "citedBy", 5)
authors := seeds.FromEdges(/* name= */ "authors", /* edgeTypeName= */ "writtenBy", 5)
coauthoredPapers := authors.FromEdges(/* name= */ "coauthoredPapers", /* edgeTypeName= */ "writes", 5)
citingAuthors := citedBy.FromEdges(/* name= */ "citingAuthors", /* edgeTypeName= */ "writtenBy", 5)

(3) Create a dataset and use it. The `spec` returned by `Yield` is a pointer to the Strategy object, and can be passed to MapInputs, together with the inputs, to build a map from rule names to the sampled tensors. Example:

trainDataset := trainStrategy.NewDataset("train")
for {
	spec, inputs, _, err := trainDataset.Yield()
	if err != nil {
		break // Stop when Yield reports an error.
	}
	strategy := spec.(*sampler.Strategy)
	sample := sampler.MapInputs(strategy, inputs)
	...
}

Each registration of an edge type creates a corresponding structure to store the edges, that will be used for sampling.

All the information kept by Sampler is available for reading, but avoid changing it directly, and instead use the provided methods.

func Load ¶

func Load(filePath string) (s *Sampler, err error)

Load loads a previously saved Sampler. If filePath doesn't exist, it returns an error that can be checked with os.IsNotExist.
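The `os.IsNotExist` check mentioned above follows the usual standard-library pattern. A minimal sketch using `os.Open` in place of `Load` (the helper `fileExists` and the file path are hypothetical):

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// fileExists is a hypothetical helper: it reports whether filePath can be
// opened, treating "does not exist" as a normal outcome rather than an error.
func fileExists(filePath string) (found bool, err error) {
	f, err := os.Open(filePath)
	if os.IsNotExist(err) {
		return false, nil // Missing file: the caller may build and save instead.
	}
	if err != nil {
		return false, err // Any other failure is a real error.
	}
	return true, f.Close()
}

func main() {
	missing := filepath.Join(os.TempDir(), "no-such-sampler-file.bin")
	found, err := fileExists(missing)
	fmt.Println(found, err)
}
```

With `Load`, the same branch is where one would typically fall back to building the Sampler from the raw graph data and calling `Save`.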

func New ¶

func New() *Sampler

New creates a new empty Sampler.

After creating it, use AddNodeType and AddEdgeType to define where to sample from.

func (*Sampler) AddEdgeType ¶

func (s *Sampler) AddEdgeType(name, sourceNodeType, targetNodeType string, edges tensor.Tensor, reverse bool)

AddEdgeType adds the edge type to the list of known edges. It takes the node type names (which must have been added with AddNodeType), and the `edges` given as pairs (source node, target node).

If `reverse` is true, it reverses the direction of the sampling. Note that `sourceNodeType` and `targetNodeType` are given before reversing the direction of the edges, so if `reverse` is true, the source is interpreted as the target and vice versa. The same applies to the values of `edges`.

The `edges` tensor must have shape `(Int32)[N, 2]`. Its contents are changed in place: they are sorted by the source node (or by the target node if reversed). The edge information itself is not lost.
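How a list of (source, target) pairs could be turned into the `Starts`/`EdgeTargets` layout, including the effect of `reverse`, can be sketched in plain Go. `buildIndex` is a hypothetical helper, and unlike the behavior described above it sorts a copy rather than sorting in place.

```go
package main

import (
	"fmt"
	"sort"
)

// buildIndex is a hypothetical helper. It takes (source, target) pairs,
// optionally swaps each pair, sorts by the sampling source, and returns the
// cumulative end offsets (starts) and the grouped targets.
func buildIndex(edges [][2]int32, numSources int, reverse bool) (starts, targets []int32) {
	pairs := make([][2]int32, len(edges))
	for i, e := range edges {
		if reverse {
			pairs[i] = [2]int32{e[1], e[0]}
		} else {
			pairs[i] = e
		}
	}
	sort.SliceStable(pairs, func(i, j int) bool { return pairs[i][0] < pairs[j][0] })

	starts = make([]int32, numSources)
	targets = make([]int32, len(pairs))
	for i, p := range pairs {
		targets[i] = p[1]
		starts[p[0]]++ // Count edges per source first.
	}
	// Turn per-source counts into cumulative end offsets.
	for i := 1; i < numSources; i++ {
		starts[i] += starts[i-1]
	}
	return starts, targets
}

func main() {
	// "writes" edges as (author, paper) pairs: 2 authors, 3 papers.
	writes := [][2]int32{{1, 0}, {0, 2}, {0, 0}}

	// Forward: sample papers from authors.
	starts, targets := buildIndex(writes, 2, false)
	fmt.Println(starts, targets) // [2 3] [2 0 0]

	// Reversed ("writtenBy"): sample authors from papers.
	starts, targets = buildIndex(writes, 3, true)
	fmt.Println(starts, targets) // [2 2 3] [1 0 0]
}
```

The same edge list serves both directions, which matches how the OGBN-MAG example registers "writes" and "writtenBy" from a single tensor.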

func (*Sampler) AddNodeType ¶

func (s *Sampler) AddNodeType(name string, count int)

AddNodeType adds the node type with the given name and count to the collection of known node types. This assumes a dense representation of the node type: all indices from `0` to `count-1` are valid.

A sparse node type (e.g.: indices are random numbers from 0 to MAXINT-1 or strings) is not supported.

func (*Sampler) NewStrategy ¶

func (s *Sampler) NewStrategy() *Strategy

NewStrategy yields a new Strategy object, based on the graph data definitions of the Sampler object.

Once a strategy is created, the Sampler can no longer be changed -- but multiple strategies can be created based on the same Sampler.

func (*Sampler) Save ¶

func (s *Sampler) Save(filePath string) (err error)

Save saves the Sampler, including the edge indices, so that it can be reloaded ready to use.

func (*Sampler) String ¶

func (s *Sampler) String() string

String returns a multi-line informative description of the Sampler data specification.

type Strategy ¶

type Strategy struct {
	Sampler *Sampler

	// KeepDegrees means the sampler should add a tensor for all edges with the degrees of source sampling nodes.
	KeepDegrees bool

	// Rules lists all the rules of a strategy.
	// It can be used for reading, but don't change it.
	Rules map[string]*Rule

	// Seeds lists all the rules that are seeds.
	// It can be used for reading, but don't change it.
	Seeds []*Rule
	// contains filtered or unexported fields
}

Strategy is created by Sampler. A Sampler can create multiple [Strategy]s; a typical example is creating one for training, one for validation and one for testing.

After creation (see Sampler.NewStrategy), one defines what and how to sample a subgraph, by creating "Rules" (Rule) that will translate to sampled nodes.

Once the strategy is defined, it can be used to create one or more datasets -- and after datasets are created, the strategy can no longer be changed.

func (*Strategy) NewDataset ¶

func (strategy *Strategy) NewDataset(name string) *Dataset

NewDataset creates a new Dataset from the configured Strategy. One can create multiple datasets from the same Strategy, but once a Dataset is created, the Strategy is considered frozen and can no longer be modified.

func (*Strategy) Nodes ¶

func (strategy *Strategy) Nodes(name, nodeTypeName string, count int) *Rule

Nodes creates a rule (named `name`) to sample nodes randomly without replacement from the node type given by `nodeTypeName`.

Nodes will be indices from 0 to the number of elements of the given node type.

Node sampling rules (as opposed to edge sampling rules) typically produce the "root nodes" or "seed nodes" of the tree being sampled, which represents the sampled sub-graph.

If this is used to sample the seed nodes, `count` will typically be the batch size.

func (*Strategy) NodesFromSet ¶

func (strategy *Strategy) NodesFromSet(name, nodeTypeName string, count int, nodeSet []int32) *Rule

NodesFromSet creates a rule (named `name`) to sample nodes randomly without replacement from the node type given by `nodeTypeName`, but selecting only from the given `nodeSet`.

`nodeSet` is a list of valid node indices for the given node type from which to sample.

Node sampling rules (as opposed to edge sampling rules) typically produce the "root nodes" or "seed nodes" of the tree being sampled, which represents the sampled sub-graph.

If this is used to sample the seed nodes, `count` will typically be the batch size.

func (*Strategy) String ¶

func (strategy *Strategy) String() string

String returns a multi-line informative description of the strategy.

type ValueMask ¶

type ValueMask[T any] struct {
	Value, Mask T
}

ValueMask contains a pair of tensor.Tensor or [*graph.Node] (Value, Mask).
