Documentation ¶
Index ¶
- Constants
- func MapInputs[T any](strategy *Strategy, inputs []T) map[string]*ValueMask[T]
- func NameForNodeDependentDegree(ruleName, dependentName string) string
- type Dataset
- func (ds *Dataset) Epochs(n int) *Dataset
- func (ds *Dataset) Infinite() *Dataset
- func (ds *Dataset) Name() string
- func (ds *Dataset) Reset()
- func (ds *Dataset) Shuffle() *Dataset
- func (ds *Dataset) WithReplacement() *Dataset
- func (ds *Dataset) Yield() (spec any, inputs, labels []tensor.Tensor, err error)
- type EdgeType
- type Rule
- type Sampler
- type Strategy
- type ValueMask
Constants ¶
const PaddingIndex = 0
PaddingIndex is used for every sampling slot that could not be filled. Notice that 0 is also a valid node index, so one should always use the mask returned by the Sampler to check whether a value is padding or not.
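The following is a minimal sketch of why the mask, and not the value, has to be consulted. The helper `validIndices` and the sample data are hypothetical and not part of the package; plain slices stand in for the Value/Mask tensors.

```go
package main

import "fmt"

// PaddingIndex mirrors the constant documented above.
const PaddingIndex = 0

// validIndices (hypothetical helper) keeps only the entries whose mask is true.
// It never compares values against PaddingIndex, because 0 is also a valid node index.
func validIndices(values []int32, mask []bool) []int32 {
	valid := make([]int32, 0, len(values))
	for i, v := range values {
		if mask[i] {
			valid = append(valid, v)
		}
	}
	return valid
}

func main() {
	// Made-up sample with 4 slots: nodes 0 and 7 are real, the last two slots are padding.
	values := []int32{0, 7, PaddingIndex, PaddingIndex}
	mask := []bool{true, true, false, false}
	fmt.Println(validIndices(values, mask)) // prints [0 7]
}
```

Filtering on `v != PaddingIndex` instead would wrongly drop node 0 from this sample.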
Variables ¶
This section is empty.
Functions ¶
func MapInputs ¶
MapInputs converts the inputs yielded by a sampler.Dataset into a map from each Rule's Name to the Value/Mask tensors holding the samples for this example.
Example 1: using the outputs of a sampler.Dataset created by this Strategy directly:
	spec, inputs, _, err := ds.Yield()
	strategy := spec.(*sampler.Strategy)
	graphSample := sampler.MapInputs(strategy, inputs)
	seeds, mask := graphSample["Seeds"].Value, graphSample["Seeds"].Mask
	...
Example 2: usage in a model that is fed the output of a sampler.Dataset:
	func MyModelGraph(ctx *context.Context, spec any, inputs []*Node) []*Node {
		strategy := spec.(*sampler.Strategy)
		graphSample := sampler.MapInputs(strategy, inputs)
		seeds, mask := graphSample["Seeds"].Value, graphSample["Seeds"].Mask
		...
	}
func NameForNodeDependentDegree ¶
NameForNodeDependentDegree returns the name of the input field that contains the degree of the given rule node, with respect to the dependent rule node.
Types ¶
type Dataset ¶
type Dataset struct {
// contains filtered or unexported fields
}
Dataset is created by a configured Strategy. Before using it -- by calling Dataset.Yield -- it can be configured to shuffle, to run for a given number of epochs, or to loop indefinitely. The batch size is not configurable in the Dataset: it is defined as part of the Strategy's Rules configuration (see Strategy.Nodes to define the seeds).
The Dataset is created to be re-entrant, so it can be used with [data.Parallel].
func (*Dataset) Epochs ¶
Epochs configures the dataset to yield that many epochs. Default is 1.
Notice that if there is more than one seed node type, an epoch is considered finished as soon as the first of the seed types is exhausted.
It returns itself to allow cascading configuration calls.
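As an illustration of the cascading style, here is a small sketch of a configuration type whose setters return the receiver. The type `datasetConfig` and its fields are hypothetical stand-ins (the real Dataset keeps its fields unexported); only the default of 1 epoch is taken from the text above.

```go
package main

import "fmt"

// datasetConfig is a hypothetical stand-in for the Dataset's internal configuration.
type datasetConfig struct {
	epochs  int
	shuffle bool
}

// newDatasetConfig starts with the documented default of 1 epoch.
func newDatasetConfig() *datasetConfig { return &datasetConfig{epochs: 1} }

// Epochs sets the number of epochs and returns the receiver, so calls can be chained.
func (c *datasetConfig) Epochs(n int) *datasetConfig {
	c.epochs = n
	return c
}

// Shuffle enables shuffling and returns the receiver, so calls can be chained.
func (c *datasetConfig) Shuffle() *datasetConfig {
	c.shuffle = true
	return c
}

func main() {
	cfg := newDatasetConfig().Shuffle().Epochs(3)
	fmt.Printf("epochs=%d shuffle=%t\n", cfg.epochs, cfg.shuffle) // prints epochs=3 shuffle=true
}
```

Because every setter returns the same pointer, the order of the chained calls does not create copies; all of them modify one object.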
func (*Dataset) Infinite ¶
Infinite configures the dataset to loop over epochs indefinitely. Default is 1 epoch.
func (*Dataset) Reset ¶
func (ds *Dataset) Reset()
Reset implements train.Dataset: it restarts a Dataset after it has been exhausted.
func (*Dataset) Shuffle ¶
Shuffle configures the dataset to shuffle the seed nodes before sampling. The seeds are reshuffled at every new epoch, resulting in random samples without replacement.
func (*Dataset) WithReplacement ¶
WithReplacement configures the dataset to yield with replacement. This automatically implies `Shuffle` and `Infinite`.
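To make the difference between the two modes concrete, the sketch below contrasts a per-epoch shuffle (without replacement) with independent draws (with replacement). It is not the package's implementation; `newRNG`, `epochWithoutReplacement` and `drawWithReplacement` are hypothetical names, and plain slices stand in for the seed nodes.

```go
package main

import (
	"fmt"
	"math/rand"
)

// newRNG returns a deterministic random source for the sketch.
func newRNG(seed int64) *rand.Rand { return rand.New(rand.NewSource(seed)) }

// epochWithoutReplacement returns a freshly shuffled copy of seeds:
// every seed appears exactly once per epoch.
func epochWithoutReplacement(rng *rand.Rand, seeds []int32) []int32 {
	shuffled := make([]int32, len(seeds))
	copy(shuffled, seeds)
	rng.Shuffle(len(shuffled), func(i, j int) {
		shuffled[i], shuffled[j] = shuffled[j], shuffled[i]
	})
	return shuffled
}

// drawWithReplacement picks n seeds independently, so repeats are possible
// and there is no natural end of an epoch.
func drawWithReplacement(rng *rand.Rand, seeds []int32, n int) []int32 {
	out := make([]int32, n)
	for i := range out {
		out[i] = seeds[rng.Intn(len(seeds))]
	}
	return out
}

func main() {
	rng := newRNG(42)
	seeds := []int32{10, 11, 12, 13}
	fmt.Println(epochWithoutReplacement(rng, seeds)) // some permutation of seeds
	fmt.Println(drawWithReplacement(rng, seeds, 6))  // may contain repeats
}
```

Since drawing with replacement never exhausts the seeds, it is consistent that this mode implies `Infinite`.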
type EdgeType ¶
type EdgeType struct {
// SourceNodeType, TargetNodeType of the edges.
Name, SourceNodeType, TargetNodeType string
// Starts has one entry for each source node (shifted by 1): it points to the start of the list of
// target nodes (edges) that this source node is connected to.
//
// So for source node `i`, the list of edges starts at `Starts[i-1]` and ends at `Starts[i]`,
// except if `i == 0`, in which case the start is at 0.
// The list is empty (`Starts[i-1] == Starts[i]`) if the source node has no target nodes.
//
// The number of sources is given by `len(Starts)`.
Starts []int32
// List of target nodes ordered by source nodes.
// The source node for each edge is given by `Starts` above.
EdgeTargets []int32
// contains filtered or unexported fields
}
EdgeType information used by the Sampler.
func (*EdgeType) EdgeTargetsForSourceIdx ¶
EdgeTargetsForSourceIdx returns a slice with the target nodes for the given source node index. Don't modify the returned slice, it's in use by the Sampler -- make a copy if you need to modify it.
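The lookup described in the `Starts` and `EdgeTargets` comments can be sketched as follows. The function `edgeTargetsFor` is a hypothetical re-implementation for illustration, not the package's method, and the edge data is made up.

```go
package main

import "fmt"

// edgeTargetsFor (hypothetical) follows the documented layout: the targets of source
// node i are edgeTargets[start:end], with end = starts[i] and start = starts[i-1],
// or start = 0 when i == 0.
func edgeTargetsFor(starts, edgeTargets []int32, i int) []int32 {
	start := int32(0)
	if i > 0 {
		start = starts[i-1]
	}
	return edgeTargets[start:starts[i]]
}

func main() {
	// 3 source nodes: node 0 -> {4, 5}, node 1 -> {} (no edges), node 2 -> {6}.
	starts := []int32{2, 2, 3}
	edgeTargets := []int32{4, 5, 6}
	for i := range starts {
		fmt.Println(i, edgeTargetsFor(starts, edgeTargets, i))
	}
	// prints:
	// 0 [4 5]
	// 1 []
	// 2 [6]
}
```

The returned slice aliases `edgeTargets`, which is why the documentation asks callers to copy it before modifying.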
func (*EdgeType) NumSourceNodes ¶
NumSourceNodes for the source node type -- total number of nodes, even if they are not used by the edges.
func (*EdgeType) NumTargetNodes ¶
NumTargetNodes for the target node type -- total number of nodes, even if they are not used by the edges.
type Rule ¶
type Rule struct {
	Sampler  *Sampler
	Strategy *Strategy

	// Name of the [Rule].
	Name string

	// ConvKernelScopeName doesn't affect sampling, but can be used to uniquely identify
	// the scope used for the kernels in a GNN to do convolutions on this rule.
	// If two rules have the same ConvKernelScopeName, they will share weights.
	ConvKernelScopeName string

	// UpdateKernelScopeName doesn't affect sampling, but can be used to uniquely identify
	// the scope used for the kernels in a GNN to do convolutions on this rule.
	// If two rules have the same UpdateKernelScopeName, they will share weights.
	UpdateKernelScopeName string

	// NodeTypeName of the nodes sampled by this rule.
	NodeTypeName string

	// NumNodes for NodeTypeName. Only used if NodeSet is not provided.
	NumNodes int32

	// SourceRule is the [Rule] this rule uses as source, or nil if
	// this is a "Node" sampling rule (a root/seed sampling).
	SourceRule *Rule

	// Dependents is the list of Rules that depend on this one.
	// That is, other rules that have this Rule as [SourceRule].
	// This is to keep track of the graph, and is not involved in the sampling of this rule.
	Dependents []*Rule

	// EdgeType that connects the [SourceRule] node type to the node type ([NodeTypeName]) of this Rule.
	// This is only set if this is an edge sampling rule. A node sampling rule (for seeds) has this set to nil.
	EdgeType *EdgeType

	// Count is the number of samples to create. It will define the last dimension of the tensor sampled.
	Count int

	// Shape of the sample for this rule.
	Shape shapes.Shape

	// NodeSet is a set of indices that a "Node" rule is allowed to sample from.
	// E.g.: have separate NodeSet for train, test and validation datasets.
	NodeSet []int32
}
Rule defines one rule of the sampling strategy. It's created by Strategy.Nodes, Strategy.NodesFromSet and Rule.FromEdges. Don't modify it directly.
func (*Rule) FromEdges ¶
FromEdges returns a Rule that samples nodes from the edges connecting the results of the current Rule `r`.
func (*Rule) IdentitySubRule ¶
IdentitySubRule creates a sub-rule that copies over the current rule, adding one rank (but same size). This is useful when trying to split updates into different parts, with the "IdentitySubRule" taking a subset of the dependents.
func (*Rule) IsIdentitySubRule ¶
IsIdentitySubRule returns whether this is an identity sub-rule with a 1-to-1 mapping.
func (*Rule) IsNode ¶
IsNode returns whether this is a "Node" rule, which can also be seen as a root rule.
func (*Rule) WithKernelScopeName ¶
WithKernelScopeName will set both ConvKernelScopeName and UpdateKernelScopeName to `name`.
type Sampler ¶
type Sampler struct {
	EdgeTypes        map[string]*EdgeType
	NodeTypesToCount map[string]int32
	Frozen           bool // When true, it can no longer be changed.
}
Sampler can be used to dynamically sample a Graph to be used in GNNs. It implements the train.Dataset interface.
It always samples a fixed number of nodes, padding whenever there are not enough elements to sample from. This way the resulting tensors will always have the same shape -- required by XLA.
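The fixed-size behaviour can be sketched as below. This is only an illustration under simplifying assumptions: `samplePadded` is a hypothetical helper, it takes the first candidates instead of sampling randomly (to stay deterministic), and plain slices stand in for tensors.

```go
package main

import "fmt"

// samplePadded (hypothetical) fills a fixed-size slice with up to count candidates.
// Unfilled slots keep the padding index 0 and get mask == false.
func samplePadded(candidates []int32, count int) (values []int32, mask []bool) {
	values = make([]int32, count) // zero-initialized, i.e. filled with the padding index
	mask = make([]bool, count)
	n := copy(values, candidates) // copies min(len(candidates), count) elements
	for i := 0; i < n; i++ {
		mask[i] = true
	}
	return values, mask
}

func main() {
	values, mask := samplePadded([]int32{8, 3}, 5)
	fmt.Println(values) // prints [8 3 0 0 0]
	fmt.Println(mask)   // prints [true true false false false]
}
```

Whatever the number of available candidates, the output always has length `count`, which is what keeps the tensor shapes static.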
There are 3 phases when using the Sampler:
(1) Specify the full graph data: define node types and edge types, for example for the OGBN-MAG dataset:
	s := sampler.New()
	s.AddNodeType("papers", mag.NumPapers)
	s.AddNodeType("authors", mag.NumAuthors)
	s.AddEdgeType("writes", "authors", "papers", mag.EdgesWrites, /* reverse= */ false)
	s.AddEdgeType("writtenBy", "authors", "papers", mag.EdgesWrites, /* reverse= */ true)
	s.AddEdgeType("cites", "papers", "papers", mag.EdgesCites, /* reverse= */ false)
	s.AddEdgeType("citedBy", "papers", "papers", mag.EdgesCites, /* reverse= */ true)
(2) Create and specify a sampling strategy: sampling always generates a tree of elements, with fixed-shape tensors. It uses padding when there are not enough examples to sample from. Example:
	trainStrategy := s.NewStrategy()
	seeds := trainStrategy.NodesFromSet("Seeds", "papers", batchSize, /* subset= */ TrainSplits)
	citedBy := seeds.FromEdges(/* Name= */ "citedBy", /* EdgeType= */ "citedBy", 5)
	authors := seeds.SampleFromEdgesRandomWithoutReplacement(/* Name= */ "authors", /* edgeSet= */ "writtenBy", 5)
	coauthoredPapers := authors.SampleFromEdgesRandomWithoutReplacement(/* Name= */ "coauthoredPapers", /* edgeSet= */ "writes", 5)
	citingAuthors := citedBy.SampleFromEdgesRandomWithoutReplacement(/* Name= */ "citingAuthors", /* edgeSet= */ "writtenBy", 5)
(3) Create a dataset and use it. The `spec` returned by `Yield` is a pointer to the Strategy object, and can be used to create a [GraphSample] by providing it the inputs and labels lists. Example:
	trainDataset := trainStrategy.Dataset()
	for {
		spec, inputs, labels, err := trainDataset.Yield()
		if err != nil {
			break
		}
		samplerStrategy := spec.(*sampler.Strategy)
		sample := samplerStrategy.Parse(inputs, labels)
		...
	}
Each registration of an edge type creates a corresponding structure to store the edges, that will be used for sampling.
All the information kept by Sampler is available for reading, but avoid changing it directly, and instead use the provided methods.
func Load ¶
Load loads a previously saved Sampler. If filePath doesn't exist, it returns an error that can be checked with os.IsNotExist.
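A common way to use such an error is "load if present, otherwise rebuild". The sketch below shows only the error-handling pattern: `graphData`, `load` and `loadOrBuild` are hypothetical stand-ins, since the exact signature of Load is not shown here, and the file name is made up.

```go
package main

import (
	"fmt"
	"os"
)

// graphData is a hypothetical stand-in for the object being loaded.
type graphData struct {
	rebuilt bool // true if it had to be built from scratch
}

// load (hypothetical) fails with a "not exist" error when the file is missing.
func load(filePath string) (*graphData, error) {
	if _, err := os.Stat(filePath); err != nil {
		return nil, err
	}
	return &graphData{}, nil
}

// loadOrBuild falls back to building from scratch only when the file does not exist;
// any other error is returned to the caller.
func loadOrBuild(filePath string) (*graphData, error) {
	g, err := load(filePath)
	if os.IsNotExist(err) {
		return &graphData{rebuilt: true}, nil
	}
	return g, err
}

func main() {
	g, err := loadOrBuild("no-such-sampler-file.bin")
	if err != nil {
		fmt.Println("unexpected error:", err)
		return
	}
	fmt.Println("rebuilt:", g.rebuilt)
}
```

Distinguishing "missing file" from other failures avoids silently rebuilding when, for example, the file exists but cannot be read.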
func New ¶
func New() *Sampler
New creates a new empty Sampler.
After creating it, use AddNodeType and AddEdgeType to define where to sample from.
func (*Sampler) AddEdgeType ¶
func (s *Sampler) AddEdgeType(name, sourceNodeType, targetNodeType string, edges tensor.Tensor, reverse bool)
AddEdgeType adds the edge type to the list of known edges. It takes the node type names (which must have been added with AddNodeType), and the `edges` given as pairs (source node, target node).
If `reverse` is true, it reverses the direction of the sampling. Note that `sourceNodeType` and `targetNodeType` are given before reversing the direction of the edges. So if `reverse` is true, the source is interpreted as the target and vice-versa. The same applies to the values of `edges`.
The `edges` tensor must have shape `(Int32)[N, 2]`. Its contents are changed in place -- they are sorted by the source node (or by the target node if reversed) -- but the edge information itself is not lost.
func (*Sampler) AddNodeType ¶
AddNodeType adds the node type with the given name and count to the collection of known node types. This assumes a dense representation of the node type: all indices from `0` to `count-1` are valid.
A sparse node type (e.g.: indices are random numbers from 0 to MAXINT-1 or strings) is not supported.
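If the original identifiers are sparse (for example strings), they can be mapped to dense indices before registering the node type. The sketch below is one possible approach and not part of the package; `denseIndexer` and the identifiers are hypothetical.

```go
package main

import "fmt"

// denseIndexer (hypothetical) assigns consecutive indices 0, 1, 2, ... to arbitrary
// keys, in the order in which they are first seen.
type denseIndexer struct {
	index map[string]int32
	keys  []string // keys[i] is the original identifier of dense index i
}

func newDenseIndexer() *denseIndexer {
	return &denseIndexer{index: make(map[string]int32)}
}

// Index returns the dense index of key, allocating a new one if key is unseen.
func (d *denseIndexer) Index(key string) int32 {
	if idx, ok := d.index[key]; ok {
		return idx
	}
	idx := int32(len(d.keys))
	d.index[key] = idx
	d.keys = append(d.keys, key)
	return idx
}

// Count is the number of distinct keys, i.e. the count of the dense node type.
func (d *denseIndexer) Count() int32 { return int32(len(d.keys)) }

func main() {
	papers := newDenseIndexer()
	for _, id := range []string{"arXiv:1", "doi:xyz", "arXiv:1"} {
		fmt.Println(id, "->", papers.Index(id))
	}
	fmt.Println("count:", papers.Count()) // prints count: 2
}
```

The resulting count is the value one would pass when registering the node type, and `keys` allows translating sampled indices back to the original identifiers.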
func (*Sampler) NewStrategy ¶
NewStrategy yields a new Strategy object, based on the graph data definitions of the Sampler object.
Once a strategy is created, the Sampler can no longer be changed -- but multiple strategies can be created based on the same Sampler.
type Strategy ¶
type Strategy struct {
	Sampler *Sampler

	// KeepDegrees means the sampler should add a tensor for all edges with the degrees of source sampling nodes.
	KeepDegrees bool

	// Rules lists all the rules of a strategy.
	// It can be used for reading, but don't change it.
	Rules map[string]*Rule

	// Seeds lists all the rules that are seeds.
	// It can be used for reading, but don't change it.
	Seeds []*Rule
	// contains filtered or unexported fields
}
Strategy is created by Sampler. A Sampler can create multiple [Strategy]s; a typical example is creating one for training, one for validation and one for testing.
After creation (see Sampler.NewStrategy), one defines what and how to sample a subgraph, by creating "Rules" (Rule) that will translate to sampled nodes.
Once the strategy is defined, it can be used to create one or more datasets -- and after datasets are created, the strategy can no longer be changed.
func (*Strategy) NewDataset ¶
NewDataset creates a new Dataset from the configured Strategy. One can create multiple datasets from the same Strategy, but once a Dataset is created, the Strategy is considered frozen and can no longer be modified.
func (*Strategy) Nodes ¶
Nodes creates a rule (named `Name`) to sample nodes randomly without replacement from the node type given by `NodeTypeName`.
The sampled nodes are indices in the range from 0 (inclusive) to the number of elements of the given node type (exclusive).
Node sampling (as opposed to edge sampling) typically produces the "root nodes" or "seed nodes" of the tree being sampled, which represents the sampled sub-graph.
If this is used to sample the seed nodes, `Count` will typically be the batch size.
func (*Strategy) NodesFromSet ¶
NodesFromSet creates a rule (named `Name`) to sample nodes randomly without replacement from the node type given by `NodeTypeName`, but selecting only from the given NodeSet.
`NodeSet` is a list of valid node indices for the given node type from which to sample.
Node sampling (as opposed to edge sampling) typically produces the "root nodes" or "seed nodes" of the tree being sampled, which represents the sampled sub-graph.
If this is used to sample the seed nodes, `Count` will typically be the batch size.
type ValueMask ¶
type ValueMask[T any] struct { Value, Mask T }
ValueMask contains a pair of tensor.Tensor or [*graph.Node] (Value, Mask).
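The type parameter lets the same container hold concrete tensors (when reading a dataset) or graph nodes (when building a model). The sketch below illustrates the idea with a local copy of the type and a hypothetical `pairInputs` helper. It assumes, purely for illustration, that the flat inputs hold one (value, mask) pair per rule in the order of `names`; the real layout used by MapInputs is defined by the Strategy and may differ.

```go
package main

import "fmt"

// ValueMask is a local copy of the documented type, so the sketch is self-contained.
type ValueMask[T any] struct {
	Value, Mask T
}

// pairInputs (hypothetical) groups a flat list of inputs into name -> (Value, Mask).
// Assumption for this sketch only: inputs[2*i] is the value and inputs[2*i+1] the mask
// of names[i].
func pairInputs[T any](names []string, inputs []T) map[string]*ValueMask[T] {
	out := make(map[string]*ValueMask[T], len(names))
	for i, name := range names {
		out[name] = &ValueMask[T]{Value: inputs[2*i], Mask: inputs[2*i+1]}
	}
	return out
}

func main() {
	// Strings stand in for tensors or graph nodes.
	inputs := []string{"seedsValue", "seedsMask", "authorsValue", "authorsMask"}
	sample := pairInputs([]string{"Seeds", "authors"}, inputs)
	fmt.Println(sample["Seeds"].Value, sample["authors"].Mask) // prints seedsValue authorsMask
}
```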