mat

package
v0.0.0-...-b044761 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Dec 31, 2022 License: MIT Imports: 6 Imported by: 0

README

mat

Matrix package with cache-aware lock-free tiling optimization.

Getting started

The following illustrates some basic usage of mat.

// Create 2x3 matrix, specified its value
// New() will throw error if provided values 
// is ineuqal to its dimension
A, err := mat.New(2, 3)(
    1, 2, 3,
    4, 5, 6,
)

// Create a 3x4 random matrix
B := mat.Rand(3, 4)

// Create a 2x4 zero matrix
C := mat.Zero(2, 4)


// C = A x B, throw err if dimentionality error
err = A.Dot(B, C)
err = A.Dot(B, C) // with concurrency optimization
// or
C, err := mat.Dot(A, B) // alloc new matrix inside Dot

License

MIT © changkun

Documentation

Overview

Package mat is a golang matrix package with cache-aware lock-free tiling optimization.

The following contents explains how multiplication algorithm works.

Prior knowledge

Assume only 2 levels in the hierarchy, fast(registers/cache) and slow(main memory). All data initially in slow memory

m:       number of memory elements (words) moved between fast and slow memory
t_m:     time per slow memory operation
f:       number of arithemetic operations
t_f:     time per arithmetic operation (t_f << t_m)
q = f/m: computational intensity (key to algorithm efficiency) average number
         of flops per slow memory access

Minimum possible time = f * t_f when all data in fast memory.
Actual time = f * t_f + m * t_m = f * t_f * [1 + (t_m / t_f) * (1 / q)]
Machine balance a = t_m / t_f (key to machine efficiency)

Larger q means time closer to minimum f*t_f

q >= t_m / t_f needed to get at least half of peak speed

Tiled matrix multiply

C = A·B =
 /          \   /         \      /                                \
|  A11  A12  | |  B11 B12  |    | A11·B11+A12·B21  A11·B12+A12·B22 |
|            | |           | == |                                  |
|  A21  A22  | |  B21 B22  |    | A21·B11+A22·B22  A21·B12+A22·B22 |
 \          /   \         /      \                                /

Consider A, B, C to be N-by-N matrices of b-by-b sub-blocks:

b = n / N is called the *block size*

Thus

m = N*n*n read each block of B N*N*N times (N*N*N * b*b = N*N*N * (n/N)*(n/N) = N*n*n)
  + N*n*n read each block of A N*N*N times
  + 2*n*n read and write each block of C once
  = 2*(N+1)*n*n

Hence, computational intensity q = f/m = 2*n*n*n / (2*(N+1)*n*n)

q ~= n/N = b (for large n)

So we can improve performance by increasing the blocksize b Can be much faster than matrix-vector multiply (q=2)

The blocked algorithm has computational intensity q ~= b: the large the block size, the more efficient algorithm will be

Limit: All three blocks from A, B, C must fit in fast memory (cache), so we cannot make these blocks arbitrarily large.

Assume our fast memory has size M_fast:

3*b*b <= M_fast, so q ~= b <= sqrt(M_fast / 3)

To build a machine to run matrix multiply at 1/2 peak arthmetic speed of the machine, we need a fast memory of size

M_fast >= 3*b*b ~= 3*q*q = 3*(t_m / t_f)*(t_m / t_f)

This size is reasonable for L1 cache, but not for register sets Note: analysis assumes it is possible to schedule the instructions perfectly.

Limits

The tiled algorithm changes the order in which values are accumulated into each C[i, j] by applying commutativity and associativity: Get slightly different answers from naive version, because of roundoff.

The previous anlysis showed that the blocked algorithm has computational intensity:

q ~= b <= sqrt(M_fast / 3)

There is a lower bound result that says we cannot do any better than this (using only associativity):

Theorem (Hong & Kong, 1981): Any reorganization of this algorithm (that uses only associativity) is limited to:

q = O(sqrt(M_fast))

Index

Constants

This section is empty.

Variables

View Source
var (
	ErrNumElem = errors.New("bad number of elements")
	ErrMatSize = errors.New("bad size of matrix")
)

Errors

Functions

func NewDense

func NewDense(m, n int) func(...float64) (*Dense, error)

NewDense a size by size matrix

func NewDenseP

func NewDenseP(m, n int) func(...float64) (*Dense, error)

NewDenseP a size by size matrix concurrently

Types

type Dense

type Dense struct {
	// contains filtered or unexported fields
}

Dense implements Matrix interface dense matrix underlying data struct

func Add

func Add(A, B *Dense) (*Dense, error)

Add A+B

func Dot

func Dot(A, B *Dense) (*Dense, error)

Dot matrix multiplication

func DotP

func DotP(A, B *Dense) (*Dense, error)

DotP matrix multiplication

func Rand

func Rand(m, n int) *Dense

Rand creates a size by size random matrix

func RandP

func RandP(m, n int) *Dense

RandP creates a size by size random matrix concurrently

func Zero

func Zero(m, n int) *Dense

Zero matrix

func (*Dense) Add

func (A *Dense) Add(B Matrix) error

Add adds matrix B to A

func (*Dense) At

func (A *Dense) At(i, j int) float64

At access element (i, j)

func (*Dense) Col

func (A *Dense) Col() int

Col of matrix

func (*Dense) Dot

func (A *Dense) Dot(B, C Matrix) (err error)

Dot matrix multiplication

func (*Dense) DotBlock

func (A *Dense) DotBlock(B, C Matrix) (err error)

DotBlock matrix multiplication Use JIK block 36 version here, see ./benchmark/README.md

func (*Dense) DotBlockIJK

func (A *Dense) DotBlockIJK(blockSize int, B, C Matrix) (err error)

DotBlockIJK block matrix multiplication

func (*Dense) DotBlockIJKP

func (A *Dense) DotBlockIJKP(blockSize int, B, C Matrix) (err error)

DotBlockIJKP block matrix multiplication

func (*Dense) DotBlockIKJ

func (A *Dense) DotBlockIKJ(blockSize int, B, C Matrix) (err error)

DotBlockIKJ block matrix multiplication

func (*Dense) DotBlockIKJP

func (A *Dense) DotBlockIKJP(blockSize int, B, C Matrix) (err error)

DotBlockIKJP block matrix multiplication

func (*Dense) DotBlockJIK

func (A *Dense) DotBlockJIK(blockSize int, B, C Matrix) (err error)

DotBlockJIK block matrix multiplication

func (*Dense) DotBlockJIKP

func (A *Dense) DotBlockJIKP(blockSize int, B, C Matrix) (err error)

DotBlockJIKP block matrix multiplication

func (*Dense) DotBlockJKI

func (A *Dense) DotBlockJKI(blockSize int, B, C Matrix) (err error)

DotBlockJKI block matrix multiplication

func (*Dense) DotBlockJKIP

func (A *Dense) DotBlockJKIP(blockSize int, B, C Matrix) (err error)

DotBlockJKIP block matrix multiplication

func (*Dense) DotBlockKIJ

func (A *Dense) DotBlockKIJ(blockSize int, B, C Matrix) (err error)

DotBlockKIJ block matrix multiplication

func (*Dense) DotBlockKIJP

func (A *Dense) DotBlockKIJP(blockSize int, B, C Matrix) (err error)

DotBlockKIJP block matrix multiplication

func (*Dense) DotBlockKJI

func (A *Dense) DotBlockKJI(blockSize int, B, C Matrix) (err error)

DotBlockKJI block matrix multiplication

func (*Dense) DotBlockKJIP

func (A *Dense) DotBlockKJIP(blockSize int, B, C Matrix) (err error)

DotBlockKJIP block matrix multiplication

func (*Dense) DotBlockP

func (A *Dense) DotBlockP(B, C Matrix) (err error)

DotBlockP matrix multiplication Use JIKP block 36 version here, see ./benchmark/README.md

func (*Dense) DotNaive

func (A *Dense) DotNaive(B, C Matrix) (err error)

DotNaive matrix multiplication O(n^3) Use JIK version here, see ./benchmark/README.md

func (*Dense) DotNaiveIJK

func (A *Dense) DotNaiveIJK(B, C Matrix) (err error)

DotNaiveIJK matrix multiplication O(n^3)

func (*Dense) DotNaiveIJKP

func (A *Dense) DotNaiveIJKP(B, C Matrix) (err error)

DotNaiveIJKP matrix multiplication O(n^3)

func (*Dense) DotNaiveIKJ

func (A *Dense) DotNaiveIKJ(B, C Matrix) (err error)

DotNaiveIKJ matrix multiplication O(n^3)

func (*Dense) DotNaiveIKJP

func (A *Dense) DotNaiveIKJP(B, C Matrix) (err error)

DotNaiveIKJP matrix multiplication O(n^3)

func (*Dense) DotNaiveJIK

func (A *Dense) DotNaiveJIK(B, C Matrix) (err error)

DotNaiveJIK matrix multiplication O(n^3)

func (*Dense) DotNaiveJIKP

func (A *Dense) DotNaiveJIKP(B, C Matrix) (err error)

DotNaiveJIKP matrix multiplication O(n^3)

func (*Dense) DotNaiveJKI

func (A *Dense) DotNaiveJKI(B, C Matrix) (err error)

DotNaiveJKI matrix multiplication O(n^3)

func (*Dense) DotNaiveJKIP

func (A *Dense) DotNaiveJKIP(B, C Matrix) (err error)

DotNaiveJKIP matrix multiplication O(n^3)

func (*Dense) DotNaiveKIJ

func (A *Dense) DotNaiveKIJ(B, C Matrix) (err error)

DotNaiveKIJ matrix multiplication O(n^3)

func (*Dense) DotNaiveKIJP

func (A *Dense) DotNaiveKIJP(B, C Matrix) (err error)

DotNaiveKIJP matrix multiplication O(n^3)

func (*Dense) DotNaiveKJI

func (A *Dense) DotNaiveKJI(B, C Matrix) (err error)

DotNaiveKJI matrix multiplication O(n^3)

func (*Dense) DotNaiveKJIP

func (A *Dense) DotNaiveKJIP(B, C Matrix) (err error)

DotNaiveKJIP matrix multiplication O(n^3)

func (*Dense) DotNaiveP

func (A *Dense) DotNaiveP(B, C Matrix) (err error)

DotNaiveP matrix multiplication O(n^3) Use JIKP version here, see ./benchmark/README.md

func (*Dense) DotP

func (A *Dense) DotP(B, C Matrix) (err error)

DotP matrix multiplication

func (*Dense) Equal

func (A *Dense) Equal(B Matrix) bool

Equal A and B?

func (*Dense) EqualShape

func (A *Dense) EqualShape(B Matrix) bool

EqualShape check A.Size() == B.Size()

func (*Dense) Inc

func (A *Dense) Inc(i, j int, val float64)

Inc adds element (i, j) with wal

func (*Dense) Mult

func (A *Dense) Mult(i, j int, val float64)

Mult multiple element (i, j) with wal

func (*Dense) Pow

func (A *Dense) Pow(i, j int, n float64)

Pow computes power of n of element (i, j)

func (*Dense) Print

func (A *Dense) Print()

Print the matrix

func (*Dense) Row

func (A *Dense) Row() int

Row of matrix

func (*Dense) Set

func (A *Dense) Set(i, j int, val float64)

Set set element (i, j) with val

func (*Dense) Size

func (A *Dense) Size() (int, int)

Size of matrix

type Matrix

type Matrix interface {
	Row() int
	Col() int
	Size() (int, int)
	At(i, j int) float64
	Set(i, j int, val float64)
	Inc(i, j int, val float64)
	Mult(i, j int, val float64)
	Pow(i, j int, n float64)
}

Matrix interface

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL