encoding

package
v0.0.4
Published: Mar 31, 2022 License: Apache-2.0 Imports: 6 Imported by: 0

Documentation

Overview

Package encoding provides the generic APIs implemented by parquet encodings in its sub-packages.

Constants

This section is empty.

Variables

var (
	// ErrNotSupported is an error returned when the underlying encoding does
	// not support the type of values being encoded or decoded.
	//
	// This error may be wrapped with type information; applications must use
	// errors.Is rather than equality comparisons to test the error values
	// returned by encoders and decoders.
	ErrNotSupported = errors.New("encoding not supported")

	// ErrInvalidArguments is an error returned when arguments passed to the
	// encoding functions are incorrect and will lead to an expected failure.
	//
	// As with ErrNotSupported, this error may be wrapped with specific
	// information about the problem and applications are expected to use
	// errors.Is for comparisons.
	ErrInvalidArguments = errors.New("invalid encoding arguments")
)
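The comments above ask callers to test these errors with errors.Is rather than with ==. The following is a minimal sketch of why that matters. It redeclares the sentinel locally so that it compiles on its own, and encodeInt96 is a hypothetical function standing in for an encoder method that wraps the sentinel with type information; neither the wrapping format nor the helper names come from the package.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNotSupported mirrors the sentinel declared by the package so that this
// example compiles without importing it.
var ErrNotSupported = errors.New("encoding not supported")

// encodeInt96 is a hypothetical encoder method that rejects its input and
// wraps the sentinel with type information (the message format is assumed).
func encodeInt96() error {
	return fmt.Errorf("INT96: %w", ErrNotSupported)
}

// isNotSupported reports whether err is, or wraps, ErrNotSupported.
func isNotSupported(err error) bool {
	return errors.Is(err, ErrNotSupported)
}

func main() {
	err := encodeInt96()
	fmt.Println(err == ErrNotSupported) // false: the sentinel is wrapped
	fmt.Println(isNotSupported(err))    // true: errors.Is walks the chain
}
```

The same pattern applies to ErrInvalidArguments.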

Functions

This section is empty.

Types

type ByteArrayList

type ByteArrayList struct {
	// contains filtered or unexported fields
}

ByteArrayList is a container similar to [][]byte with a smaller memory overhead. Where a slice of byte slices introduces ~24 bytes of overhead per element (the size of a slice header on 64-bit platforms), ByteArrayList requires only 8 bytes per element. Extra efficiency also comes from reducing GC pressure by using contiguous areas of memory instead of allocating individual slices for each element. For lists with many small elements, the memory footprint can be reduced by 40-80%.
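The fields of ByteArrayList are unexported, so its actual layout is not shown here. As an illustration of how 8 bytes per element and a contiguous buffer can fit together, the sketch below uses a hypothetical compactList type that stores one shared byte buffer plus a uint32 offset and a uint32 length per element. The type, its fields, and this layout are assumptions, not the package's implementation.

```go
package main

import "fmt"

// span locates one element inside the shared buffer. Two uint32 fields take
// 8 bytes per element; this layout is assumed for illustration only.
type span struct {
	offset uint32
	length uint32
}

// compactList is a hypothetical stand-in for ByteArrayList.
type compactList struct {
	spans []span
	data  []byte // all elements stored back to back
}

// Push copies v to the end of the shared buffer and records its location.
func (l *compactList) Push(v []byte) {
	l.spans = append(l.spans, span{offset: uint32(len(l.data)), length: uint32(len(v))})
	l.data = append(l.data, v...)
}

// Index returns the i-th element as a sub-slice of the shared buffer. The
// capacity is capped so appending to the result cannot overwrite neighbors.
func (l *compactList) Index(i int) []byte {
	s := l.spans[i]
	end := s.offset + s.length
	return l.data[s.offset:end:end]
}

// Len returns the number of elements in the list.
func (l *compactList) Len() int { return len(l.spans) }

func main() {
	var list compactList
	list.Push([]byte("hello"))
	list.Push([]byte("parquet"))
	fmt.Println(list.Len(), string(list.Index(1))) // 2 parquet
}
```

Because every element lives in one buffer, the garbage collector tracks two allocations instead of one per element, which is the source of the reduced GC pressure described above.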

func MakeByteArrayList

func MakeByteArrayList(capacity int) ByteArrayList

func (*ByteArrayList) Cap

func (list *ByteArrayList) Cap() int

func (*ByteArrayList) Clone

func (list *ByteArrayList) Clone() ByteArrayList

func (*ByteArrayList) Grow

func (list *ByteArrayList) Grow(n int)

func (*ByteArrayList) Index

func (list *ByteArrayList) Index(i int) []byte

func (*ByteArrayList) Len

func (list *ByteArrayList) Len() int

func (*ByteArrayList) Less

func (list *ByteArrayList) Less(i, j int) bool

func (*ByteArrayList) Push

func (list *ByteArrayList) Push(v []byte)

func (*ByteArrayList) PushSize

func (list *ByteArrayList) PushSize(n int) []byte

func (*ByteArrayList) Range

func (list *ByteArrayList) Range(f func([]byte) bool)

func (*ByteArrayList) Reset

func (list *ByteArrayList) Reset()

func (*ByteArrayList) Size

func (list *ByteArrayList) Size() int64

func (*ByteArrayList) Slice

func (list *ByteArrayList) Slice(i, j int) ByteArrayList

func (*ByteArrayList) Split

func (list *ByteArrayList) Split() [][]byte

func (*ByteArrayList) Swap

func (list *ByteArrayList) Swap(i, j int)

type Decoder

type Decoder interface {
	// Calling Reset clears the decoder state and changes the io.Reader from
	// which encoded values are read to the one given as argument.
	//
	// The io.Reader may be nil, in which case the decoder must not be used
	// until Reset is called again with a non-nil reader.
	//
	// Calling Reset does not override the bit-width configured on the decoder.
	Reset(io.Reader)

	// Decodes an array of boolean values using this decoder, returning
	// the number of decoded values, and io.EOF if the end of the underlying
	// io.Reader was reached.
	DecodeBoolean(data []bool) (int, error)

	// Decodes an array of 8 bits integer values using this decoder, returning
	// the number of decoded values, and io.EOF if the end of the underlying
	// io.Reader was reached.
	//
	// The parquet type system does not have 8-bit integers; this method
	// is intended to decode INT32 values but receives them as an array of
	// int8 values to enable greater memory efficiency when the application
	// knows that all values can fit in 8 bits.
	DecodeInt8(data []int8) (int, error)

	// Decodes an array of 16 bits integer values using this decoder, returning
	// the number of decoded values, and io.EOF if the end of the underlying
	// io.Reader was reached.
	//
	// The parquet type system does not have 16-bit integers; this method
	// is intended to decode INT32 values but receives them as an array of
	// int16 values to enable greater memory efficiency when the application
	// knows that all values can fit in 16 bits.
	DecodeInt16(data []int16) (int, error)

	// Decodes an array of 32 bits integer values using this decoder, returning
	// the number of decoded values, and io.EOF if the end of the underlying
	// io.Reader was reached.
	DecodeInt32(data []int32) (int, error)

	// Decodes an array of 64 bits integer values using this decoder, returning
	// the number of decoded values, and io.EOF if the end of the underlying
	// io.Reader was reached.
	DecodeInt64(data []int64) (int, error)

	// Decodes an array of 96 bits integer values using this decoder, returning
	// the number of decoded values, and io.EOF if the end of the underlying
	// io.Reader was reached.
	DecodeInt96(data []deprecated.Int96) (int, error)

	// Decodes an array of 32 bits floating point values using this decoder,
	// returning the number of decoded values, and io.EOF if the end of the
	// underlying io.Reader was reached.
	DecodeFloat(data []float32) (int, error)

	// Decodes an array of 64 bits floating point values using this decoder,
	// returning the number of decoded values, and io.EOF if the end of the
	// underlying io.Reader was reached.
	DecodeDouble(data []float64) (int, error)

	// Decodes an array of variable length byte array values using this decoder,
	// returning the number of decoded values, and io.EOF if the end of the
	// underlying io.Reader was reached.
	//
	// The values are written to the `data` buffer by calling the Push method;
	// the method returns the number of values written. DecodeByteArray will
	// stop pushing values to the output ByteArrayList if its total capacity is
	// reached.
	DecodeByteArray(data *ByteArrayList) (int, error)

	// Decodes an array of fixed length byte array values using this decoder,
	// returning the number of decoded values, and io.EOF if the end of the
	// underlying io.Reader was reached.
	DecodeFixedLenByteArray(size int, data []byte) (int, error)

	// Configures the bit-width on the decoder.
	//
	// Not all encodings require declaring the bit-width, but applications that
	// use the Decoder abstraction should not make assumptions about the
	// underlying type of the decoder, and therefore should call SetBitWidth
	// prior to decoding repetition and definition levels.
	SetBitWidth(bitWidth int)
}

The Decoder interface is implemented by decoder types.
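The Decode* methods return the number of values decoded together with io.EOF once the underlying reader is exhausted, so a caller may receive values and io.EOF from the same call. The sketch below shows one way to drain such a decoder in batches. int32Decoder, decodeAll, and decodeBytes are hypothetical names, and the 4-byte little-endian wire format is chosen only to keep the example small; the real decoders live in the sub-packages.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// int32Decoder is a hypothetical decoder that reads 4-byte little-endian
// values; it implements only the two methods this example needs.
type int32Decoder struct{ r io.Reader }

// Reset changes the reader that values are decoded from.
func (d *int32Decoder) Reset(r io.Reader) { d.r = r }

// DecodeInt32 fills data with as many values as it can and returns io.EOF
// once the reader has no more values.
func (d *int32Decoder) DecodeInt32(data []int32) (int, error) {
	var buf [4]byte
	for i := range data {
		if _, err := io.ReadFull(d.r, buf[:]); err != nil {
			return i, err
		}
		data[i] = int32(binary.LittleEndian.Uint32(buf[:]))
	}
	return len(data), nil
}

// decodeAll drains the decoder in batches of the given size.
func decodeAll(d *int32Decoder, batch int) ([]int32, error) {
	if batch <= 0 {
		batch = 1 // an empty batch would never make progress
	}
	var out []int32
	buf := make([]int32, batch)
	for {
		n, err := d.DecodeInt32(buf)
		out = append(out, buf[:n]...) // keep values returned alongside the error
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return out, err
		}
	}
}

// decodeBytes is a convenience wrapper used by the example.
func decodeBytes(b []byte, batch int) ([]int32, error) {
	d := &int32Decoder{}
	d.Reset(bytes.NewReader(b))
	return decodeAll(d, batch)
}

func main() {
	input := []byte{1, 0, 0, 0, 2, 0, 0, 0, 3, 0, 0, 0}
	values, err := decodeBytes(input, 2)
	fmt.Println(values, err) // [1 2 3] <nil>
}
```

Appending buf[:n] before inspecting the error is the important part: the final batch here holds one value and arrives with io.EOF.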

type Encoder

type Encoder interface {
	// Calling Reset clears the encoder state and changes the io.Writer where
	// encoded values are written to the one given as argument.
	//
	// The io.Writer may be nil, in which case the encoder must not be used
	// until Reset is called again with a non-nil writer.
	//
	// Calling Reset does not override the bit-width configured on the encoder.
	Reset(io.Writer)

	// Encodes an array of boolean values using this encoder.
	EncodeBoolean(data []bool) error

	// Encodes an array of 8 bits integer values using this encoder.
	//
	// The parquet type system does not have 8-bit integers; this method
	// is intended to encode INT32 values but receives them as an array of
	// int8 values to enable greater memory efficiency when the application
	// knows that all values can fit in 8 bits.
	EncodeInt8(data []int8) error

	// Encodes an array of 16 bits integer values using this encoder.
	//
	// The parquet type system does not have 16-bit integers; this method
	// is intended to encode INT32 values but receives them as an array of
	// int16 values to enable greater memory efficiency when the application
	// knows that all values can fit in 16 bits.
	EncodeInt16(data []int16) error

	// Encodes an array of 32 bit integer values using this encoder.
	EncodeInt32(data []int32) error

	// Encodes an array of 64 bit integer values using this encoder.
	EncodeInt64(data []int64) error

	// Encodes an array of 96 bit integer values using this encoder.
	EncodeInt96(data []deprecated.Int96) error

	// Encodes an array of 32 bit floating point values using this encoder.
	EncodeFloat(data []float32) error

	// Encodes an array of 64 bit floating point values using this encoder.
	EncodeDouble(data []float64) error

	// Encodes an array of variable length byte array values using this encoder.
	EncodeByteArray(data ByteArrayList) error

	// Encodes an array of fixed length byte array values using this encoder.
	//
	// The list is laid out contiguously in the `data` byte slice, in chunks of
	// `size` bytes.
	EncodeFixedLenByteArray(size int, data []byte) error

	// Configures the bit-width on the encoder.
	//
	// Not all encodings require declaring the bit-width, but applications that
	// use the Encoder abstraction should not make assumptions about the
	// underlying type of the encoder, and therefore should call SetBitWidth
	// prior to encoding repetition and definition levels.
	SetBitWidth(bitWidth int)
}

The Encoder interface is implemented by encoder types.
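Reset lets one encoder value be reused across several outputs instead of allocating a new encoder each time. The sketch below illustrates that reuse with a hypothetical int32Encoder that writes 4-byte little-endian values and keeps a scratch buffer across Reset calls. The type, the encodePages helper, and the wire format are assumptions made for the example, not part of the package.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// int32Encoder is a hypothetical encoder implementing only the two methods
// this example needs.
type int32Encoder struct {
	w   io.Writer
	buf []byte // scratch space retained across Reset calls
}

// Reset changes the writer that encoded values are written to.
func (e *int32Encoder) Reset(w io.Writer) { e.w = w }

// EncodeInt32 writes each value as 4 little-endian bytes.
func (e *int32Encoder) EncodeInt32(data []int32) error {
	e.buf = e.buf[:0]
	var tmp [4]byte
	for _, v := range data {
		binary.LittleEndian.PutUint32(tmp[:], uint32(v))
		e.buf = append(e.buf, tmp[:]...)
	}
	_, err := e.w.Write(e.buf)
	return err
}

// encodePages encodes each page into its own buffer with a single encoder.
func encodePages(pages ...[]int32) ([][]byte, error) {
	enc := &int32Encoder{}
	out := make([][]byte, 0, len(pages))
	for _, page := range pages {
		buf := new(bytes.Buffer)
		enc.Reset(buf) // point the same encoder at a fresh output
		if err := enc.EncodeInt32(page); err != nil {
			return nil, err
		}
		out = append(out, buf.Bytes())
	}
	return out, nil
}

func main() {
	pages, err := encodePages([]int32{1}, []int32{2, 3})
	if err != nil {
		panic(err)
	}
	fmt.Println(len(pages[0]), len(pages[1])) // 4 8
}
```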

Some encodings only support a subset of the parquet types.

type Encoding

type Encoding interface {
	// Returns a human-readable name for the encoding.
	String() string

	// Returns the parquet code representing the encoding.
	Encoding() format.Encoding

	// Checks whether the encoding is capable of serializing parquet values of
	// the given type.
	CanEncode(format.Type) bool

	// Creates a decoder reading encoded values from the io.Reader passed as
	// argument.
	//
	// The io.Reader may be nil, in which case the decoder's Reset method must
	// be called with a non-nil io.Reader prior to decoding values.
	NewDecoder(io.Reader) Decoder

	// Creates an encoder writing values to the io.Writer passed as argument.
	//
	// The io.Writer may be nil, in which case the encoder's Reset method must
	// be called with a non-nil io.Writer prior to encoding values.
	NewEncoder(io.Writer) Encoder
}

The Encoding interface is implemented by types representing parquet column encodings.

Encoding instances must be safe to use concurrently from multiple goroutines.

type NotSupported

type NotSupported struct {
}

NotSupported is a type satisfying the Encoding interface which supports neither encoding nor decoding of any value type.

func (NotSupported) CanEncode

func (NotSupported) CanEncode(format.Type) bool

func (NotSupported) Encoding

func (NotSupported) Encoding() format.Encoding

func (NotSupported) NewDecoder

func (NotSupported) NewDecoder(io.Reader) Decoder

func (NotSupported) NewEncoder

func (NotSupported) NewEncoder(io.Writer) Encoder

func (NotSupported) String

func (NotSupported) String() string

type NotSupportedDecoder

type NotSupportedDecoder struct {
}

NotSupportedDecoder is an implementation of the Decoder interface which does not support decoding any value types.

Many parquet encodings only support decoding a subset of the parquet types. They can embed this type to default to not supporting any decoding, then override specific Decode* methods to provide implementations for the types they do support.
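The embedding pattern described above can be sketched with local stand-ins so that the example compiles on its own. notSupportedDecoder mimics NotSupportedDecoder with just two methods, and constantDecoder is a hypothetical decoder that supports only INT32. That the real type returns ErrNotSupported from every Decode* method is an assumption based on its description.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotSupported mirrors the package's ErrNotSupported sentinel.
var errNotSupported = errors.New("encoding not supported")

// notSupportedDecoder is a local stand-in for NotSupportedDecoder, reduced
// to two methods; each one rejects its input.
type notSupportedDecoder struct{}

func (notSupportedDecoder) DecodeBoolean([]bool) (int, error) { return 0, errNotSupported }
func (notSupportedDecoder) DecodeInt32([]int32) (int, error)  { return 0, errNotSupported }

// constantDecoder is a hypothetical decoder that supports INT32 only. It
// inherits the rejecting DecodeBoolean from the embedded type.
type constantDecoder struct {
	notSupportedDecoder
	value int32
}

// DecodeInt32 shadows the embedded method and fills data with d.value.
func (d constantDecoder) DecodeInt32(data []int32) (int, error) {
	for i := range data {
		data[i] = d.value
	}
	return len(data), nil
}

// isNotSupported reports whether err is, or wraps, errNotSupported.
func isNotSupported(err error) bool { return errors.Is(err, errNotSupported) }

func main() {
	d := constantDecoder{value: 7}

	values := make([]int32, 3)
	n, err := d.DecodeInt32(values)
	fmt.Println(n, values, err) // 3 [7 7 7] <nil>

	_, err = d.DecodeBoolean(make([]bool, 1))
	fmt.Println(isNotSupported(err)) // true
}
```

The same approach applies to encoders by embedding NotSupportedEncoder and overriding the Encode* methods that the encoding supports.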

func (NotSupportedDecoder) DecodeBoolean

func (NotSupportedDecoder) DecodeBoolean([]bool) (int, error)

func (NotSupportedDecoder) DecodeByteArray

func (NotSupportedDecoder) DecodeByteArray(*ByteArrayList) (int, error)

func (NotSupportedDecoder) DecodeDouble

func (NotSupportedDecoder) DecodeDouble([]float64) (int, error)

func (NotSupportedDecoder) DecodeFixedLenByteArray

func (NotSupportedDecoder) DecodeFixedLenByteArray(size int, data []byte) (int, error)

func (NotSupportedDecoder) DecodeFloat

func (NotSupportedDecoder) DecodeFloat([]float32) (int, error)

func (NotSupportedDecoder) DecodeInt16

func (NotSupportedDecoder) DecodeInt16([]int16) (int, error)

func (NotSupportedDecoder) DecodeInt32

func (NotSupportedDecoder) DecodeInt32([]int32) (int, error)

func (NotSupportedDecoder) DecodeInt64

func (NotSupportedDecoder) DecodeInt64([]int64) (int, error)

func (NotSupportedDecoder) DecodeInt8

func (NotSupportedDecoder) DecodeInt8([]int8) (int, error)

func (NotSupportedDecoder) DecodeInt96

func (NotSupportedDecoder) DecodeInt96([]deprecated.Int96) (int, error)

func (NotSupportedDecoder) Encoding

func (NotSupportedDecoder) Encoding() format.Encoding

func (NotSupportedDecoder) Reset

func (NotSupportedDecoder) Reset(io.Reader)

func (NotSupportedDecoder) SetBitWidth

func (NotSupportedDecoder) SetBitWidth(int)

type NotSupportedEncoder

type NotSupportedEncoder struct {
}

NotSupportedEncoder is an implementation of the Encoder interface which does not support encoding any value types.

Many parquet encodings only support encoding a subset of the parquet types. They can embed this type to default to not supporting any encoding, then override specific Encode* methods to provide implementations for the types they do support.

func (NotSupportedEncoder) EncodeBoolean

func (NotSupportedEncoder) EncodeBoolean([]bool) error

func (NotSupportedEncoder) EncodeByteArray

func (NotSupportedEncoder) EncodeByteArray(ByteArrayList) error

func (NotSupportedEncoder) EncodeDouble

func (NotSupportedEncoder) EncodeDouble([]float64) error

func (NotSupportedEncoder) EncodeFixedLenByteArray

func (NotSupportedEncoder) EncodeFixedLenByteArray(int, []byte) error

func (NotSupportedEncoder) EncodeFloat

func (NotSupportedEncoder) EncodeFloat([]float32) error

func (NotSupportedEncoder) EncodeInt16

func (NotSupportedEncoder) EncodeInt16([]int16) error

func (NotSupportedEncoder) EncodeInt32

func (NotSupportedEncoder) EncodeInt32([]int32) error

func (NotSupportedEncoder) EncodeInt64

func (NotSupportedEncoder) EncodeInt64([]int64) error

func (NotSupportedEncoder) EncodeInt8

func (NotSupportedEncoder) EncodeInt8([]int8) error

func (NotSupportedEncoder) EncodeInt96

func (NotSupportedEncoder) EncodeInt96([]deprecated.Int96) error

func (NotSupportedEncoder) Encoding

func (NotSupportedEncoder) Encoding() format.Encoding

func (NotSupportedEncoder) Reset

func (NotSupportedEncoder) Reset(io.Writer)

func (NotSupportedEncoder) SetBitWidth

func (NotSupportedEncoder) SetBitWidth(int)

Directories

Path	Synopsis
plain	Package plain implements the PLAIN parquet encoding.
rle	Package rle implements the hybrid RLE/Bit-Packed encoding employed in repetition and definition levels, dictionary indexed data pages, and boolean values in the PLAIN encoding.