parser

package
v0.0.14
Published: May 28, 2024 License: MIT Imports: 8 Imported by: 0

Documentation

Overview

Package parser implements a PDF object parser, mapping a list of tokens (see the tokenizer package) into a tree-like structure. A higher-level reader is needed to decrypt a full PDF file.
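
To make the token-to-tree mapping concrete, here is a minimal, self-contained sketch. It does not use this package: the `node` type and the `parseTokens` function are hypothetical, tokens are plain strings, and only integers, names and arrays are handled. It only illustrates the general recursive-descent idea, not the package's actual implementation.

```go
package main

import (
	"errors"
	"strconv"
)

// node is a hypothetical, simplified stand-in for a PDF object:
// an integer, a name, or an array of nodes.
type node struct {
	kind     string // "int", "name" or "array"
	intVal   int
	name     string
	children []node
}

// parseTokens parses one object starting at tokens[pos] and returns it
// together with the index of the first token it did not consume.
func parseTokens(tokens []string, pos int) (node, int, error) {
	if pos >= len(tokens) {
		return node{}, pos, errors.New("unexpected end of tokens")
	}
	tok := tokens[pos]
	switch {
	case tok == "[":
		arr := node{kind: "array"}
		pos++
		// Parse children until the matching closing bracket.
		for pos < len(tokens) && tokens[pos] != "]" {
			child, next, err := parseTokens(tokens, pos)
			if err != nil {
				return node{}, pos, err
			}
			arr.children = append(arr.children, child)
			pos = next
		}
		if pos >= len(tokens) {
			return node{}, pos, errors.New("unterminated array")
		}
		return arr, pos + 1, nil // skip the closing "]"
	case tok == "]":
		return node{}, pos, errors.New("unexpected ]")
	case len(tok) > 1 && tok[0] == '/':
		return node{kind: "name", name: tok[1:]}, pos + 1, nil
	default:
		n, err := strconv.Atoi(tok)
		if err != nil {
			return node{}, pos, err
		}
		return node{kind: "int", intVal: n}, pos + 1, nil
	}
}
```

Calling `parseTokens` on the tokens `[ 1 [ /Name 2 ] ]` with `pos` set to 0 yields an array node whose second child is itself an array. The nesting of the input becomes the nesting of the tree.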

Functions

func ParseContent

func ParseContent(content []byte, res model.ResourcesColorSpace) ([]cs.Operation, error)

ParseContent parses a decrypted Content Stream. A resource dictionary is needed to handle inline image data, which can refer to a color space.

func ParseContentResources

func ParseContentResources(content []byte, res model.ResourcesColorSpace) (model.ResourcesDict, error)

ParseContentResources returns the resources needed by content. Note that only the names in the returned dicts are valid; all the values will be nil.
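
The "names only, nil values" shape can be illustrated with a small standalone sketch. It does not call this package: `usedResources` and `collectResourceNames` are hypothetical names, tokens are split on whitespace (so string literals and inline images are not supported), and only four operators whose first operand is a resource name are recognised.

```go
package main

import "strings"

// usedResources maps a resource category (for example "Font") to the set of
// names used in a content stream. The values are always nil: only the keys
// carry information.
type usedResources map[string]map[string]interface{}

// collectResourceNames scans a whitespace-separated content stream and
// records the resource names referenced by a few well-known operators.
func collectResourceNames(content string) usedResources {
	// Operator to resource category, assuming the first operand is the name.
	categories := map[string]string{
		"Tf": "Font",
		"Do": "XObject",
		"gs": "ExtGState",
		"sh": "Shading",
	}
	out := usedResources{}
	var operands []string
	for _, tok := range strings.Fields(content) {
		c := tok[0] // strings.Fields never returns empty tokens
		if c == '/' || c == '+' || c == '-' || c == '.' || (c >= '0' && c <= '9') {
			operands = append(operands, tok) // a name or a number
			continue
		}
		// tok is an operator: inspect the operands gathered so far.
		if cat, ok := categories[tok]; ok && len(operands) > 0 && strings.HasPrefix(operands[0], "/") {
			if out[cat] == nil {
				out[cat] = map[string]interface{}{}
			}
			out[cat][operands[0][1:]] = nil
		}
		operands = operands[:0]
	}
	return out
}
```

For the content `BT /F1 12 Tf ET /Im0 Do`, the result has the key `F1` under `Font` and `Im0` under `XObject`, each with a nil value.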

func ParseDirectFilters

func ParseDirectFilters(filters, decodeParams Object) (model.Filters, error)

ParseDirectFilters is the same as ParseFilters, but for direct objects. This is the case for inline image parameters and xRefStream dicts.

func ParseFilters

func ParseFilters(filters, decodeParams Object, resolver func(Object) (Object, error)) (model.Filters, error)

ParseFilters processes the given filters and their (optional) parameters. `resolver` is called to resolve potential indirect objects. An empty list may be returned if the filters are nil.
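
The resolver callback pattern can be sketched without the package itself. Everything below is an assumption made for illustration: `ref`, `filterNames` and `mapResolver` are hypothetical, objects are plain `interface{}` values, and decode parameters are ignored. The point is only to show how a caller-supplied function lets the same code handle both direct and indirect values.

```go
package main

import "fmt"

// ref is a hypothetical stand-in for an indirect reference to object num.
type ref struct{ num int }

// filterNames returns the filter names held in filters, which may be a
// single string, a slice of values, or a ref pointing to either. resolve is
// called on every value so that references are replaced by their targets.
func filterNames(filters interface{}, resolve func(interface{}) (interface{}, error)) ([]string, error) {
	if filters == nil {
		return nil, nil // no filters: empty list
	}
	v, err := resolve(filters)
	if err != nil {
		return nil, err
	}
	switch v := v.(type) {
	case string:
		return []string{v}, nil
	case []interface{}:
		var out []string
		for _, item := range v {
			resolved, err := resolve(item)
			if err != nil {
				return nil, err
			}
			name, ok := resolved.(string)
			if !ok {
				return nil, fmt.Errorf("unexpected filter type %T", resolved)
			}
			out = append(out, name)
		}
		return out, nil
	default:
		return nil, fmt.Errorf("unexpected filters type %T", v)
	}
}

// mapResolver builds a resolver backed by an in-memory object table.
func mapResolver(objects map[int]interface{}) func(interface{}) (interface{}, error) {
	return func(o interface{}) (interface{}, error) {
		r, ok := o.(ref)
		if !ok {
			return o, nil // direct object: nothing to resolve
		}
		target, found := objects[r.num]
		if !found {
			return nil, fmt.Errorf("object %d not found", r.num)
		}
		return target, nil
	}
}
```

With an object table mapping 7 to `"FlateDecode"`, passing the slice `{"ASCII85Decode", ref{7}}` returns both names in order. When all values are known to be direct, a resolver that returns its argument unchanged is enough.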

Types

type Array

type Array = model.ObjArray

type Bool

type Bool = model.ObjBool

type Command

type Command = model.ObjCommand

type Dict

type Dict = model.ObjDict

type Fl

type Fl = model.Fl

type Float

type Float = model.ObjFloat

type HexLiteral

type HexLiteral = model.ObjHexLiteral

type IndirectRef

type IndirectRef = model.ObjIndirectRef

type Integer

type Integer = model.ObjInt

type Name

type Name = model.Name

type Object

type Object = model.Object

func ParseObject

func ParseObject(data []byte) (Object, error)

ParseObject tokenizes and parses the input, expecting a valid PDF object.

func ParseObjectDefinition

func ParseObjectDefinition(line []byte, headerOnly bool) (objectNumber int, generationNumber int, o Object, err error)

ParseObjectDefinition parses an object definition. If `headerOnly` is true, it stops after the X X obj header and returns a nil object.
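
As an illustration of what the "X X obj" header contains, the following standalone sketch extracts the object number and generation number from a line using only the standard library. `parseObjHeader` is a hypothetical helper, not part of this package, and it makes simplifying assumptions: no comments in the header, and the first occurrence of `obj` is the keyword.

```go
package main

import (
	"bytes"
	"errors"
	"strconv"
)

// parseObjHeader reads the "<object number> <generation number> obj" header
// and returns both numbers plus the trimmed bytes following the keyword.
func parseObjHeader(line []byte) (objNum, genNum int, rest []byte, err error) {
	idx := bytes.Index(line, []byte("obj"))
	if idx < 0 {
		return 0, 0, nil, errors.New("missing obj keyword")
	}
	// Exactly two whitespace-separated numbers must precede the keyword.
	fields := bytes.Fields(line[:idx])
	if len(fields) != 2 {
		return 0, 0, nil, errors.New("expected two numbers before obj")
	}
	objNum, err = strconv.Atoi(string(fields[0]))
	if err != nil || objNum < 0 {
		return 0, 0, nil, errors.New("invalid object number")
	}
	genNum, err = strconv.Atoi(string(fields[1]))
	if err != nil || genNum < 0 {
		return 0, 0, nil, errors.New("invalid generation number")
	}
	return objNum, genNum, bytes.TrimSpace(line[idx+3:]), nil
}
```

For the input `12 0 obj << /Type /Page >> endobj`, the sketch returns 12, 0 and the remainder `<< /Type /Page >> endobj`. A header-only mode would simply ignore that remainder.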

type Parser

type Parser struct {

	// If true, disallow Indirect Reference,
	// but allow Commands
	ContentStreamMode bool
	// contains filtered or unexported fields
}

Parser is a standalone implementation of a PDF parser. It only handles chunks of PDF files (corresponding, for example, to object definitions) and cannot handle a full file with streams. A higher-level reader is needed to decode Streams and Inline Data, which requires knowledge of the filters used.

func NewParser

func NewParser(data []byte) *Parser

NewParser uses a byte slice as input.

func NewParserFromTokenizer

func NewParserFromTokenizer(tokens *tkn.Tokenizer) *Parser

NewParserFromTokenizer uses a tokenizer as input.

func (*Parser) ParseContentElement

func (pr *Parser) ParseContentElement(res model.ResourcesColorSpace) (cs.Operation, error)

ParseContentElement parses one operation and advances. `ContentStreamMode` must have been set to true, and EOF should be checked before calling this method. See `ParseContent` for a convenient way of parsing a whole content stream.
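
The "check EOF, then read one operation" loop can be sketched in isolation. The code below does not use this package: `operation`, `opReader` and `readAllOps` are hypothetical, tokens are split on whitespace, and only numbers and names count as operands. In a content stream, operands come first and the operator closes the operation, which is what `next` relies on.

```go
package main

import (
	"errors"
	"strings"
)

// operation is a hypothetical, simplified content stream operation.
type operation struct {
	operator string
	operands []string
}

// isOperand reports whether tok looks like a number or a name.
func isOperand(tok string) bool {
	if tok == "" {
		return false
	}
	c := tok[0]
	return c == '/' || c == '+' || c == '-' || c == '.' || (c >= '0' && c <= '9')
}

// opReader walks over a list of tokens, one operation at a time.
type opReader struct {
	tokens []string
	pos    int
}

// eof reports whether all tokens have been consumed.
func (r *opReader) eof() bool { return r.pos >= len(r.tokens) }

// next reads one operation and advances. Callers should check eof first.
func (r *opReader) next() (operation, error) {
	var op operation
	for !r.eof() {
		tok := r.tokens[r.pos]
		r.pos++
		if isOperand(tok) {
			op.operands = append(op.operands, tok)
			continue
		}
		op.operator = tok // the operator ends the operation
		return op, nil
	}
	return op, errors.New("operands without operator")
}

// readAllOps reads every operation in content.
func readAllOps(content string) ([]operation, error) {
	r := &opReader{tokens: strings.Fields(content)}
	var ops []operation
	for !r.eof() {
		op, err := r.next()
		if err != nil {
			return nil, err
		}
		ops = append(ops, op)
	}
	return ops, nil
}
```

For the content `1 0 0 RG 10 20 m 30 40 l S`, `readAllOps` returns four operations: `RG` with three operands, `m` and `l` with two each, and `S` with none.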

func (*Parser) ParseObject

func (p *Parser) ParseObject() (Object, error)

ParseObject reads one of the (potentially) many objects in the input data (see NewParser).

type StringLiteral

type StringLiteral = model.ObjStringLiteral

Directories

Path Synopsis
Package filters provides logic to handle binary data encoded with PDF filters, such as inline image data.
