lexer

package
v0.0.0-...-c3230fc
Published: Aug 7, 2023 License: MIT Imports: 3 Imported by: 0

Documentation

Index

Constants

const (
	RUNE_OPEN_PAREN    = '('
	RUNE_CLOSE_PAREN   = ')'
	RUNE_QUESTION_MARK = '?'
	RUNE_COMMENT_SEMI  = ';'

	RUNE_BEGIN_ARROW_LD = '<'
	IMAGE_ARROW_LD      = "<="
)
const CURSOR_TAB_STOP = 4

Arbitrary size for \t alignment.

Variables

var EOF = Token{kTOKENPOS_ZERO, &eofToken{}}

EOF token indicates the end of the token stream. As EOF is not in the document, its TokenPos is always zero.
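
Since Token is a comparable struct, consumers can test for this sentinel directly. A minimal sketch, assuming `reader` is a TokenReader (see below) whose stream delivers this same EOF value as its final token:

for tok := range reader.TokenReceiver() {
	if tok == EOF {
		break // end of stream; equivalent to waiting for the channel to close
	}
	// handle tok
}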

Functions

func IsComment

func IsComment(pos TokenPos) bool

func IsMetaBlock

func IsMetaBlock(pos TokenPos) bool

func IsSentence

func IsSentence(pos TokenPos) bool

func ReadAll

func ReadAll(reader TokenReader) error

Repeatedly calls `NextToken()` until either end of file (EOF) is reached or an error is returned while attempting to read the next token. Unlike NextToken(), it does not forward the io.EOF error: if `EOF` is reached and no other errors are encountered, this method returns `nil`.
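
A minimal end-to-end sketch of a full read, assuming this package is imported as `lexer` along with fmt, log, and strings; the GDL input and the channel handling are illustrative, not prescribed by the API:

tokens := make(chan lexer.Token)
reader := lexer.NewTokenReader(strings.NewReader("(role robot)"), tokens)

// ReadAll blocks while sending tokens, so drain the channel concurrently.
done := make(chan struct{})
go func() {
	defer close(done)
	for tok := range reader.TokenReceiver() {
		fmt.Println(tok)
	}
}()

if err := lexer.ReadAll(reader); err != nil {
	log.Fatal(err) // read errors other than io.EOF
}
<-done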

Types

type Cursor

type Cursor interface {
	// NextRune is called to extend the cursor by reading the next rune from
	// input.  Also updates the pending string except when skipping spaces, and
	// returns the updated cursor and the rune that was read.
	NextRune(input io.RuneReader) (Cursor, rune)
	// Similar to NextRune() but will read from the pending buffer if nonempty,
	// and read from input, populating pending, if pending was empty.  Implicitly
	// ignores leading spaces if needing to read from input.
	FirstRune(input io.RuneReader) (Cursor, rune)

	// Consumes all characters in the pending rune list, updating pos to match.
	ConsumeAll() (Cursor, string)
	// Same as ConsumeAll() except the last rune is left in the pending buffer.
	ConsumeExceptFinal() (Cursor, string)

	// Resets the TokenPos for this cursor to (0, 0, UNKNOWN).
	ResetPos() Cursor

	// The current position of the next Token that would be produced by consuming
	// the contents of this Cursor, whether or not anything is in pending buffer.
	Pos() TokenPos

	// Returns true if there is nothing pending in the cursor.
	IsEmpty() bool

	// Returns true if the last ReadRune call returned an error.
	HasError() bool
	// Returns `true` if the embedded error is io.EOF.
	IsEOF() bool
	// Returns the error (or nil) from the most recent read of input.  If an error
	// is encountered, it will persist through update methods and prohibit reads.
	//
	// Intentionally not extending `error` interface by naming this ErrorValue.
	ErrorValue() error
}

The Cursor represents a few properties of the lexer's state that are invariably coupled to each other -- the token position, the runes ready to be integrated into the next token, and whether there is a pending rune waiting to be processed. The next token's position should always be the current position plus the size of the pending rune, if there is one, but that depends on whether scanning can be done in LL(1) or, in some cases such as `(` and `)`, LL(0). It also smelled bad to update one part of the lexer state and, non-atomically, another part that depended on it.

This interface, and its backing struct, are a solution to the above problems while also aiding the readability of the token-specific lexer code. The coupled updates are done within the Advance and Consume methods; there is no redundant next-position state and no ambiguity about the contents of the pending image. In addition, the cursor is copy-on-write: all updates are conveyed by the return value of the updating method, and the implementing methods use by-value receivers, so downcast-and-update has limited adverse effect.

However, a Cursor assumes that it is the only reader on the provided input and that its scan position is consistent between calls to Advance. If multiple concurrent cursors are needed on the same reader source, use a new reader for each cursor or tee the source RuneReader, rather than further complicating this code with management of byte offsets and seeks at each read, especially since tokenizing a byte stream is inherently single-threaded. Calling code is expected to manage this, typically via lexerState.

func NewCursor

func NewCursor() Cursor
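
The interface is enough to sketch how token-scanning code might drive a cursor. The helper below is purely illustrative (the package's real scanners are internal to the lexer) and is written as if inside the package:

// scanIdentifier is a hypothetical helper, not part of this package's API.
func scanIdentifier(input io.RuneReader) (Token, Cursor, error) {
	cur := NewCursor()
	cur, r := cur.FirstRune(input) // first rune, skipping leading spaces
	for !cur.HasError() && unicode.IsLetter(r) {
		cur, r = cur.NextRune(input) // extend the pending image
	}
	if cur.HasError() && !cur.IsEOF() {
		return EOF, cur, cur.ErrorValue() // a real read error; EOF merely ends the scan
	}
	pos := cur.Pos() // position of the token about to be produced
	var image string
	if cur.IsEOF() {
		cur, image = cur.ConsumeAll() // nothing follows; take everything
	} else {
		cur, image = cur.ConsumeExceptFinal() // leave the delimiter pending for the next token
	}
	return Identifier(image, pos), cur, nil
}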

type ExprEndToken

type ExprEndToken struct{ SymbolToken }

Token EXPR_END = ")"

var EXPR_END ExprEndToken

func (ExprEndToken) Image

func (tok ExprEndToken) Image() string

func (ExprEndToken) TypeString

func (tok ExprEndToken) TypeString() string

type ExprStartToken

type ExprStartToken struct{ SymbolToken }

Token EXPR_START = "("

var EXPR_START ExprStartToken

func (ExprStartToken) Image

func (tok ExprStartToken) Image() string

func (ExprStartToken) TypeString

func (tok ExprStartToken) TypeString() string

type KeywordToken

type KeywordToken struct {
	// contains filtered or unexported fields
}

All keywords are given the KEYWORD token type.

func (KeywordToken) At

func (tok KeywordToken) At(pos TokenPos) Token

Constructs a Token instance pointing to the singular KeywordToken instance for the specific keyword.

func (KeywordToken) Image

func (tok KeywordToken) Image() string

Satisfies the requirement for TokenType interface.

func (KeywordToken) TypeString

func (tok KeywordToken) TypeString() string

Satisfies the requirement for TokenType interface.

type LDArrowToken

type LDArrowToken struct{ SymbolToken }

Token ARROW_LD = "<="

var ARROW_LD LDArrowToken

func (LDArrowToken) Image

func (tok LDArrowToken) Image() string

func (LDArrowToken) TypeString

func (tok LDArrowToken) TypeString() string

type QMarkToken

type QMarkToken struct{ SymbolToken }

Token QUE_MARK = "?"

var QUE_MARK QMarkToken

func (QMarkToken) Image

func (tok QMarkToken) Image() string

func (QMarkToken) TypeString

func (tok QMarkToken) TypeString() string

type SymbolToken

type SymbolToken struct{}

Symbol tokens always have the same image, so they can share a common instance.

type Token

type Token struct {
	TokenPos
	TokenType
}

Represents a Token instance by its position in the source and its type. The TokenType is an embedded interface (see TokenType below) and may be initialized with state/context, or it may reuse a shared instance for the many tokens that are universally identical within their type (e.g. keywords, operator symbols). TokenPos is a 32-bit uint composite value defined in [token_pos.go].
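
Because both fields are embedded, the position and type methods are promoted onto Token itself. A small illustration using constructors documented below (printed values assume a fresh position at line 3, column 1):

tok := ExpressionStart(NewTokenPos(3, 1).InSentence())
fmt.Println(tok.Line(), tok.Column()) // promoted from TokenPos: 3 1
fmt.Println(tok.Image())              // promoted from TokenType: "("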

func ExpressionEnd

func ExpressionEnd(pos TokenPos) Token

Indicates the end of expressions and sub-expressions within a sentence.

func ExpressionStart

func ExpressionStart(pos TokenPos) Token

Begins all expressions, the main structural denotation in GDL syntax.

func Identifier

func Identifier(name string, pos TokenPos) Token

Identifier is a catch-all token for alphanumeric strings that are not keywords.

func Integer

func Integer(image string, pos TokenPos) Token

More complex numeric types can be constructed from sequences of unsigned integers and punctuation. This also keeps the tokenizer's state management simpler by defining negatives, floats, etc. in terms of production-rule semantics. GDL and GDL-II both assume only integer constants in [0, 100].

func KeywordAt

func KeywordAt(image string, pos TokenPos) Token

func LeftDoubleArrow

func LeftDoubleArrow(pos TokenPos) Token

Used in constructing relations.

func LineComment

func LineComment(image string, pos TokenPos) Token

Line comments are any sequence of characters beginning with a semicolon and extending until the next newline rune '\n'.

func QuestionMark

func QuestionMark(pos TokenPos) Token

Used in the production rule for Variable terms.

func UnexpectedToken

func UnexpectedToken(image string, pos TokenPos) Token

An unexpected token is used when a parse error is encountered even though no read errors occurred (read errors are returned by the NextToken call). Examples include incomplete Unicode bytes or a string without a closing quote. Illegal tokens retain the image of the scan up to and including the bad character.

func (Token) String

func (data Token) String() string

General implementation of the string conversion (i.e. for fmt interpolation). More specific Token types may override this String() method, but the only operations that should make use of it are logging, debugging, and testing.

type TokenPos

type TokenPos uint32

TokenPos encoded as a 32-bit uint:

	.LLLLLLLLLLLLLLLLLLLLCCCCCCCCCCFF.
	:[++++++++++++++++++]            :  20 bits LINE
	:                    [++++++++]  :  10 bits COLUMN
	:                              []:   2 bits FLAGS
	`10987654321098765432109876543210'

Use Line(), Column() and Next*() methods to read and update values.
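
The diagram implies the following bit packing. The decoder below is a sketch of the documented layout only, not the package's actual implementation; prefer the accessors:

// decode is illustrative of the documented 20/10/2 layout.
func decode(pos TokenPos) (line, col, flags uint32) {
	p := uint32(pos)
	flags = p & 0x3            // low 2 bits: FLAGS
	col = (p >> 2) & 0x3FF     // next 10 bits: COLUMN, 0-1023
	line = (p >> 12) & 0xFFFFF // high 20 bits: LINE
	return
}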

func NewTokenPos

func NewTokenPos(line, col uint) TokenPos

func (TokenPos) Column

func (pos TokenPos) Column() uint

Returns the 1-indexed column number of the position; zero means unknown. Token embeds TokenPos to adopt its Column() method.

func (TokenPos) InComment

func (pos TokenPos) InComment() TokenPos

Produces the same Token position, ensuring its flag is set to COMMENT mode.

func (TokenPos) InMetaBlock

func (pos TokenPos) InMetaBlock() TokenPos

Produces the same Token position, ensuring its flag is set to META_BLOCK mode.

func (TokenPos) InSentence

func (pos TokenPos) InSentence() TokenPos

Produces the same Token position, ensuring its flag is set to SENTENCE mode.

func (TokenPos) Line

func (pos TokenPos) Line() uint

Returns the 1-indexed line number of the position; zero means unknown. Token embeds TokenPos to adopt its Line() method.

func (TokenPos) NextAt

func (pos TokenPos) NextAt(lines, cols uint) TokenPos

Increments by number of lines then by number of columns.

func (TokenPos) NextCol

func (pos TokenPos) NextCol() TokenPos

Increments the column, keeping the current flag.

func (TokenPos) NextLine

func (pos TokenPos) NextLine() TokenPos

Increments the position to its next line, resetting the column as well. The flag is cleared if currently in comment mode, and retained otherwise.

func (TokenPos) ResetFlag

func (pos TokenPos) ResetFlag() TokenPos

Resets the flag value to unknown.

func (TokenPos) String

func (data TokenPos) String() string

String conversion for the TokenPos value. As a uint it already had a conversion available, but the integer value obscures the actual position data.

type TokenReader

type TokenReader interface {
	// Reads the next token, sending it to output, returning error or nil.  If an
	// io.EOF error was encountered it is returned here as well.
	NextToken() error

	// Read/Receive-only channel for Token values sent as being read from the input.
	// Calling NextToken() or ReadAll() will produce tokens on this channel and one
	// of those methods will close the channel when it encounters EOF. An EOF token
	// is also produced as the last token on the channel, so consumers can listen
	// for it specifically or listen until channel close using `for ... := range`.
	TokenReceiver() <-chan Token
}

Public interface for reading a stream of tokens, sending them to a channel. See also ReadAll(reader) which provides a simpler interface for full reads.

func NewTokenReader

func NewTokenReader(input io.RuneReader, output chan Token) TokenReader

Constructor function for a lexer-based token reader.
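
Note that the input must be an io.RuneReader. An os.File is not one, so a typical call site (an assumption about usage, not a requirement of this package) wraps the file with bufio:

f, err := os.Open("game.gdl") // hypothetical input file
if err != nil {
	log.Fatal(err)
}
defer f.Close()
reader := lexer.NewTokenReader(bufio.NewReader(f), make(chan lexer.Token))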

type TokenType

type TokenType interface {
	// Returns a string representation of the type of this token.
	TypeString() string
	// Returns a string representation of this token, its syntactic image.
	Image() string
}

TokenType intrinsically defines the subtype of a Token and provides identifying methods.
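
Any type providing these two methods satisfies the interface. A hypothetical example, mirroring the shape of the package's own token types:

// stringToken is hypothetical and not part of this package.
type stringToken struct{ text string }

func (t stringToken) TypeString() string { return "STRING" }
func (t stringToken) Image() string      { return t.text }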
