lexer

Published: Mar 29, 2017 License: MIT Imports: 4 Imported by: 7

README

state-lexer

This package provides a Lexer that functions along the lines of the lexer design Rob Pike discusses in his talk on lexical scanning: https://www.youtube.com/watch?v=HxaD_trXwRE.

Original implementation forked from https://github.com/bbuck/go-lexer.

This fork removes the use of chan lexer.Token in favor of a func(t lexer.Token) callback approach.

Usage

You can define your token types by using the lexer.TokenType type (int) via

const (
	StringToken lexer.TokenType = iota
	IntegerToken
	// etc...
)

And then you define your own state functions (lexer.StateFunc) to handle analyzing the string.

func StringState(l *lexer.L) lexer.StateFunc {
	l.Next()   // eat starting "
	l.Ignore() // drop current value
	for l.Peek() != '"' {
		l.Next()
	}
	l.Emit(StringToken)

	return SomeStateFunction
}

Finally, create a new lexer.L with lexer.New and call its Scan() method.

package main

import (
	"bytes"
	"fmt"
	"github.com/mh-cbon/state-lexer"
)

const (
	NumberToken lexer.TokenType = iota
	WsToken
)

func isWhitespace(ch rune) bool { return ch == ' ' || ch == '\t' || ch == '\n' || ch == '\r' }

func NumberState(l *lexer.L) lexer.StateFunc {
	// eat whitespace
	r := l.Next()
	for isWhitespace(r) {
		r = l.Next()
		if r == lexer.EOFRune {
			l.Emit(WsToken)
			return nil // signal end of parsing
		}
	}
	l.Rewind()      // put back last read, it is not a space
	l.Emit(WsToken) // emit WsToken for what we found

	l.Take("0123456789")
	l.Emit(NumberToken)

	return NumberState // signal next state
}

func main() {
	b := bytes.NewBufferString("1 2 ")
	l := lexer.New(b, NumberState)
	l.ErrorHandler = func(e string) {}

	var tokens []lexer.Token
	l.Scan(func(tok lexer.Token) {
		tokens = append(tokens, tok)
	})

	fmt.Printf("%#v", tokens)

	//Output:
	// []lexer.Token{
	//  lexer.Token{Type:1, Value:""},
	//  lexer.Token{Type:0, Value:"1"},
	//  lexer.Token{Type:1, Value:" "},
	//  lexer.Token{Type:0, Value:"2"},
	//  lexer.Token{Type:1, Value:" "},
	// }
}

Credits

To both Rob Pike and bbuck for their work! Thanks! Writing my very first parser/lexer was really cool :p

Documentation

Overview

This package provides a Lexer that functions along the lines of the lexer design Rob Pike discusses in his [talk](https://www.youtube.com/watch?v=HxaD_trXwRE) on lexical scanning.

Original implementation forked from https://github.com/bbuck/go-lexer.

You can define your token types by using the `lexer.TokenType` type (`int`) via

const (
        StringToken lexer.TokenType = iota
        IntegerToken
        // etc...
)

And then you define your own state functions (`lexer.StateFunc`) to handle analyzing the string.

func StringState(l *lexer.L) lexer.StateFunc {
        l.Next()   // eat starting "
        l.Ignore() // drop current value
        for l.Peek() != '"' {
                l.Next()
        }
        l.Emit(StringToken)

        return SomeStateFunction
}
Example (Lexer)
b := bytes.NewBufferString("1 2 ")
l := New(b, NumberState)
l.ErrorHandler = func(e string) {}

var tokens []Token
l.Scan(func(tok Token) {
	tokens = append(tokens, tok)
})

fmt.Printf("%#v", tokens)
Output:

[]lexer.Token{lexer.Token{Type:1, Value:""}, lexer.Token{Type:0, Value:"1"}, lexer.Token{Type:1, Value:" "}, lexer.Token{Type:0, Value:"2"}, lexer.Token{Type:1, Value:" "}}

Index

Examples

Constants

const (
	EOFRune    rune      = -1
	EmptyToken TokenType = 0
)

Variables

This section is empty.

Functions

func Not

func Not(t TokenType, f func(Token)) func(Token)

Not is a helper that wraps the callback f so it is invoked only for tokens whose type is not t.
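
Assuming Not filters that way, it can be used to skip whitespace tokens during a scan; a minimal sketch reusing WsToken and the lexer l from the README example:

var numbers []lexer.Token
l.Scan(lexer.Not(WsToken, func(tok lexer.Token) {
	// only non-whitespace tokens reach this callback
	numbers = append(numbers, tok)
}))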

Types

type L

type L struct {
	Err error
	// tokens          chan Token
	TokenHandler func(t Token)
	ErrorHandler func(e string)
	// contains filtered or unexported fields
}

func New

func New(src io.Reader, start StateFunc) *L

New creates and returns a lexer ready to parse the given source code.
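
Because New accepts any io.Reader, the source can be a file, a bytes.Buffer, or a strings.Reader; for instance, reusing NumberState from the README example:

l := lexer.New(strings.NewReader("1 2 3"), NumberState)
l.ErrorHandler = func(e string) {}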

func (*L) Current

func (l *L) Current() string

Current returns the value being analyzed at this moment.

func (*L) Emit

func (l *L) Emit(t TokenType)

Emit will receive a token type and push a new token with the currently analyzed value to the token handler.

func (*L) Error

func (l *L) Error(e string)

func (*L) Ignore

func (l *L) Ignore()

Ignore clears the rewind stack and then sets the current beginning position to the current position in the source, effectively ignoring the section of the source being analyzed.
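
For instance, a state function can consume characters and then call Ignore to discard them instead of emitting a token; a hypothetical SkipWsState, reusing isWhitespace and NumberState from the README example:

func SkipWsState(l *lexer.L) lexer.StateFunc {
	for isWhitespace(l.Peek()) {
		l.Next()
	}
	l.Ignore() // drop the whitespace instead of emitting it
	if l.Peek() == lexer.EOFRune {
		return nil // end of input
	}
	return NumberState
}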

func (*L) Next

func (l *L) Next() rune

Next pulls the next rune from the Lexer and returns it, moving the position forward in the source.

func (*L) NextToken

func (l *L) NextToken() *Token

NextToken reads until a token is found; it returns nil at EOF.
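
NextToken allows a pull-style loop as an alternative to Scan; a minimal sketch, assuming l was built with lexer.New as above:

for tok := l.NextToken(); tok != nil; tok = l.NextToken() {
	fmt.Println(tok.GetType(), tok.GetValue())
}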

func (*L) NextTokens

func (l *L) NextTokens() []*Token

NextTokens reads until at least one token is found; it returns []*Token{nil} at EOF.

func (*L) Peek

func (l *L) Peek() rune

Peek performs a Next operation immediately followed by a Rewind returning the peeked rune.

func (*L) ReadBytes

func (l *L) ReadBytes() int

ReadBytes returns the number of bytes read.

func (*L) Rewind

func (l *L) Rewind()

Rewind will take the last rune read (if any) and rewind back. Rewinds can occur more than once per call to Next but you can never rewind past the last point a token was emitted.

func (*L) Scan

func (l *L) Scan(f func(t Token))

Scan browses all tokens and invokes f for each of them.

func (*L) Take

func (l *L) Take(chars string)

Take receives a string containing all acceptable characters and will continue over each consecutive character in the source until a character not in the given string is encountered. This should be used to quickly pull token parts.
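
Take works with any character set; for example, a hypothetical HexState that pulls hexadecimal digits, reusing NumberToken from the README example:

func HexState(l *lexer.L) lexer.StateFunc {
	l.Take("0123456789abcdefABCDEF")
	l.Emit(NumberToken)
	return nil // stop lexing
}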

type StateFunc

type StateFunc func(*L) StateFunc

type Token

type Token struct {
	Type  TokenType
	Value string
}

func (*Token) GetType

func (t *Token) GetType() TokenType

func (*Token) GetValue

func (t *Token) GetValue() string

func (*Token) String

func (t *Token) String() string

type TokenType

type TokenType int
