lexer

Published: Mar 29, 2017 License: MIT Imports: 4 Imported by: 7

README

state-lexer

This package provides a Lexer that functions along the lines of the lexer design Rob Pike discusses in his talk on lexical scanning: https://www.youtube.com/watch?v=HxaD_trXwRE.

Original implementation forked from https://github.com/bbuck/go-lexer.

This fork removes the use of chan lexer.Token in favor of a func(t lexer.Token) callback approach.

Usage

You can define your token types by using the lexer.TokenType type (int) via

const (
	StringToken lexer.TokenType = iota
	IntegerToken
	// etc...
)

And then you define your own state functions (lexer.StateFunc) to handle analyzing the string.

func StringState(l *lexer.L) lexer.StateFunc {
	l.Next()   // eat starting "
	l.Ignore() // drop current value
	for l.Peek() != '"' {
		l.Next()
	}
	l.Emit(StringToken)

	return SomeStateFunction
}

Finally, create a new lexer.L with lexer.New and call its Scan() method.

package main

import (
	"bytes"
	"fmt"
	"github.com/mh-cbon/state-lexer"
)

const (
	NumberToken lexer.TokenType = iota
	WsToken
)

func isWhitespace(ch rune) bool { return ch == ' ' || ch == '\t' || ch == '\n' || ch == '\r' }

func NumberState(l *lexer.L) lexer.StateFunc {
	// eat whitespace
	r := l.Next()
	for isWhitespace(r) {
		r = l.Next()
		if r == lexer.EOFRune {
			l.Emit(WsToken)
			return nil // signal end of parsing
		}
	}
	l.Rewind()      // put back last read, it is not a space
	l.Emit(WsToken) // emit WsToken for what we found

	l.Take("0123456789")
	l.Emit(NumberToken)

	return NumberState // signal next state
}

func main() {
	b := bytes.NewBufferString("1 2 ")
	l := lexer.New(b, NumberState)
	l.ErrorHandler = func(e string) {}

	var tokens []lexer.Token
	l.Scan(func(tok lexer.Token) {
		tokens = append(tokens, tok)
	})

	fmt.Printf("%#v", tokens)

	//Output:
	// []lexer.Token{
	//  lexer.Token{Type:1, Value:""},
	//  lexer.Token{Type:0, Value:"1"},
	//  lexer.Token{Type:1, Value:" "},
	//  lexer.Token{Type:0, Value:"2"},
	//  lexer.Token{Type:1, Value:" "},
	// }
}

Credits

To both Rob Pike and bbuck for their work! Thanks! Writing my very first parser/lexer was really cool :p

Documentation

Overview

This package provides a Lexer that functions along the lines of the lexer design Rob Pike discusses in his [talk](https://www.youtube.com/watch?v=HxaD_trXwRE) on lexical scanning.

Original implementation forked from https://github.com/bbuck/go-lexer.

You can define your token types by using the `lexer.TokenType` type (`int`) via

const (
        StringToken lexer.TokenType = iota
        IntegerToken
        // etc...
)

And then you define your own state functions (`lexer.StateFunc`) to handle analyzing the string.

func StringState(l *lexer.L) lexer.StateFunc {
        l.Next()   // eat starting "
        l.Ignore() // drop current value
        for l.Peek() != '"' {
                l.Next()
        }
        l.Emit(StringToken)

        return SomeStateFunction
}
Example (Lexer)
b := bytes.NewBufferString("1 2 ")
l := New(b, NumberState)
l.ErrorHandler = func(e string) {}

var tokens []Token
l.Scan(func(tok Token) {
	tokens = append(tokens, tok)
})

fmt.Printf("%#v", tokens)
Output:

[]lexer.Token{lexer.Token{Type:1, Value:""}, lexer.Token{Type:0, Value:"1"}, lexer.Token{Type:1, Value:" "}, lexer.Token{Type:0, Value:"2"}, lexer.Token{Type:1, Value:" "}}

Index

Examples

Constants

const (
	EOFRune    rune      = -1
	EmptyToken TokenType = 0
)

Variables

This section is empty.

Functions

func Not

func Not(t TokenType, f func(Token)) func(Token)

Not is a helper that wraps the callback f so it is invoked only for tokens whose type is not t.
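
Assuming Not filters that way, it can be used to skip whitespace tokens during a scan; a minimal sketch reusing WsToken and the lexer l from the README example:

var numbers []lexer.Token
l.Scan(lexer.Not(WsToken, func(tok lexer.Token) {
	// only non-whitespace tokens reach this callback
	numbers = append(numbers, tok)
}))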

Types

type L

type L struct {
	Err error
	// tokens          chan Token
	TokenHandler func(t Token)
	ErrorHandler func(e string)
	// contains filtered or unexported fields
}

func New

func New(src io.Reader, start StateFunc) *L

New creates and returns a lexer ready to parse the given source code.
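
Because New accepts any io.Reader, the source can be a file, a bytes.Buffer, or a strings.Reader; for instance, reusing NumberState from the README example:

l := lexer.New(strings.NewReader("1 2 3"), NumberState)
l.ErrorHandler = func(e string) {}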

func (*L) Current

func (l *L) Current() string

Current returns the value being analyzed at this moment.

func (*L) Emit

func (l *L) Emit(t TokenType)

Emit will receive a token type and push a new token with the currently analyzed value to the token handler.

func (*L) Error

func (l *L) Error(e string)

func (*L) Ignore

func (l *L) Ignore()

Ignore clears the rewind stack and then sets the current beginning position to the current position in the source, effectively ignoring the section of the source being analyzed.
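
For instance, a state function can consume characters and then call Ignore to discard them instead of emitting a token; a hypothetical SkipWsState, reusing isWhitespace and NumberState from the README example:

func SkipWsState(l *lexer.L) lexer.StateFunc {
	for isWhitespace(l.Peek()) {
		l.Next()
	}
	l.Ignore() // drop the whitespace instead of emitting it
	if l.Peek() == lexer.EOFRune {
		return nil // end of input
	}
	return NumberState
}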

func (*L) Next

func (l *L) Next() rune

Next pulls the next rune from the Lexer and returns it, moving the position forward in the source.

func (*L) NextToken

func (l *L) NextToken() *Token

NextToken reads until a token is found; it returns nil at EOF.
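
NextToken allows a pull-style loop as an alternative to Scan; a minimal sketch, assuming l was built with lexer.New as above:

for tok := l.NextToken(); tok != nil; tok = l.NextToken() {
	fmt.Println(tok.GetType(), tok.GetValue())
}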

func (*L) NextTokens

func (l *L) NextTokens() []*Token

NextTokens reads until at least one token is found; it returns []*Token{nil} at EOF.

func (*L) Peek

func (l *L) Peek() rune

Peek performs a Next operation immediately followed by a Rewind returning the peeked rune.

func (*L) ReadBytes

func (l *L) ReadBytes() int

ReadBytes returns the number of bytes read.

func (*L) Rewind

func (l *L) Rewind()

Rewind will take the last rune read (if any) and rewind back. Rewinds can occur more than once per call to Next but you can never rewind past the last point a token was emitted.

func (*L) Scan

func (l *L) Scan(f func(t Token))

Scan browses all tokens and invokes f for each of them.

func (*L) Take

func (l *L) Take(chars string)

Take receives a string containing all acceptable characters and will continue over each consecutive character in the source until a character not in the given string is encountered. This should be used to quickly pull token parts.
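
Take works with any character set; for example, a hypothetical HexState that pulls hexadecimal digits, reusing NumberToken from the README example:

func HexState(l *lexer.L) lexer.StateFunc {
	l.Take("0123456789abcdefABCDEF")
	l.Emit(NumberToken)
	return nil // stop lexing
}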

type StateFunc

type StateFunc func(*L) StateFunc

type Token

type Token struct {
	Type  TokenType
	Value string
}

func (*Token) GetType

func (t *Token) GetType() TokenType

func (*Token) GetValue

func (t *Token) GetValue() string

func (*Token) String

func (t *Token) String() string

type TokenType

type TokenType int
