grammar

package module
v0.0.0-...-fff725e
Published: Oct 30, 2021 License: Apache-2.0 Imports: 8 Imported by: 1

README

grammar

A parser generator where rules are defined as Go structs and code generation is optional. The concepts are introduced in the simple example below.

Rules

Rules can be written as Go structs. Here is a definition for s-exprs:


// SExpr ::= Number | String | Atom | List
type SExpr struct {
    grammar.OneOf                // This rule is a "one-of": one of the rules below has to match
    Number *Token `tok:"number"` // A pointer field means an optional match
    String *Token `tok:"string"` // This "tok" tag means only tokens of type "string" will match
    Atom   *Token `tok:"atom"`
    *List                        // All fields in a one-of rule must be optional
}

// List ::= "(" [Item] ")"
type List struct {
    grammar.Seq                   // This is a sequence rule: all fields must match, in order
    OpenBkt  Token   `tok:"bkt,("` // This "tok" tag means only "bkt" tokens with value "(" will match
    Items    []SExpr
    CloseBkt Token   `tok:"bkt,)"`
}

Tokens

This is not quite complete, as you need a Token type. You can create your own or, if your needs are simple enough, use grammar.SimpleToken:

type Token = grammar.SimpleToken

In order to parse your s-exprs into the data structures above, you also need a tokeniser. You can make your own tokeniser, or you can build one simply with the grammar.SimpleTokeniser function:

var tokenise = grammar.SimpleTokeniser([]grammar.TokenDef{
    {
        // If Name is empty, the token is skipped in the token stream
        Ptn: `\s+`, // This is a regular expression (must not contain groups)
    },
    {
        Name: "bkt", // This is the type of the token seen in the "tok" struct tag
        Ptn: `[()]`,
    },
    {
        Name: "string",
        Ptn: `"[^"]*"`,
    },
    {
        Name: "number",
        Ptn: `-?[0-9]+(?:\.[0-9]+)?`,
    },
    {
        Name: "atom",
        Ptn: `[a-zA-Z_][a-zA-Z0-9_-]*`,
    },
})
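
To check what the tokeniser produces, you can dump the resulting stream before parsing it. A small sketch, assuming the tokenise variable defined above (SimpleTokenStream.Dump is documented further down):

ts, err := tokenise(`(plus 1 2)`)
if err != nil {
    // handle tokenisation errors here (e.g. input that no pattern matches)
}
ts.Dump(os.Stdout) // print the tokens in the stream for inspection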

Parsing

Now, putting all this together, you can parse an s-expr of your choice:


tokenStream, _ := tokenise(`(cons a (list 123 "c")))`)
var sexpr SExpr
err := grammar.Parse(&sexpr, tokenStream)

Now sexpr's fields have been filled, and you can explore the syntax tree by traversing them: for example, sexpr.List.Items is now a slice of SExprs, and sexpr.List.Items[0].Atom is a Token with value "cons" (and type "atom").
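
For example (a small sketch, assuming the Parse call above succeeded and err is nil):

first := sexpr.List.Items[0]
fmt.Println(first.Atom.Value()) // prints "cons"
fmt.Println(first.Atom.Type())  // prints "atom"

nested := sexpr.List.Items[2].List          // the inner (list 123 "c") expression
fmt.Println(nested.Items[1].Number.Value()) // prints "123"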

There is a convenience function to output a rule struct:

grammar.PrettyWrite(os.Stdout, sexpr)

This will output a pretty representation of sexpr:

SExpr {
  List: List {
    OpenBkt: {bkt (}
    Items: [
      SExpr {
        Atom: {atom cons}
      }
      SExpr {
        Atom: {atom a}
      }
      SExpr {
        List: List {
          OpenBkt: {bkt (}
          Items: [
            SExpr {
              Atom: {atom list}
            }
            SExpr {
              Number: {number 123}
            }
            SExpr {
              String: {string "c"}
            }
          ]
          CloseBkt: {bkt )}
        }
      }
    ]
    CloseBkt: {bkt )}
  }
}

Generating a parser

The above works using reflection, which is fine but can be a little slow if you are parsing very big files. It is also possible to compile a parser.

First, install the command that generates the parser:

go install github.com/arnodel/grammar/cmd/genparse

Then the simplest approach is to add this line to the file containing the grammar you want to compile:

//go:generate genparse

You can now generate the compiled parser by running go generate in your package. If your file is called, say, grammar.go, this will generate a file in the same package called grammar.compiled.go. You can still parse files the same way as before using grammar.Parse(), but this will no longer use reflection! Note that you can always force reflection to be used by compiling your program with the nocompiledgrammar build tag.
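
For example, from the directory containing grammar.go:

go generate                      # writes grammar.compiled.go next to grammar.go
go build -tags nocompiledgrammar # optional: force the reflection-based parser anyway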

Documentation

Overview

Example
package main

import (
	"os"

	"github.com/arnodel/grammar"
)

type Token = grammar.SimpleToken

type SExpr struct {
	grammar.OneOf
	Number *Token `tok:"number"`
	String *Token `tok:"string"`
	Atom   *Token `tok:"atom"`
	*List
}

type List struct {
	grammar.Seq
	OpenBkt  Token `tok:"bkt,("`
	Items    []SExpr
	CloseBkt Token `tok:"bkt,)"`
}

var tokenise = grammar.SimpleTokeniser([]grammar.TokenDef{
	{
		// If Name is empty, the token is skipped in the token stream
		Ptn: `\s+`,
	},
	{
		Name: "bkt",
		Ptn:  `[()]`,
	},
	{
		Name: "string",
		Ptn:  `"[^"]*"`,
	},
	{
		Name: "number",
		Ptn:  `-?[0-9]+(?:\.[0-9]+)?`,
	},
	{
		Name: "atom",
		Ptn:  `[a-zA-Z_][a-zA-Z0-9_-]*`,
	},
})

func main() {
	tokenStream, _ := tokenise(`(cons a (list 123 "c")))`)
	var sexpr SExpr
	grammar.Parse(&sexpr, tokenStream)
	grammar.PrettyWrite(os.Stdout, sexpr)

}
Output:

SExpr {
  List: List {
    OpenBkt: {bkt (}
    Items: [
      SExpr {
        Atom: {atom cons}
      }
      SExpr {
        Atom: {atom a}
      }
      SExpr {
        List: List {
          OpenBkt: {bkt (}
          Items: [
            SExpr {
              Atom: {atom list}
            }
            SExpr {
              Number: {number 123}
            }
            SExpr {
              String: {string "c"}
            }
          ]
          CloseBkt: {bkt )}
        }
      }
    ]
    CloseBkt: {bkt )}
  }
}

Index

Examples

Constants

This section is empty.

Variables

var EOF = SimpleToken{
	TokType:  "EOF",
	TokValue: "EOF",
}

EOF is the token that is returned when trying to consume a token from an exhausted token stream.

var WithDefaultLogger = WithLogger(log.Default())
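
WithDefaultLogger is WithLogger applied to the standard library's default logger. Since ParseOptions are variadic on Parse, a sketch of enabling it (reusing the README example) looks like this:

err := grammar.Parse(&sexpr, tokenStream, grammar.WithDefaultLogger)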

Functions

func PrettyWrite

func PrettyWrite(out io.Writer, r interface{}) error

PrettyWrite outputs a pretty representation of the rule r on the writer. For a struct like this:

type RuleType struct {
    Field1 Rule1
    Field2 []Rule2
}

It looks like this:

RuleType {
  Field1: <pretty representation of the value of Field1>
  Field2: [
     <pretty representation of the first item in Field2>
     <pretty representation of the second item>
     <...>
  ]
}

Empty optional fields and empty repeated fields are omitted altogether.

func SimpleTokeniser

func SimpleTokeniser(tokenDefs []TokenDef) func(string) (*SimpleTokenStream, error)

SimpleTokeniser takes a list of TokenDefs and returns a function that can tokenise a string. Designed for simple use-cases.

Types

type Empty

type Empty struct{}

func (Empty) Parse

func (Empty) Parse(r interface{}, s *ParserState, opts ParseOptions) *ParseError

type FieldType

type FieldType struct {
	BaseType reflect.Type
	Pointer  bool
	Array    bool
}

type Mode

type Mode struct {
}

type OneOf

type OneOf struct{}

OneOf should be used as the first field of a Rule struct to signify that it should match exactly one of the fields.

func (OneOf) Parse

func (OneOf) Parse(r interface{}, s *ParserState, opts ParseOptions) *ParseError

type ParseError

type ParseError struct {
	Err error
	Token
	TokenParseOptions []TokenParseOptions
	Pos               int
}

func Parse

func Parse(dest interface{}, s TokenStream, opts ...ParseOption) *ParseError

Parse tries to interpret dest as a grammar rule and use it to parse the given token stream. Parse can panic if dest is not a valid grammar rule. It returns a non-nil *ParseError if the token stream does not match the rule.
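
A minimal usage sketch with error handling, reusing the SExpr rule and token stream from the example above:

var sexpr SExpr
if err := grammar.Parse(&sexpr, tokenStream); err != nil {
	// err is a *ParseError; Pos reports where in the token stream it occurred
	log.Fatalf("parse failed at position %d: %s", err.Pos, err)
}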

func ParseWithOptions

func ParseWithOptions(dest interface{}, s *ParserState, opts ParseOptions) *ParseError

ParseWithOptions is the same as Parse but the ParseOptions are explicitly given (this is mostly used by the parser generator).

func (*ParseError) Error

func (e *ParseError) Error() string

func (*ParseError) Merge

func (e *ParseError) Merge(e2 *ParseError) *ParseError

type ParseOption

type ParseOption func(s *ParserState)

func WithLogger

func WithLogger(l *log.Logger) ParseOption

type ParseOptions

type ParseOptions struct {
	TokenParseOptions []TokenParseOptions
}

func (ParseOptions) MatchToken

func (o ParseOptions) MatchToken(tok Token) (bool, bool)

type Parser

type Parser interface {
	Parse(r interface{}, s *ParserState, opts ParseOptions) *ParseError
}

A Parser can parse a token stream to populate itself. It must return an error if it fails to do so.

type ParserState

type ParserState struct {
	TokenStream
	// contains filtered or unexported fields
}

func (*ParserState) Debug

func (s *ParserState) Debug() bool

func (*ParserState) Logf

func (s *ParserState) Logf(fstr string, args ...interface{})

func (*ParserState) MergeError

func (s *ParserState) MergeError(err *ParseError) *ParseError

type RuleDef

type RuleDef struct {
	Name      string
	OneOf     bool
	Separator *RuleField
	Fields    []RuleField
}

type RuleField

type RuleField struct {
	FieldType
	ParseOptions
	SizeOptions
	Name  string
	Index int
}

type Seq

type Seq struct{}

Seq should be used as the first field of a Rule struct to signify that it should match all the fields in sequence.

func (Seq) Parse

func (Seq) Parse(r interface{}, s *ParserState, opts ParseOptions) *ParseError

type SimpleToken

type SimpleToken struct {
	TokType  string
	TokValue string
}

SimpleToken is a simple implementation of both the Token and the Parser interfaces, so it can be used in rules to match a token field. It is also the concrete type of the Tokens returned by SimpleTokenStream.

func (*SimpleToken) Parse

func (t *SimpleToken) Parse(_ interface{}, s *ParserState, opts ParseOptions) *ParseError

Parse tries to match the next token in the given TokenStream with the ParseOptions, using opts.MatchToken. If they match, the receiver is loaded with the token data; if not, a non-nil *ParseError is returned. In either case, the next token in the token stream has been consumed.

func (SimpleToken) Type

func (t SimpleToken) Type() string

Type returns the token type.

func (SimpleToken) Value

func (t SimpleToken) Value() string

Value returns the value of the token.

type SimpleTokenStream

type SimpleTokenStream struct {
	// contains filtered or unexported fields
}

SimpleTokenStream is a very simple implementation of the TokenStream interface which the Parse function requires.

func NewSimpleTokenStream

func NewSimpleTokenStream(toks []Token) *SimpleTokenStream
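
For instance, a stream can be built by hand from SimpleTokens instead of using SimpleTokeniser; a small sketch (the token types are chosen to match the README's grammar):

toks := []grammar.Token{
	grammar.SimpleToken{TokType: "bkt", TokValue: "("},
	grammar.SimpleToken{TokType: "atom", TokValue: "foo"},
	grammar.SimpleToken{TokType: "bkt", TokValue: ")"},
}
ts := grammar.NewSimpleTokenStream(toks)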

func (*SimpleTokenStream) Dump

func (s *SimpleTokenStream) Dump(w io.Writer)

func (*SimpleTokenStream) Next

func (s *SimpleTokenStream) Next() Token

Next consumes the next token in the token stream and returns it. If the stream is exhausted, EOF is returned.

func (*SimpleTokenStream) Restore

func (s *SimpleTokenStream) Restore(pos int)

Restore rewinds the token stream to the given position (which should have been obtained from s.Save()). In general this may panic; in this implementation it is always possible to rewind.

func (*SimpleTokenStream) Save

func (s *SimpleTokenStream) Save() int

Save returns the current position in the token stream.
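
Together, Save, Next and Restore are what allow a parser to backtrack. An illustrative sketch of the pattern:

pos := ts.Save()    // remember the current position
tok := ts.Next()    // consume one token
if tok.Type() != "bkt" {
	ts.Restore(pos) // not the token we wanted: rewind and try another rule
}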

type SizeOptions

type SizeOptions struct {
	Min, Max int
}

type Token

type Token interface {
	Type() string  // The type of the token (e.g. operator, number, string, identifier)
	Value() string // The value of the token (e.g. `+`, `123`, `"abc"`, `if`)
}

A Token has a Type() and Value() method. A TokenStream's Next() method must return a Token.

type TokenDef

type TokenDef struct {
	Ptn      string              // The regular expression the token should match
	Name     string              // The name given to this token type
	Special  func(string) string // If defined, it takes over the tokenising for this pattern
	Mode     string
	PushMode string
	PopMode  bool
}

A TokenDef defines a type of token and the pattern that matches it. Used by SimpleTokeniser to create tokenisers simply.

type TokenParseOptions

type TokenParseOptions struct {
	TokenType    string
	TokenValue   string
	DoNotConsume bool
}

func (TokenParseOptions) String

func (o TokenParseOptions) String() string

type TokenStream

type TokenStream interface {
	Next() Token // Advance the stream by one token and return it.
	Save() int   // Return the current position in the token stream.
	Restore(int) // Return the stream to the given position.
}

A TokenStream represents a sequence of tokens that will be consumed by the Parse() function. The SimpleTokenStream implementation in this package can be used, or users can provide their own implementation (e.g. with a richer data type for tokens, or "on demand" tokenisation).

The Restore() method may panic if it is not possible to return to the given position.
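
A minimal sketch of a custom implementation backed by a pre-tokenised slice (functionally similar to SimpleTokenStream; the type and field names are illustrative):

type sliceStream struct {
	toks []grammar.Token
	pos  int
}

func (s *sliceStream) Next() grammar.Token {
	if s.pos >= len(s.toks) {
		return grammar.EOF // signal exhaustion, as SimpleTokenStream does
	}
	tok := s.toks[s.pos]
	s.pos++
	return tok
}

func (s *sliceStream) Save() int       { return s.pos }
func (s *sliceStream) Restore(pos int) { s.pos = pos }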

Directories

Path Synopsis
cmd
