Regexl

Regexl is a high-level language for regular expressions that can be used in any project as a simple library.

You can read about the reasoning for creating Regexl here.

Playground

There is a (WASM-based) playground where you can play with Regexl here.

Regexl Query Examples

  • /friend/i is equivalent to the regexl:
select 'friend'
  • /^friend/i is equivalent to the regexl:
// This is a regexl comment.
// This set_options configuration is equivalent to: '/i'
set_options({
    case_sensitive: false,
})

select starts_with('friend')
  • /Hello*/g is equivalent to the regexl:
set_options({
    find_all_matches: true,
})

//-- This '--' is to help the syntax highlighter :)
//-- The '+' performs a simple concatenation, as all functions return strings
select 'Hell' + zero_plus_of('o')
  • /^Golang$/i is equivalent to the regexl:
set_options({
    case_sensitive: false,
})
//-- Functions can be nested, as outputs are strings.
//-- Alternative regexl: select starts_and_ends_with('Golang')
select ends_with(starts_with('Golang'))
  • /[abcd]/ig (match any of these 4 letters) is equivalent to the regexl:
set_options({
    find_all_matches: true,
    case_sensitive: false,
})
//-- Can also be: select any_chars_of('abcd')
select any_chars_of('abc', 'd')
  • /[A-Z0-9]/ig (match letters and numbers only) is equivalent to the regexl:
set_options({
    find_all_matches: true,
    case_sensitive: false,
})
select any_chars_of(from_to('A', 'Z'), from_to(0, 9))
  • /[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,10}/i (a 'simple' email regex) is equivalent to the regexl:
set_options({
    case_sensitive: false,
})
select
    //-- Converts to: [A-Z0-9._%+-]+
    one_plus_of(
        any_chars_of(from_to('A', 'Z'), from_to(0, 9), '._%+-')
    ) +
    //-- Converts to: @
    '@' +
    //-- Converts to: [A-Z0-9.-]+
    one_plus_of(
        any_chars_of(from_to('A', 'Z'), from_to(0, 9), '.-')
    ) +
    //-- Converts to: \.
    '.' +
    //-- Converts to: [A-Z]{2,10}
    count_between(
        any_chars_of(from_to('A', 'Z')),
        2,
        10
    )

Usage in Go

package main

import (
	"fmt"

	"github.com/bloeys/regexl"
)

func main() {

	regexlQuery := `
		set_options({
			find_all_matches: true,
			case_sensitive: false,
		})

		select starts_with('Hello there, ') + one_plus_of(any_chars_of(from_to('A', 'Z'), '.!-'))
	`

	rl := regexl.NewRegexl(regexlQuery)
	hasMatch := rl.MustCompile().CompiledRegexp.MatchString("Hello there, friend!")

	fmt.Printf("Produced regex: %s\nHas match: %v\n", rl.CompiledRegexp.String(), hasMatch)
}

Technical Details

The Regexl code is a very simple compiler, where the general steps are:

  1. Input query text is tokenized (implemented by parser.go)
  2. Tokens are used to create an Abstract Syntax Tree (AST) (implemented by ast.go)
  3. The AST is fed into a 'backend' that outputs a specific regex string (e.g. Go regex) (implemented by regex_go_backend.go)

To explain the above, let's look at how the following query is compiled:

select starts_with('hello')

By tokenization we mean turning the input string into higher-level segments, split on separators such as spaces, brackets, and so on. For the above query you get the following tokens:

  • Token value: select; Type: keyword
  • Token value: starts_with; Type: function name
  • Token value: (; Type: open bracket
  • Token value: hello; Type: string
  • Token value: ); Type: close bracket
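
For illustration, this tokenization step can also be driven directly through the exported Parser type. The sketch below is a rough example; the exact token stream (for instance, whether whitespace or comment tokens appear) may differ from the simplified list above.

package main

import (
	"fmt"

	"github.com/bloeys/regexl"
)

func main() {
	// Tokenize the example query into typed tokens.
	p := regexl.NewParser("select starts_with('hello')")
	tokens, err := p.Tokenize()
	if err != nil {
		panic(err)
	}

	// Print each token's value and type, e.g. "select" -> keyword.
	for _, t := range tokens {
		fmt.Printf("%q -> %s\n", t.Val, t.Type)
	}
}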

With this list of tokens, an AST is created. An Abstract Syntax Tree represents the structure of a program as a tree, where parent nodes depend on their child nodes. For example, if function A calls function B, the node for that call becomes a child of A's node, and the arguments of the call become children of the call node.

In our query, the linear token list produces this AST:

|-- select
|   |-- starts_with
|   |   |-- hello

With the AST in place, we can traverse the tree and generate some output. In general-purpose programming languages (e.g. C, Go, Python) the final output would be machine code, assembly, or perhaps bytecode to be interpreted.

In Regexl, the output is a regex string targeting a specific implementation, such as Go-compatible or Python-compatible regex (regex syntax and features differ between implementations).

The Go regex produced for our example Regexl query is:

(?i)^hello

Equivalent to the more common regex expression:

/^hello/i

The nice thing about this setup is that supporting a new regex implementation only requires implementing a new backend (step 3); tokenization and AST generation are reused as-is.
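
The same pipeline can be run step by step through the exported types. The sketch below is an approximation: the expected outputs in the comments come from the walkthrough above, and whether GoBackend needs any options set explicitly is an assumption.

package main

import (
	"fmt"

	"github.com/bloeys/regexl"
)

func main() {
	query := "select starts_with('hello')"

	// Step 1: tokenize the query text.
	tokens, err := regexl.NewParser(query).Tokenize()
	if err != nil {
		panic(err)
	}

	// Step 2: build the AST from the tokens.
	ast := regexl.NewAst(tokens)
	if err := ast.Gen(); err != nil {
		panic(err)
	}
	ast.PrintTree() // prints a tree similar to the diagram above

	// Step 3: hand the AST to the Go backend, which returns both the
	// compiled *regexp.Regexp and the regex string it generated.
	gb := &regexl.GoBackend{}
	re, regexStr, err := gb.AstToGoRegex(ast)
	if err != nil {
		panic(err)
	}

	fmt.Println(regexStr)                       // expected: (?i)^hello
	fmt.Println(re.MatchString("hello, world")) // expected: true
}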

Todo

  • Become feature complete with Go regex
  • Better error messages
  • More test cases

Documentation

Constants

const (
	AST_INVALID_INDEX = -1
)

Variables

var (
	PrintTokens  bool
	PrintAstJson bool
	PrintAstTree bool
)

Debug options. @TODO: remove or make something nicer.

Functions

This section is empty.

Types

type Ast

type Ast struct {
	Tokens []Token
	Nodes  []Node
}

func NewAst

func NewAst(tokens []Token) *Ast

func (*Ast) Gen

func (a *Ast) Gen() error

func (*Ast) GetToken

func (a *Ast) GetToken(index int) *Token

func (*Ast) PrintTree

func (a *Ast) PrintTree()

type AstError

type AstError struct {
	Err error
	Pos TokenPos
}

func (*AstError) Error

func (te *AstError) Error() string

type BinaryExpr

type BinaryExpr struct {
	Pos  TokenPos
	Type TokenType
	Lhs  Expr
	Rhs  Expr
}

func (*BinaryExpr) EndPos

func (e *BinaryExpr) EndPos() TokenPos

func (*BinaryExpr) StartPos

func (e *BinaryExpr) StartPos() TokenPos

type Expr

type Expr interface {
	Node
	// contains filtered or unexported methods
}

type FuncExpr

type FuncExpr struct {
	Pos             TokenPos
	Ident           IdentExpr
	Args            []Expr
	OpenBracketPos  TokenPos
	CloseBracketPos TokenPos
}

func (*FuncExpr) EndPos

func (e *FuncExpr) EndPos() TokenPos

func (*FuncExpr) StartPos

func (e *FuncExpr) StartPos() TokenPos

type GoBackend

type GoBackend struct {
	Opts RegexOptions
}

GoBackend produces valid Go regex strings, based on the rules here: https://pkg.golang.ir/regexp/syntax

func (*GoBackend) ApplyOptionsToRegexString

func (gb *GoBackend) ApplyOptionsToRegexString(regexString string) string

func (*GoBackend) AstToGoRegex

func (gb *GoBackend) AstToGoRegex(ast *Ast) (*regexp.Regexp, string, error)

type IdentExpr

type IdentExpr struct {
	Name string
	Pos  TokenPos
}

func (*IdentExpr) EndPos

func (e *IdentExpr) EndPos() TokenPos

func (*IdentExpr) StartPos

func (e *IdentExpr) StartPos() TokenPos

type KeyValExpr

type KeyValExpr struct {
	Key      IdentExpr
	Val      Expr
	ColonPos TokenPos
}

func (*KeyValExpr) EndPos

func (e *KeyValExpr) EndPos() TokenPos

func (*KeyValExpr) StartPos

func (e *KeyValExpr) StartPos() TokenPos

type LiteralExpr

type LiteralExpr struct {
	Pos  TokenPos
	Type TokenType
	// Value depends on the type, so it can contain a numeric value, a string, etc.
	Value string
}

func (*LiteralExpr) EndPos

func (e *LiteralExpr) EndPos() TokenPos

func (*LiteralExpr) StartPos

func (e *LiteralExpr) StartPos() TokenPos

type Node

type Node interface {
	// StartPos is the position of the first byte of the first character making up this node
	StartPos() TokenPos
	// EndPos is the position of the first byte of the first character that doesn't belong to this node.
	// This means EndPos is +1 of the last character, so it acts in the same way len() does
	EndPos() TokenPos
}
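
For example, in the query select 'friend', a node covering the string literal 'friend' (including its quotes) would have StartPos() == 7 and EndPos() == 15, so that query[7:15] yields exactly that literal, in the same half-open way Go slicing and len() work (the offsets here are illustrative, not output from the package).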

type ObjectLiteralExpr

type ObjectLiteralExpr struct {
	OpenCurly  TokenPos
	CloseCurly TokenPos
	KeyVals    []KeyValExpr
}

func (*ObjectLiteralExpr) EndPos

func (e *ObjectLiteralExpr) EndPos() TokenPos

func (*ObjectLiteralExpr) StartPos

func (e *ObjectLiteralExpr) StartPos() TokenPos

type Parser

type Parser struct {
	Query string
}

func NewParser

func NewParser(query string) *Parser

func (*Parser) GetNextRuneByByteIndex

func (p *Parser) GetNextRuneByByteIndex(index int) (rune, error)

func (*Parser) GetRuneByByteIndex

func (p *Parser) GetRuneByByteIndex(index int) (rune, error)

func (*Parser) Tokenize

func (p *Parser) Tokenize() (tokens []Token, err error)

func (*Parser) ValidateTokens

func (p *Parser) ValidateTokens(tokens []Token) error

type ParserError

type ParserError struct {
	Err error
	Pos TokenPos
}

func (*ParserError) Error

func (te *ParserError) Error() string

type RegexOptions

type RegexOptions struct {
	CaseSensitive  bool
	FindAllMatches bool
}

type Regexl

type Regexl struct {
	Query          string
	CompiledRegexp *regexp.Regexp
}

func NewRegexl

func NewRegexl(query string) *Regexl

func (*Regexl) Compile

func (rl *Regexl) Compile() error

Compile tries to compile the query within this Regexl object and then sets Regexl.CompiledRegexp. Regexl.CompiledRegexp is only set if no error is found; otherwise the error is returned and Regexl.CompiledRegexp is left unchanged.

func (*Regexl) MustCompile

func (rl *Regexl) MustCompile() *Regexl

MustCompile compiles the query within this Regexl object by calling Regexl.Compile and panics if an error occurs.
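
A minimal sketch of using both (the queries and expected outputs below are illustrative):

package main

import (
	"fmt"
	"log"

	"github.com/bloeys/regexl"
)

func main() {
	// Compile returns an error that the caller can handle.
	rl := regexl.NewRegexl("select starts_with('friend')")
	if err := rl.Compile(); err != nil {
		log.Fatalf("invalid regexl query: %v", err)
	}
	fmt.Println(rl.CompiledRegexp.MatchString("friends forever")) // expected: true

	// MustCompile panics on error instead, which suits queries known to be valid.
	rl2 := regexl.NewRegexl("select 'friend'").MustCompile()
	fmt.Println(rl2.CompiledRegexp.String())
}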

type SelectStmt

type SelectStmt struct {
	Pos  TokenPos
	Type TokenType
	Es   []Expr
}

func (*SelectStmt) EndPos

func (s *SelectStmt) EndPos() TokenPos

func (*SelectStmt) StartPos

func (s *SelectStmt) StartPos() TokenPos

type Stmt

type Stmt interface {
	Node
	// contains filtered or unexported methods
}

type Token

type Token struct {
	Val  string
	Type TokenType
	Pos  TokenPos
}

func (*Token) HasLoc

func (t *Token) HasLoc() bool

func (*Token) IsEmpty

func (t *Token) IsEmpty() bool

func (*Token) MakeEmpty

func (t *Token) MakeEmpty()

type TokenPos

type TokenPos int

type TokenType

type TokenType int
const (
	TokenType_Unknown TokenType = iota
	TokenType_Space
	TokenType_String
	// TokenType_Single_Quote
	TokenType_Int
	TokenType_Float
	TokenType_Operator
	TokenType_OpenBracket
	TokenType_CloseBracket
	TokenType_OpenCurlyBracket
	TokenType_CloseCurlyBracket
	TokenType_Colon
	TokenType_Comma
	TokenType_Bool
	TokenType_Plus
	TokenType_Comment
	TokenType_Object_Param
	TokenType_Function_Name
	TokenType_Keyword
)

func (TokenType) MarshalText

func (tt TokenType) MarshalText() (text []byte, err error)

func (TokenType) String

func (i TokenType) String() string
