Documentation ¶
Index ¶
- func DebugStringify(ts []TokenValue) (ret string)
- func DecodeDoc(includeComments bool, notify AfterDocument) (ret charm.State)
- func Equals(ts []TokenValue, ws Span) (okay bool)
- func FindExactMatch(ts []TokenValue, spans []Span) (ret int)
- func HasPrefix(ts []TokenValue, prefix []Word) (okay bool)
- func Hash(s string) uint64
- func JoinWords(ws []Word) string
- func NewTokenizer(n Notifier) charm.State
- func Normalize(ts []TokenValue) (ret string, width int)
- func NormalizeAll(ts []TokenValue) (ret string, err error)
- func Stringify(ts []TokenValue) (ret string, width int)
- func StripArticle(str string) (ret string)
- type AfterDocument
- type Collector
- type Notifier
- type Pos
- type Span
- type SpanList
- type Token
- type TokenValue
- type Tokenizer
- type Word
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func DebugStringify ¶
func DebugStringify(ts []TokenValue) (ret string)
turn all of the passed tokens into a helpful string representation
func DecodeDoc ¶
func DecodeDoc(includeComments bool, notify AfterDocument) (ret charm.State)
public for testing
func Equals ¶
func Equals(ts []TokenValue, ws Span) (okay bool)
func FindExactMatch ¶
func FindExactMatch(ts []TokenValue, spans []Span) (ret int)
search for a span in a list of spans; return the index of the span that matched.
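The search described above can be sketched with tokens and spans reduced to plain strings; the names below and the -1 sentinel for "no match" are illustrative assumptions, not taken from the package:

```go
package main

import "fmt"

// a minimal stand-in for the package's Span type: a chain of words.
type span []string

// findExactMatch returns the index of the first span whose words exactly
// equal the passed tokens, or -1 when no span matches (an assumed sentinel).
func findExactMatch(ts []string, spans []span) int {
	for i, s := range spans {
		if len(s) != len(ts) {
			continue
		}
		match := true
		for j, w := range s {
			if ts[j] != w {
				match = false
				break
			}
		}
		if match {
			return i
		}
	}
	return -1
}

func main() {
	spans := []span{{"oh"}, {"oh", "hello"}, {"hello", "world"}}
	fmt.Println(findExactMatch([]string{"oh", "hello"}, spans)) // prints 1
	fmt.Println(findExactMatch([]string{"goodbye"}, spans))     // prints -1
}
```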
func HasPrefix ¶
func HasPrefix(ts []TokenValue, prefix []Word) (okay bool)
func NewTokenizer ¶
func NewTokenizer(n Notifier) charm.State
func Normalize ¶
func Normalize(ts []TokenValue) (ret string, width int)
turn a series of string tokens into a normalized string; returns the number of string tokens consumed. somewhat dubious because it skips inflect.Normalize.
func NormalizeAll ¶
func NormalizeAll(ts []TokenValue) (ret string, err error)
same as Normalize, but errors if not all of the tokens were consumed.
func Stringify ¶
func Stringify(ts []TokenValue) (ret string, width int)
turn a series of string tokens into a space-padded string; returns the number of string tokens consumed.
func StripArticle ¶
func StripArticle(str string) (ret string)
return the name after removing leading articles; eats any errors it encounters, returning the original name unchanged.
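A minimal sketch of article stripping; the doc for FindCommonArticles says the common articles are a fixed set, but does not list them, so the `a`/`an`/`the` list below is an assumption:

```go
package main

import (
	"fmt"
	"strings"
)

// an assumed fixed set of leading articles; the package's actual set
// is not documented here.
var articles = []string{"a", "an", "the"}

// stripArticle returns the name after removing one leading article,
// or the original name unchanged when no article matches.
func stripArticle(str string) string {
	for _, a := range articles {
		if rest, ok := strings.CutPrefix(str, a+" "); ok {
			return rest
		}
	}
	return str
}

func main() {
	fmt.Println(stripArticle("the red door")) // prints "red door"
	fmt.Println(stripArticle("red door"))     // prints "red door"
}
```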
Types ¶
type AfterDocument ¶
hands the rune that ended the document, plus the content
type Collector ¶
type Collector struct {
	Tokens []TokenValue
	Lines [][]TokenValue
	KeepComments bool
	BreakLines bool
}
func (*Collector) Decoded ¶
func (at *Collector) Decoded(tv TokenValue) error
type Notifier ¶
type Notifier interface {
Decoded(TokenValue) error
}
callback when a new token exists. tbd: maybe a channel instead?
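The callback shape can be illustrated with local stand-ins for the package's types; everything below (the `printer` type, the simplified `tokenValue`) is invented for the sketch:

```go
package main

import "fmt"

// simplified stand-ins for the package's TokenValue and Notifier.
type tokenValue struct {
	Value any
}

type notifier interface {
	Decoded(tokenValue) error
}

// printer is a trivial Notifier that counts and prints each token as it arrives.
type printer struct{ count int }

func (p *printer) Decoded(tv tokenValue) error {
	p.count++
	fmt.Println(p.count, tv.Value)
	return nil
}

func main() {
	var n notifier = &printer{}
	for _, v := range []any{"hello", "world"} {
		n.Decoded(tokenValue{Value: v})
	}
}
```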
type Span ¶
type Span []Word
Span - implements Match for a chain of individual words.
func FindCommonArticles ¶
func FindCommonArticles(ts []TokenValue) (ret Span, width int)
for now, the common articles are a fixed set. when the author specifies some particular indefinite article for a noun, that article only gets used for printing the noun; it doesn't enhance the parsing of the story. ( it would take some work to lightly hold the relation between a name and an article, then parse a sentence matching names to nouns in the same way. fwiw: the articles in inform also seem to be predetermined in this way. )
type SpanList ¶
type SpanList []Span
func PanicSpans ¶
func (SpanList) FindExactMatch ¶
func (ws SpanList) FindExactMatch(ts []TokenValue) (ret Span, width int)
func (SpanList) FindPrefix ¶
func (ws SpanList) FindPrefix(words []TokenValue) (ret Span, width int)
this is the same as FindPrefixIndex, only it returns a Span instead of an index.
func (SpanList) FindPrefixIndex ¶
func (ws SpanList) FindPrefixIndex(words []TokenValue) (retWhich int, retWidth int)
see if anything in our span list starts the passed words. for instance, if the span list contains the span "oh hello", then the words "oh hello world" will match. returns the index and length of the longest matching prefix.
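The longest-prefix search can be sketched with spans and words as plain strings, reusing the "oh hello" example above; the -1 index for "no match" is an assumption of the sketch:

```go
package main

import "fmt"

// a minimal stand-in for the package's Span type.
type span []string

// findPrefixIndex returns the index and word count of the longest span
// that is a prefix of the passed words; which is -1 when nothing matches.
func findPrefixIndex(spans []span, words []string) (which, width int) {
	which = -1
	for i, s := range spans {
		if len(s) <= width || len(s) > len(words) {
			continue // shorter than the best so far, or longer than the input
		}
		match := true
		for j, w := range s {
			if words[j] != w {
				match = false
				break
			}
		}
		if match {
			which, width = i, len(s)
		}
	}
	return
}

func main() {
	spans := []span{{"oh"}, {"oh", "hello"}}
	which, width := findPrefixIndex(spans, []string{"oh", "hello", "world"})
	fmt.Println(which, width) // prints 1 2
}
```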
type Token ¶
type Token int
const (
	Invalid Token = iota // placeholder, not generated by the tokenizer
	Comma // a comma
	Comment // ex. `# something`, minus the hash
	Parenthetical // ex. `( something )`, minus parens
	Quoted // ex. `"something"`, minus the quotes
	Stop // full stop or other terminal
	String // delimited by spaces and other special runes
	Tell // tell subdoc
)
types of tokens
type TokenValue ¶
type TokenValue struct {
	Token Token
	Pos Pos
	Value any // a string, except for Tell subdocuments
	First bool // helper to know if this is the first token of a sentence
}
func Tokenize ¶
func Tokenize(str string) (ret []TokenValue, err error)
func (TokenValue) Equals ¶
func (w TokenValue) Equals(other uint64) bool
func (TokenValue) Hash ¶
func (tv TokenValue) Hash() (ret uint64)
func (TokenValue) String ¶
func (tv TokenValue) String() (ret string)
a string *representation* of the value