words

v1.0.3 Latest
Published: Aug 30, 2023 | License: MIT | Imports: 3 | Imported by: 1

README

Words

Go package words extracts words from a string according to a set of rules.

Rules

  1. Strings that are not valid UTF-8 will not be split.
  2. Hyphenated words are treated as individual words unless this is disabled. E.g. "small-town" => []{"small", "town"}
  3. A space, punctuation mark or symbol is discarded unless this is disabled. E.g. "my_string here" => []{"my", "string", "here"}
  4. Consecutive characters of the same type are grouped together.
  5. If the current character is lowercase and the previous word ends in an uppercase letter, that uppercase letter is moved to the start of the lowercase word. E.g. "YAMLParser" => []{"YAML", "Parser"}
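The rules above can be sketched with the standard library alone. The following is an illustrative re-implementation, not the package's actual code: `kind`, `classify` and `splitWords` are hypothetical names, the sketch always discards spaces, punctuation and symbols (no options), and returning an invalid UTF-8 string as a single element is an assumption about what "will not be split" means.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// kind is a simplified rune classification used only by this sketch.
type kind int

const (
	other kind = iota // spaces, punctuation, symbols and anything else
	upper
	lower
	digit
)

func classify(r rune) kind {
	switch {
	case unicode.IsUpper(r):
		return upper
	case unicode.IsLower(r):
		return lower
	case unicode.IsDigit(r):
		return digit
	default:
		return other
	}
}

// splitWords is a hypothetical helper illustrating rules 1, 3, 4 and 5.
func splitWords(s string) []string {
	if !utf8.ValidString(s) {
		return []string{s} // rule 1 (assumed behaviour): leave the input unsplit
	}
	var words []string
	var cur []rune
	prev := other
	flush := func() {
		if len(cur) > 0 {
			words = append(words, string(cur))
			cur = nil
		}
	}
	for _, r := range s {
		k := classify(r)
		switch {
		case k == other:
			flush() // rule 3: separators end the current word and are dropped
		case k == prev:
			cur = append(cur, r) // rule 4: same kind, same word
		case k == lower && prev == upper && len(cur) > 1:
			// rule 5: the last uppercase letter starts the lowercase word
			last := cur[len(cur)-1]
			cur = cur[:len(cur)-1]
			flush()
			cur = []rune{last, r}
		case k == lower && prev == upper:
			cur = append(cur, r) // a single capital followed by lowercase, e.g. "Do"
		default:
			flush()
			cur = []rune{r}
		}
		prev = k
	}
	flush()
	return words
}

func main() {
	fmt.Printf("%q\n", splitWords("YAMLParser"))     // ["YAML" "Parser"]
	fmt.Printf("%q\n", splitWords("my_string here")) // ["my" "string" "here"]
	fmt.Printf("%q\n", splitWords("small-town"))     // ["small" "town"]
}
```

Letters that are neither uppercase nor lowercase fall into `other` here and are dropped, which is a simplification of this sketch and may differ from the package's behaviour.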

Installation

$ go get github.com/imbue11235/words

Usage

Basic usage
words.Extract("Do you prefer camelCase to snake_case?") 
// => []string{"Do", "you", "prefer", "camel", "Case", "to", "snake", "case"}

words.Extract("YAMLParser")
// => []string{"YAML", "Parser"}

words.Extract("Bose QC35")
// => []string{"Bose", "QC", "35"}

With options

To further customize the extraction, options can be passed to Extract.

Punctuation

To include punctuation

words.Extract("So, now punctuation will be included.", words.IncludePunctuation())
// => []string{"So", ",", "now", "punctuation", "will", "be", "included", "."}

Spaces

To include spaces

words.Extract("So   many   spaces", words.IncludeSpaces())
// => []string{"So", "   ", "many", "   ", "spaces"}

Symbols

To include symbols

words.Extract("Some>String", words.IncludeSymbols())
// => []string{"Some", ">", "String"}

Hyphenated words

To allow hyphenated words

words.Extract("An anti-clockwise direction", words.AllowHyphenatedWords())
// => []string{"An", "anti-clockwise", "direction"}

Multiple options

To use multiple options at the same time

words.Extract("Using multiple options!", words.IncludeSpaces(), words.IncludePunctuation())
// => []string{"Using", " ", "multiple", " ", "options", "!"}

License

This project is licensed under the MIT license.

Documentation

Overview

Package words provides capabilities for splitting a string into a slice of words according to a collection of rules.

Functions

func Extract

func Extract(input string, options ...Option) []string

Extract extracts words from the given string, applying any options that are passed.

Types

type Option

type Option func(c *config)

Option defines the function type for applying options to the extraction.
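The `Option` signature follows the common functional-options pattern. The sketch below shows how such options are typically applied; it is not the package's source. The fields of `config`, the lowercase option constructors and `newConfig` are all hypothetical, since the real `config` type is unexported and its contents are not documented here.

```go
package main

import "fmt"

// config is a hypothetical stand-in for the package's unexported config type.
// The field names are assumptions made for this sketch.
type config struct {
	includeSpaces      bool
	includePunctuation bool
}

// Option mirrors the documented signature: a function that mutates a config.
type Option func(c *config)

// includeSpaces and includePunctuation are illustrative option constructors.
func includeSpaces() Option {
	return func(c *config) { c.includeSpaces = true }
}

func includePunctuation() Option {
	return func(c *config) { c.includePunctuation = true }
}

// newConfig starts from the zero-value defaults and applies each option in order.
func newConfig(options ...Option) *config {
	c := &config{}
	for _, apply := range options {
		apply(c)
	}
	return c
}

func main() {
	c := newConfig(includeSpaces(), includePunctuation())
	fmt.Printf("%+v\n", *c) // {includeSpaces:true includePunctuation:true}
}
```

Because the options are variadic, a call with no options simply leaves the defaults in place, which matches how `Extract` can be called with only an input string.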

func AllowHyphenatedWords

func AllowHyphenatedWords() Option

AllowHyphenatedWords allows hyphenated words in the extraction. E.g. "a family-sized pizza" => []{"a", "family-sized", "pizza"}

func IncludePunctuation

func IncludePunctuation() Option

IncludePunctuation includes punctuation in the extraction. E.g. "a.nested_path" => []{"a", ".", "nested", "_", "path"}

func IncludeSpaces

func IncludeSpaces() Option

IncludeSpaces includes spaces in the extraction. E.g. "the moon" => []{"the", " ", "moon"}

func IncludeSymbols

func IncludeSymbols() Option

IncludeSymbols includes symbols in the extraction. E.g. "beer>food" => []{"beer", ">", "food"}

func WithIgnoredRuneKinds (added in v1.0.3)

func WithIgnoredRuneKinds(runeKinds ...RuneKind) Option

WithIgnoredRuneKinds tells the extractor to ignore these rune kinds when they are encountered, adding such runes to the output as if they were of the most recent rune kind.

func WithIgnoredRunes (added in v1.0.2)

func WithIgnoredRunes(runes ...rune) Option

WithIgnoredRunes tells the extractor to ignore these runes when they are encountered, adding them to the output as if they were of the most recent rune kind. E.g. with WithIgnoredRunes('.'), "Etc. and so on" => []{"Etc.", "and", "so", "on"}
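To make "as if they were of the most recent rune kind" concrete, here is a deliberately reduced sketch. `splitIgnoring` is a hypothetical helper, not part of the package: it only separates words at runes that are neither letters nor digits and does no case-based splitting, but it shows how an ignored rune stays attached to the word in progress instead of acting as a separator.

```go
package main

import (
	"fmt"
	"unicode"
)

// splitIgnoring is a hypothetical, simplified splitter. Runes listed in
// ignored are appended to the current word rather than ending it.
func splitIgnoring(s string, ignored ...rune) []string {
	skip := make(map[rune]bool, len(ignored))
	for _, r := range ignored {
		skip[r] = true
	}
	var words []string
	var cur []rune
	for _, r := range s {
		if skip[r] || unicode.IsLetter(r) || unicode.IsDigit(r) {
			cur = append(cur, r)
			continue
		}
		// Any other rune ends the current word and is discarded.
		if len(cur) > 0 {
			words = append(words, string(cur))
			cur = nil
		}
	}
	if len(cur) > 0 {
		words = append(words, string(cur))
	}
	return words
}

func main() {
	fmt.Printf("%q\n", splitIgnoring("Etc. and so on"))      // ["Etc" "and" "so" "on"]
	fmt.Printf("%q\n", splitIgnoring("Etc. and so on", '.')) // ["Etc." "and" "so" "on"]
}
```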

type RuneKind (added in v1.0.3)

type RuneKind int
const (
	Symbol RuneKind = 1 + iota
	Uppercase
	Lowercase
	Space
	Digit
	Punctuation
	Unknown
)
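The documentation does not say how a rune is mapped to a `RuneKind`. One plausible mapping, assuming the kinds correspond to the standard `unicode` predicates, is sketched below. The type and constants are redeclared locally so the example compiles on its own, and `kindOf` is a hypothetical helper, not a function exported by the package.

```go
package main

import (
	"fmt"
	"unicode"
)

// RuneKind and its constants are copied from the declaration above.
type RuneKind int

const (
	Symbol RuneKind = 1 + iota
	Uppercase
	Lowercase
	Space
	Digit
	Punctuation
	Unknown
)

// kindOf is a hypothetical classifier built on the unicode package.
// The order of the checks is an assumption of this sketch.
func kindOf(r rune) RuneKind {
	switch {
	case unicode.IsSymbol(r):
		return Symbol
	case unicode.IsUpper(r):
		return Uppercase
	case unicode.IsLower(r):
		return Lowercase
	case unicode.IsSpace(r):
		return Space
	case unicode.IsDigit(r):
		return Digit
	case unicode.IsPunct(r):
		return Punctuation
	default:
		return Unknown
	}
}

func main() {
	// Print each rune next to the numeric value of its kind.
	for _, r := range "a>B 3_" {
		fmt.Printf("%q => %d\n", r, kindOf(r))
	}
}
```

Under this mapping an underscore counts as punctuation and `>` as a symbol, which is consistent with the IncludePunctuation and IncludeSymbols examples above.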
