ngram

package
v0.0.80 Latest
Warning

This package is not in the latest version of its module.

Published: Nov 19, 2023 License: Apache-2.0 Imports: 4 Imported by: 0

Documentation

Overview

Package ngram provides types for n-gram analysis: bigrams, bigram frequencies, and collocations.

Constants

const MAX_COLLOCATIONS = 10 // Max to report

Max collocation elements for a single word

const MAX_STORE = 100 // Max to store

Variables

This section is empty.

Functions

This section is empty.

Types

type Bigram

type Bigram struct {
	HeadwordDef1                            *dicttypes.Word // First headword
	HeadwordDef2                            *dicttypes.Word // Second headword
	Example, ExFile, ExDocTitle, ExColTitle *string
}

A struct to hold an instance of a Bigram. Since the words could be either simplified or traditional, bigrams are indexed by the headword ids. An example of the bigram is also included so that the usage context can be investigated.
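As an illustration of why indexing by headword ids helps, here is a minimal, self-contained sketch. The `word` type is a hypothetical stand-in for `dicttypes.Word`, and its field names, the `bigram` type, and the `"id1:id2"` key format are all assumptions made for this example, not the package's actual layout.

```go
package main

import "fmt"

// word is a hypothetical stand-in for dicttypes.Word; the field names are
// assumptions made for this sketch.
type word struct {
	HeadwordID  int
	Simplified  string
	Traditional string
}

// bigram pairs two headwords, loosely mirroring the Bigram struct above.
type bigram struct {
	first, second *word
}

// key identifies the bigram by headword ids, so the same pair of words gets
// the same key whether the source text was simplified or traditional.
// The "id1:id2" format is an assumption.
func (b bigram) key() string {
	return fmt.Sprintf("%d:%d", b.first.HeadwordID, b.second.HeadwordID)
}

// simplified concatenates the simplified forms of both words.
func (b bigram) simplified() string {
	return b.first.Simplified + b.second.Simplified
}

// traditional concatenates the traditional forms of both words.
func (b bigram) traditional() string {
	return b.first.Traditional + b.second.Traditional
}

func main() {
	w1 := &word{HeadwordID: 7, Simplified: "学", Traditional: "學"}
	w2 := &word{HeadwordID: 9, Simplified: "习", Traditional: "習"}
	b := bigram{first: w1, second: w2}
	fmt.Println(b.key(), b.simplified(), b.traditional())
}
```

Both script forms share the key `7:9`, so occurrences counted from simplified and traditional texts would land in the same record.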

var NULL_BIGRAM_PTR *Bigram

func NewBigram

func NewBigram(hw1, hw2 dicttypes.Word,
	example, exFile, exDocTitle, exColTitle string) *Bigram

Constructor for a Bigram struct

func NullBigram

func NullBigram() *Bigram

func (*Bigram) ContainsFunctionWord

func (bigram *Bigram) ContainsFunctionWord() bool

Reports whether the bigram contains a function word. Bigrams that contain function words should be excluded.

func (*Bigram) Simplified

func (bigram *Bigram) Simplified() string

The simplified text of the bigram

func (*Bigram) String

func (bigram *Bigram) String() string

Override string method for comparison

func (*Bigram) Traditional

func (bigram *Bigram) Traditional() string

The traditional text of the bigram

type BigramFreq

type BigramFreq struct {
	BigramVal Bigram
	Frequency int
}

Single record of the frequency of occurrence of a bigram

func SortedFreq

func SortedFreq(bfm BigramFreqMap) []BigramFreq

Get the bigram frequencies as a sorted slice

type BigramFreqMap

type BigramFreqMap map[string]BigramFreq

Map of the frequency of occurrence of a bigram in a collection of texts

func (*BigramFreqMap) GetBigram

func (bfmPtr *BigramFreqMap) GetBigram(bigram *Bigram) BigramFreq

Get the frequency record for the bigram from the bigram frequency map

func (*BigramFreqMap) GetBigramVal

func (bfmPtr *BigramFreqMap) GetBigramVal(id1, id2 int) (*Bigram, bool)

Does the Bigram map contain a bigram with this combination of words?

func (*BigramFreqMap) Merge

func (bfmPtr *BigramFreqMap) Merge(more BigramFreqMap)

Merge another bigram frequency map
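A merge of two frequency maps can be sketched as follows. This is a simplified illustration, not the package's implementation: `freqMap` is a hypothetical stand-in that maps a key directly to a count, and summing the counts of shared keys is an assumption about what merging means here.

```go
package main

import "fmt"

// freqMap is a simplified, hypothetical stand-in for BigramFreqMap:
// bigram key -> frequency.
type freqMap map[string]int

// merge folds the counts from more into m. Summing the counts of keys
// present in both maps is an assumption made for this sketch.
func (m freqMap) merge(more freqMap) {
	for key, n := range more {
		m[key] += n // a missing key starts from the zero value
	}
}

func main() {
	a := freqMap{"1:2": 3, "3:4": 1}
	b := freqMap{"1:2": 2, "5:6": 4}
	a.merge(b)
	fmt.Println(a["1:2"], a["3:4"], a["5:6"]) // prints "5 1 4"
}
```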

func (*BigramFreqMap) PutBigram

func (bfmPtr *BigramFreqMap) PutBigram(bigram *Bigram)

Put the bigram in the bigram frequency map
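A minimal sketch of the counting pattern that a method like this suggests is shown below. The `freq` and `freqMap` types are hypothetical stand-ins for `BigramFreq` and `BigramFreqMap`, and incrementing the count on each call is an assumption; the actual method may behave differently.

```go
package main

import "fmt"

// freq is a hypothetical stand-in for BigramFreq: a key plus a count.
type freq struct {
	Key       string
	Frequency int
}

// freqMap is a hypothetical stand-in for BigramFreqMap.
type freqMap map[string]freq

// put records one more occurrence of key. Incrementing on every call is an
// assumption made for this sketch.
func (m freqMap) put(key string) {
	f := m[key] // zero value when the key is absent
	f.Key = key
	f.Frequency++
	m[key] = f // map values are copies, so store the updated record back
}

func main() {
	m := freqMap{}
	for _, key := range []string{"1:2", "3:4", "1:2"} {
		m.put(key)
	}
	fmt.Println(m["1:2"].Frequency, m["3:4"].Frequency) // prints "2 1"
}
```

Because the map stores struct values rather than pointers, the updated record has to be written back after the count is changed.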

func (*BigramFreqMap) PutBigramFreq

func (bfmPtr *BigramFreqMap) PutBigramFreq(bigramFreq BigramFreq)

Put the BigramFreq record in the bigram frequency map

type CollocationMap

type CollocationMap map[int]BigramFreqMap

The key is the headword id, each entry is a bigram frequency map

func (*CollocationMap) MergeCollocationMap

func (cmPtr *CollocationMap) MergeCollocationMap(more CollocationMap)

Merge another collocation map into this one

func (*CollocationMap) PutBigram

func (cmPtr *CollocationMap) PutBigram(headwordId int, bigram *Bigram)

Put the bigram in the bigram frequency map for the specific word
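Since a collocation map is a map of maps, a put has to create the inner map on first use. The sketch below shows that pattern with hypothetical stand-in types (`collocationMap` with plain counts); it is not the package's implementation.

```go
package main

import "fmt"

// collocationMap is a hypothetical stand-in for CollocationMap:
// headword id -> (bigram key -> frequency).
type collocationMap map[int]map[string]int

// put counts one occurrence of the bigram key for the given headword,
// creating the inner map the first time the headword is seen.
func (cm collocationMap) put(headwordID int, key string) {
	inner, ok := cm[headwordID]
	if !ok {
		inner = map[string]int{}
		cm[headwordID] = inner
	}
	inner[key]++
}

func main() {
	cm := collocationMap{}
	cm.put(7, "7:9")
	cm.put(7, "7:9")
	cm.put(9, "7:9")
	fmt.Println(cm[7]["7:9"], cm[9]["7:9"]) // prints "2 1"
}
```

Writing to a nil inner map would panic, which is why the existence check comes before the increment.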

func (*CollocationMap) PutBigramFreq

func (cmPtr *CollocationMap) PutBigramFreq(key int, bigramFreq BigramFreq)

Add the BigramFreq object to the CollocationMap

func (*CollocationMap) SortedCollocations

func (cmPtr *CollocationMap) SortedCollocations(headwordId int) []BigramFreq

Get the sorted collocations for a given headword, making sure that each occurs at least twice and that the total number does not exceed MAX_COLLOCATIONS
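One possible reading of this description is: sort by frequency, drop bigrams seen fewer than twice, and cap the result at the reporting limit. The sketch below implements that reading with hypothetical stand-in names (`freq`, `topCollocations`, `maxCollocations`); the threshold of two and the tie-break by key are assumptions, not confirmed behavior of the package.

```go
package main

import (
	"fmt"
	"sort"
)

// maxCollocations mirrors MAX_COLLOCATIONS for this sketch.
const maxCollocations = 10

// freq is a hypothetical stand-in for BigramFreq.
type freq struct {
	Key       string
	Frequency int
}

// topCollocations returns the most frequent entries first, skipping any
// seen fewer than twice and stopping at maxCollocations entries.
func topCollocations(counts map[string]int) []freq {
	all := make([]freq, 0, len(counts))
	for key, n := range counts {
		all = append(all, freq{Key: key, Frequency: n})
	}
	// Descending by frequency; ties are broken by key so the order is
	// deterministic (the tie-break is an assumption).
	sort.Slice(all, func(i, j int) bool {
		if all[i].Frequency != all[j].Frequency {
			return all[i].Frequency > all[j].Frequency
		}
		return all[i].Key < all[j].Key
	})
	out := []freq{}
	for _, f := range all {
		// The slice is sorted, so everything after the first rare entry
		// is rare too.
		if len(out) >= maxCollocations || f.Frequency < 2 {
			break
		}
		out = append(out, f)
	}
	return out
}

func main() {
	counts := map[string]int{"7:9": 5, "7:11": 1, "7:13": 3}
	fmt.Println(topCollocations(counts)) // prints "[{7:9 5} {7:13 3}]"
}
```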

type SortedBFM

type SortedBFM struct {
	// contains filtered or unexported fields
}

Sorted into descending order with most frequent bigram first

func NewSortedBFM

func NewSortedBFM(bfm BigramFreqMap) *SortedBFM

func (*SortedBFM) Len

func (sbf *SortedBFM) Len() int

func (*SortedBFM) Less

func (sbf *SortedBFM) Less(i, j int) bool

func (*SortedBFM) Swap

func (sbf *SortedBFM) Swap(i, j int)
