Documentation ¶
Overview ¶
Package gnlp provides a generic Natural Language Processing toolkit.
Index ¶
- func BLEU[T comparable](candidate []T, references [][]T) float64
- func CorpusBLEU[T comparable](candidateList [][]T, referencesList [][][]T) float64
- func DamerauLevenshteinDistance[T comparable](a, b []T) int64
- func DiceIndex[T comparable](a, b []T) float64
- func HammingDistance[T comparable](a, b []T) (int64, error)
- func JaccardIndex[T comparable](a, b []T) float64
- func JaroSimilarity[T comparable](a, b []T) float64
- func JaroWinklerSimilarity[T comparable](a, b []T) float64
- func LevenshteinDistance[T comparable](a, b []T) int64
- func LongestCommonSubsequences[T comparable](a, b []T) [][]T
- func NGrams[T any](seq []T, n int) (ngram [][]T)
- func ROUGEL[T comparable](candidate []T, references [][]T) (recall, precision float64)
- func ROUGEN[T comparable](candidate []T, references [][]T, n int) float64
- func SimpsonIndex[T comparable](a, b []T) float64
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func BLEU ¶
func BLEU[T comparable](candidate []T, references [][]T) float64
BLEU computes a sentence-level BLEU score.
Papineni, Kishore, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. "BLEU: a method for automatic evaluation of machine translation." In Proceedings of ACL. https://www.aclweb.org/anthology/P02-1040.pdf
The candidate parameter is a sequence of tokens, and the references parameter is a set of token sequences. This function returns zero if there is no reference.
func CorpusBLEU ¶
func CorpusBLEU[T comparable](candidateList [][]T, referencesList [][][]T) float64
CorpusBLEU computes a corpus-level BLEU score.
Papineni, Kishore, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. "BLEU: a method for automatic evaluation of machine translation." In Proceedings of ACL. https://www.aclweb.org/anthology/P02-1040.pdf
The candidate list and the references list should have the same length; otherwise, this function returns zero.
Note that this function doesn't return the average of sentence-level BLEU scores. It calculates the micro-average of precision, as in the original BLEU paper.
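The difference between a micro-average and an average of per-sentence scores can be illustrated with a small sketch. The type counts and the functions microPrecision and macroPrecision are hypothetical names introduced here; the sketch covers only the precision aggregation, not the brevity penalty or the rest of the BLEU computation.

```go
package main

import "fmt"

// counts holds the clipped n-gram matches and the total number of candidate
// n-grams for one sentence.
type counts struct{ clipped, total int }

// microPrecision sums numerators and denominators over the whole corpus
// before dividing.
func microPrecision(cs []counts) float64 {
	clipped, total := 0, 0
	for _, c := range cs {
		clipped += c.clipped
		total += c.total
	}
	if total == 0 {
		return 0
	}
	return float64(clipped) / float64(total)
}

// macroPrecision averages the per-sentence precisions instead.
// Sentences with no n-grams contribute a precision of zero.
func macroPrecision(cs []counts) float64 {
	if len(cs) == 0 {
		return 0
	}
	sum := 0.0
	for _, c := range cs {
		if c.total > 0 {
			sum += float64(c.clipped) / float64(c.total)
		}
	}
	return sum / float64(len(cs))
}

func main() {
	// A short, poor sentence and a longer, perfect one.
	corpus := []counts{{clipped: 1, total: 2}, {clipped: 8, total: 8}}
	fmt.Println(microPrecision(corpus)) // 9 matches out of 10 n-grams
	fmt.Println(macroPrecision(corpus)) // mean of 0.5 and 1.0
}
```

In the micro-average, longer sentences carry more weight because they contribute more n-grams to both sums.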
func DamerauLevenshteinDistance ¶
func DamerauLevenshteinDistance[T comparable](a, b []T) int64
DamerauLevenshteinDistance computes the Damerau-Levenshtein distance between two sequences.
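Two variants are commonly called Damerau-Levenshtein distance: the restricted "optimal string alignment" (OSA) form and the unrestricted form. This page does not say which one the package implements. The sketch below shows the OSA variant only, under the hypothetical name osaDistance, as an illustration of how adjacent transpositions extend the Levenshtein recurrence.

```go
package main

import "fmt"

// osaDistance computes the optimal string alignment distance: insertions,
// deletions, substitutions, and transpositions of adjacent elements, where
// no element is edited more than once.
func osaDistance[T comparable](a, b []T) int64 {
	d := make([][]int64, len(a)+1)
	for i := range d {
		d[i] = make([]int64, len(b)+1)
		d[i][0] = int64(i)
	}
	for j := 0; j <= len(b); j++ {
		d[0][j] = int64(j)
	}
	for i := 1; i <= len(a); i++ {
		for j := 1; j <= len(b); j++ {
			cost := int64(1)
			if a[i-1] == b[j-1] {
				cost = 0
			}
			best := d[i-1][j] + 1 // deletion
			if v := d[i][j-1] + 1; v < best { // insertion
				best = v
			}
			if v := d[i-1][j-1] + cost; v < best { // substitution or match
				best = v
			}
			// Transposition of two adjacent elements.
			if i > 1 && j > 1 && a[i-1] == b[j-2] && a[i-2] == b[j-1] {
				if v := d[i-2][j-2] + 1; v < best {
					best = v
				}
			}
			d[i][j] = best
		}
	}
	return d[len(a)][len(b)]
}

func main() {
	fmt.Println(osaDistance([]rune("ab"), []rune("ba")))  // prints 1
	fmt.Println(osaDistance([]rune("ca"), []rune("abc"))) // prints 3
}
```

The second call is the usual example separating the two variants: the unrestricted distance between "ca" and "abc" is 2, while the OSA distance is 3.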
func DiceIndex ¶
func DiceIndex[T comparable](a, b []T) float64
DiceIndex computes the Sørensen-Dice index (Sørensen-Dice similarity coefficient) of two sets.
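As a minimal sketch of the definition, the Sørensen-Dice index of sets A and B is 2|A∩B| / (|A| + |B|). The function name dice is hypothetical. The sketch assumes that duplicate elements are collapsed into sets and that two empty inputs yield 0; the package may handle either case differently.

```go
package main

import "fmt"

// dice treats a and b as sets and returns 2|A∩B| / (|A| + |B|).
func dice[T comparable](a, b []T) float64 {
	setA := make(map[T]struct{}, len(a))
	for _, x := range a {
		setA[x] = struct{}{}
	}
	setB := make(map[T]struct{}, len(b))
	for _, x := range b {
		setB[x] = struct{}{}
	}
	if len(setA)+len(setB) == 0 {
		return 0 // assumption: avoid dividing by zero for two empty sets
	}
	common := 0
	for x := range setA {
		if _, ok := setB[x]; ok {
			common++
		}
	}
	return 2 * float64(common) / float64(len(setA)+len(setB))
}

func main() {
	fmt.Println(dice([]int{1, 2}, []int{2, 3})) // prints 0.5
}
```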
func HammingDistance ¶
func HammingDistance[T comparable](a, b []T) (int64, error)
HammingDistance computes the Hamming distance between two sequences of the same length.
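The Hamming distance counts the positions at which two equal-length sequences differ. The sketch below mirrors the (int64, error) shape of the documented signature under the hypothetical name hamming; the error message and the value returned alongside the error are assumptions.

```go
package main

import (
	"errors"
	"fmt"
)

// hamming counts the positions at which a and b hold different elements.
// It returns an error when the lengths differ.
func hamming[T comparable](a, b []T) (int64, error) {
	if len(a) != len(b) {
		return 0, errors.New("sequences must have the same length")
	}
	var d int64
	for i := range a {
		if a[i] != b[i] {
			d++
		}
	}
	return d, nil
}

func main() {
	d, err := hamming([]rune("karolin"), []rune("kathrin"))
	fmt.Println(d, err) // prints "3 <nil>"
}
```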
func JaccardIndex ¶
func JaccardIndex[T comparable](a, b []T) float64
JaccardIndex computes the Jaccard index (Jaccard similarity coefficient) of two sets.
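As a minimal sketch of the definition, the Jaccard index of sets A and B is |A∩B| / |A∪B|. The names toSet and jaccard are hypothetical. The sketch assumes that duplicate elements are collapsed into sets and that two empty inputs yield 0; the package may handle either case differently.

```go
package main

import "fmt"

// toSet collapses a slice into a set of its distinct elements.
func toSet[T comparable](s []T) map[T]struct{} {
	set := make(map[T]struct{}, len(s))
	for _, x := range s {
		set[x] = struct{}{}
	}
	return set
}

// jaccard returns |A∩B| / |A∪B| for the sets built from a and b.
func jaccard[T comparable](a, b []T) float64 {
	setA, setB := toSet(a), toSet(b)
	common := 0
	for x := range setA {
		if _, ok := setB[x]; ok {
			common++
		}
	}
	union := len(setA) + len(setB) - common
	if union == 0 {
		return 0 // assumption: avoid dividing by zero for two empty sets
	}
	return float64(common) / float64(union)
}

func main() {
	fmt.Println(jaccard([]int{1, 2, 3}, []int{2, 3, 4})) // prints 0.5
}
```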
func JaroSimilarity ¶
func JaroSimilarity[T comparable](a, b []T) float64
JaroSimilarity computes the Jaro similarity between two sequences.
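The usual definition of Jaro similarity is (m/|a| + m/|b| + (m-t)/m) / 3, where m is the number of matching elements found within a window of max(|a|, |b|)/2 - 1 positions and t is half the number of matched elements that appear in a different order. The sketch below follows that definition under the hypothetical name jaro; returning 0 when there are no matches (including for empty inputs) is an assumption.

```go
package main

import "fmt"

// jaro computes the Jaro similarity of a and b.
func jaro[T comparable](a, b []T) float64 {
	longest := len(a)
	if len(b) > longest {
		longest = len(b)
	}
	window := longest/2 - 1
	if window < 0 {
		window = 0
	}
	matchedA := make([]bool, len(a))
	matchedB := make([]bool, len(b))
	m := 0
	for i := range a {
		lo, hi := i-window, i+window+1
		if lo < 0 {
			lo = 0
		}
		if hi > len(b) {
			hi = len(b)
		}
		for j := lo; j < hi; j++ {
			if !matchedB[j] && a[i] == b[j] {
				matchedA[i], matchedB[j] = true, true
				m++
				break
			}
		}
	}
	if m == 0 {
		return 0
	}
	// Walk the matched elements of both sequences in order and count the
	// positions where they disagree.
	outOfOrder, j := 0, 0
	for i := range a {
		if !matchedA[i] {
			continue
		}
		for !matchedB[j] {
			j++
		}
		if a[i] != b[j] {
			outOfOrder++
		}
		j++
	}
	t := float64(outOfOrder) / 2
	mf := float64(m)
	return (mf/float64(len(a)) + mf/float64(len(b)) + (mf-t)/mf) / 3
}

func main() {
	// Six matches and one transposition: (1 + 1 + 5/6) / 3.
	fmt.Printf("%.4f\n", jaro([]rune("MARTHA"), []rune("MARHTA"))) // prints 0.9444
}
```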
func JaroWinklerSimilarity ¶
func JaroWinklerSimilarity[T comparable](a, b []T) float64
JaroWinklerSimilarity computes the Jaro-Winkler similarity between two sequences. The scaling factor is set to 0.1.
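The Winkler adjustment raises a Jaro score for sequences that share a prefix: jw = jaro + l * p * (1 - jaro), where l is the common prefix length and p is the scaling factor. The sketch below, under the hypothetical name winklerAdjust, takes a precomputed Jaro score and uses p = 0.1 as stated above. Capping l at 4 is the conventional choice and is assumed here; whether the package applies the same cap or any boost threshold is not stated on this page.

```go
package main

import "fmt"

const (
	scalingFactor = 0.1 // p, as documented above
	maxPrefix     = 4   // assumption: the conventional cap on the prefix length
)

// winklerAdjust applies the Winkler prefix bonus to a precomputed Jaro score.
func winklerAdjust[T comparable](jaroScore float64, a, b []T) float64 {
	prefix := 0
	for prefix < len(a) && prefix < len(b) && prefix < maxPrefix && a[prefix] == b[prefix] {
		prefix++
	}
	return jaroScore + float64(prefix)*scalingFactor*(1-jaroScore)
}

func main() {
	// The Jaro similarity of "MARTHA" and "MARHTA" is 17/18; the shared
	// prefix "MAR" has length 3.
	jw := winklerAdjust(17.0/18.0, []rune("MARTHA"), []rune("MARHTA"))
	fmt.Printf("%.4f\n", jw) // prints 0.9611
}
```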
func LevenshteinDistance ¶
func LevenshteinDistance[T comparable](a, b []T) int64
LevenshteinDistance computes the Levenshtein distance between two sequences.
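The Levenshtein distance is the minimum number of insertions, deletions, and substitutions needed to turn one sequence into the other. A minimal sketch of the standard dynamic-programming recurrence, keeping only two rows of the table, is shown below; the names levenshtein and min3 are hypothetical, and this is not necessarily how the package computes the value.

```go
package main

import "fmt"

// min3 returns the smallest of three values.
func min3(x, y, z int64) int64 {
	if y < x {
		x = y
	}
	if z < x {
		x = z
	}
	return x
}

// levenshtein computes the edit distance between a and b.
// prev[j] holds the distance between a[:i-1] and b[:j]; curr is the row
// being filled for a[:i].
func levenshtein[T comparable](a, b []T) int64 {
	prev := make([]int64, len(b)+1)
	curr := make([]int64, len(b)+1)
	for j := range prev {
		prev[j] = int64(j)
	}
	for i := 1; i <= len(a); i++ {
		curr[0] = int64(i)
		for j := 1; j <= len(b); j++ {
			cost := int64(1)
			if a[i-1] == b[j-1] {
				cost = 0
			}
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev, curr = curr, prev
	}
	return prev[len(b)]
}

func main() {
	fmt.Println(levenshtein([]rune("kitten"), []rune("sitting"))) // prints 3
}
```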
func LongestCommonSubsequences ¶
func LongestCommonSubsequences[T comparable](a, b []T) [][]T
LongestCommonSubsequences returns the longest subsequences common to the two given sequences. It returns all valid subsequences.
This function returns a slice that contains at least one sequence. It returns [][]T{{}} if there is no common subsequence.
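One way to enumerate every longest common subsequence is to fill the usual length table and then backtrack along all paths that preserve the optimal length. The sketch below does this under the hypothetical name allLCS. It is an illustration rather than the package's algorithm: it deduplicates results by their %#v rendering, its output order may differ from the package's, and the backtracking can take exponential time on long inputs.

```go
package main

import "fmt"

// allLCS returns every distinct longest common subsequence of a and b.
// When nothing is shared, the result holds a single empty sequence.
func allLCS[T comparable](a, b []T) [][]T {
	// dp[i][j] is the LCS length of a[:i] and b[:j].
	dp := make([][]int, len(a)+1)
	for i := range dp {
		dp[i] = make([]int, len(b)+1)
	}
	for i := 1; i <= len(a); i++ {
		for j := 1; j <= len(b); j++ {
			switch {
			case a[i-1] == b[j-1]:
				dp[i][j] = dp[i-1][j-1] + 1
			case dp[i-1][j] >= dp[i][j-1]:
				dp[i][j] = dp[i-1][j]
			default:
				dp[i][j] = dp[i][j-1]
			}
		}
	}
	seen := make(map[string]bool)
	var out [][]T
	// walk backtracks from (i, j); tail holds the elements chosen so far,
	// stored back to front.
	var walk func(i, j int, tail []T)
	walk = func(i, j int, tail []T) {
		if dp[i][j] == 0 {
			seq := make([]T, len(tail))
			for k, x := range tail {
				seq[len(tail)-1-k] = x
			}
			if key := fmt.Sprintf("%#v", seq); !seen[key] {
				seen[key] = true
				out = append(out, seq)
			}
			return
		}
		if a[i-1] == b[j-1] {
			next := append(append([]T{}, tail...), a[i-1])
			walk(i-1, j-1, next)
		}
		if dp[i-1][j] == dp[i][j] {
			walk(i-1, j, tail)
		}
		if dp[i][j-1] == dp[i][j] {
			walk(i, j-1, tail)
		}
	}
	walk(len(a), len(b), nil)
	return out
}

func main() {
	for _, seq := range allLCS([]rune("ABCBDAB"), []rune("BDCABA")) {
		fmt.Println(string(seq))
	}
}
```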
func ROUGEL ¶
func ROUGEL[T comparable](candidate []T, references [][]T) (recall, precision float64)
ROUGEL computes a ROUGE-L score, which is a text summarization metric based on the longest common subsequence.
Chin-Yew Lin. 2004. "ROUGE: A Package for Automatic Evaluation of Summaries." In Proceedings of ACL. https://aclanthology.org/W04-1013.pdf
func ROUGEN ¶
func ROUGEN[T comparable](candidate []T, references [][]T, n int) float64
ROUGEN computes a ROUGE-N score, which is a recall-oriented text summarization metric.
Chin-Yew Lin. 2004. "ROUGE: A Package for Automatic Evaluation of Summaries." In Proceedings of ACL. https://aclanthology.org/W04-1013.pdf
func SimpsonIndex ¶
func SimpsonIndex[T comparable](a, b []T) float64
SimpsonIndex computes the Szymkiewicz–Simpson index (Szymkiewicz–Simpson similarity coefficient) of two sets.
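As a minimal sketch of the definition, the Szymkiewicz–Simpson index (also called the overlap coefficient) of sets A and B is |A∩B| / min(|A|, |B|). The function name overlap is hypothetical. The sketch assumes that duplicate elements are collapsed into sets and that an empty input yields 0; the package may handle either case differently.

```go
package main

import "fmt"

// overlap treats a and b as sets and returns |A∩B| / min(|A|, |B|).
func overlap[T comparable](a, b []T) float64 {
	setA := make(map[T]struct{}, len(a))
	for _, x := range a {
		setA[x] = struct{}{}
	}
	setB := make(map[T]struct{}, len(b))
	for _, x := range b {
		setB[x] = struct{}{}
	}
	smaller := len(setA)
	if len(setB) < smaller {
		smaller = len(setB)
	}
	if smaller == 0 {
		return 0 // assumption: avoid dividing by zero for an empty set
	}
	common := 0
	for x := range setA {
		if _, ok := setB[x]; ok {
			common++
		}
	}
	return float64(common) / float64(smaller)
}

func main() {
	// A subset always scores 1, regardless of the size of the larger set.
	fmt.Println(overlap([]int{3, 4}, []int{1, 2, 3, 4})) // prints 1
	fmt.Println(overlap([]int{1, 2}, []int{2, 3, 4, 5})) // prints 0.5
}
```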
Types ¶
This section is empty.