kagome.ipadic

Version: v1.1.2 | Published: Nov 21, 2019 | License: Apache-2.0

Repository

github.com/ikawaha/kagome.ipadic

README

Kagome Japanese Morphological Analyzer (IPADic only)

kagome.ipadic is a smaller version of kagome that supports only the IPADic dictionary.

Programming example

The following Go example shows how to segment a short piece of text.

sample code:

package main

import (
	"fmt"
	"strings"

	"github.com/ikawaha/kagome.ipadic/tokenizer"
)

func main() {
	t := tokenizer.New()
	tokens := t.Tokenize("寿司が食べたい。") // t.Analyze("寿司が食べたい。", tokenizer.Normal)
	for _, token := range tokens {
		if token.Class == tokenizer.DUMMY {
			// BOS: Begin Of Sentence, EOS: End Of Sentence.
			fmt.Printf("%s\n", token.Surface)
			continue
		}
		features := strings.Join(token.Features(), ",")
		fmt.Printf("%s\t%v\n", token.Surface, features)
	}
}

output:

BOS
寿司    名詞,一般,*,*,*,*,寿司,スシ,スシ
が      助詞,格助詞,一般,*,*,*,が,ガ,ガ
食べ    動詞,自立,*,*,一段,連用形,食べる,タベ,タベ
たい    助動詞,*,*,*,特殊・タイ,基本形,たい,タイ,タイ
。      記号,句点,*,*,*,*,。,。,。
EOS
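
Each output line pairs a surface form with a comma-separated feature list. As a minimal sketch for post-processing that output, the standalone program below splits one line into named fields. The `Morph` type, its field names, and `parseLine` are hypothetical helpers and not part of kagome.ipadic; the names follow the map keys used in the WebAssembly sample later in this README, and padding short lines with `*` is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// Morph names the feature columns of one output line.
// The type and its field names are illustrative, not part of the kagome.ipadic API.
type Morph struct {
	Surface        string
	POS            string
	POSDetail      [3]string
	ConjugatedType string
	ConjugatedForm string
	BasicForm      string
	Reading        string
	Pronunciation  string
}

// parseLine splits a "surface<TAB>f0,f1,...,f8" line into a Morph.
// Lines with fewer than nine features are padded with "*" (an assumption).
func parseLine(line string) (Morph, error) {
	parts := strings.SplitN(line, "\t", 2)
	if len(parts) != 2 {
		return Morph{}, fmt.Errorf("no tab separator in %q", line)
	}
	f := strings.Split(parts[1], ",")
	if len(f) > 9 {
		return Morph{}, fmt.Errorf("expected at most 9 features, got %d", len(f))
	}
	for len(f) < 9 {
		f = append(f, "*")
	}
	return Morph{
		Surface:        parts[0],
		POS:            f[0],
		POSDetail:      [3]string{f[1], f[2], f[3]},
		ConjugatedType: f[4],
		ConjugatedForm: f[5],
		BasicForm:      f[6],
		Reading:        f[7],
		Pronunciation:  f[8],
	}, nil
}

func main() {
	m, err := parseLine("食べ\t動詞,自立,*,*,一段,連用形,食べる,タベ,タベ")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(m.Surface, m.POS, m.BasicForm, m.Reading)
}
```

The BOS and EOS lines carry no tab or features, so `parseLine` returns an error for them; callers would skip those lines first.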

Working with GAE/Go

Using the full kagome.ipadic dictionary on GAE/Go requires at least a B4 instance (>512MB memory). With the simple dictionary, which contains nothing beyond part-of-speech information, it can run on a B1 instance. The analysis result is the same in both cases; only some output fields (活用型 conjugation type, 活用形 conjugation form, 基本形 base form, 読み reading, 発音 pronunciation) are omitted.

Instance Class	Memory Limit	CPU Limit
B1	128 MB	600 MHz
B2	256 MB	1.2 GHz
B4	512 MB	2.4 GHz
B4_1G	1024 MB	2.4 GHz
B8	1024 MB	4.8 GHz
F1	128 MB	600 MHz
F2	256 MB	1.2 GHz
F4	512 MB	2.4 GHz
F4_1G	1024 MB	2.4 GHz

Usage

command:
    pass the `-sysdic=simple` option, e.g. `kagome -sysdic=simple`
lib:
    use `dic := tokenizer.SysDicIPASimple()` instead of `dic := tokenizer.SysDic()`

Full Dict.

BOS
寿司    名詞,一般,*,*,*,*,寿司,スシ,スシ
が      助詞,格助詞,一般,*,*,*,が,ガ,ガ
食べ    動詞,自立,*,*,一段,連用形,食べる,タベ,タベ
たい    助動詞,*,*,*,特殊・タイ,基本形,たい,タイ,タイ
。      記号,句点,*,*,*,*,。,。,。
EOS

Simple Dict.

BOS
寿司    名詞,一般,*,*,*,*
が      助詞,格助詞,一般,*,*,*
食べ    動詞,自立,*,*,一段,連用形
たい    助動詞,*,*,*,特殊・タイ,基本形
。      記号,句点,*,*,*,*
EOS
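
Judging only from the two listings above, each Simple Dict. line matches the first six feature columns of the corresponding Full Dict. line. The sketch below expresses that relationship on plain strings. `toSimple` and `simpleColumns` are hypothetical names; the code does not call kagome.ipadic, and the column count is read off the listings rather than taken from the library.

```go
package main

import (
	"fmt"
	"strings"
)

// simpleColumns is the number of feature columns visible in the
// Simple Dict. listing (an assumption drawn from that listing).
const simpleColumns = 6

// toSimple keeps only the leading simpleColumns entries of a
// comma-separated feature string and drops the rest.
func toSimple(features string) string {
	f := strings.Split(features, ",")
	if len(f) > simpleColumns {
		f = f[:simpleColumns]
	}
	return strings.Join(f, ",")
}

func main() {
	full := "動詞,自立,*,*,一段,連用形,食べる,タベ,タベ"
	fmt.Println(toSimple(full))
}
```

Code that consumes both dictionaries can therefore rely on the leading columns being present and treat the trailing ones as optional.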

WebAssembly

You can see how the kagome WebAssembly build works on the demo site.

Sample main.go

package main

import (
	"syscall/js"

	"github.com/ikawaha/kagome.ipadic/tokenizer"
)

func tokenize(_ js.Value, args []js.Value) interface{} {
	t := tokenizer.New()
	if len(args) == 0 {
		return nil
	}
	ret := []interface{}{}
	tokens := t.Tokenize(args[0].String())
	for _, token := range tokens {
		if token.Class == tokenizer.DUMMY {
			//fmt.Printf("%s\n", token.Surface)
			continue
		}
		features := token.Features()
		for i := 9 - len(features); i > 0; i-- {
			features = append(features, "*")
		}
		//fmt.Printf("%s\t%v\n", token.Surface, strings.Join(features, ","))
		ret = append(ret, map[string]interface{}{
			"word_id":         token.ID,
			"word_type":       token.Class.String(),
			"word_position":   token.Start,
			"surface_form":    token.Surface,
			"pos":             features[0],
			"pos_detail_1":    features[1],
			"pos_detail_2":    features[2],
			"pos_detail_3":    features[3],
			"conjugated_type": features[4],
			"conjugated_form": features[5],
			"basic_form":      features[6],
			"reading":         features[7],
			"pronunciation":   features[8],
		})
	}
	return ret
}

var global = js.Global()

func main() {
	_ = tokenizer.New()
	c := make(chan struct{}) // never written to; keeps main alive so the exported function stays callable
	println("Go Web Assembly Ready")

	global.Set("kagome", js.FuncOf(tokenize))
	<-c
}

Build wasm

$ GOOS=js GOARCH=wasm go build -o kagome.wasm ./main.go

License

Kagome is licensed under the Apache License v2.0 and uses the MeCab-IPADIC model. See NOTICE.txt for license details.

Directories

Path	Synopsis
cmd
_dictool
_dictool/ipa
kagome
kagome/lattice
kagome/server
kagome/tokenize
internal
da	Package da implements the double array library.
dic	Package dic implements the dictionary of the morph analyzer.
dic/data
lattice	Package lattice implements the core of the morph analyzer.
splitter	Package splitter is a utility for preprocessing Japanese texts.
tokenizer	Package tokenizer is a Japanese morphological analyzer library.