go

module

v0.0.0-...-e1d70e2 Latest Latest Go to latest Published: Oct 30, 2020 License: Apache-2.0

Details

Valid go.mod file

The Go module system was introduced in Go 1.11 and is the official dependency management solution for Go.
Redistributable license

Redistributable licenses place minimal restrictions on how software can be used, modified, and redistributed.
Tagged version

Modules with tagged versions give importers more predictable builds.
Stable version

When a project reaches major version v1 it is considered stable.
Learn more about best practices

Repository

github.com/cobaltspeech/sdk-diatheke

Links

Open Source Insights

README ¶

Go Examples

This directory contains examples demonstrating how to use the Diatheke SDK.

Build

The code for the demo executables is found in the cmd directory, with each demo having its own subdirectory. To build a demo, simply use the go build command, with the specific demo's path specified. For example,

go build ./cmd/demo-asr
go build ./cmd/demo-audio
go build ./cmd/demo-cli
go build ./cmd/demo-tts

Run

Each demo illustrates a different aspect of the Diatheke SDK. While the demos separate these functions for the sake of simplicity, there is nothing restricting a developer from mixing all of them in the same application.

Each demo is terminal based and relies on keyboard input in some form, whether it be text input, or merely pressing the Enter key to start and stop audio recording. As such, they should all be launched from a terminal window. Each demo may be terminated by using the Ctrl+D shortcut.

The four demos are

demo-audio - Creates a dialog using audio I/O.
demo-cli - Creates a text-based dialog
demo-asr - Demonstrates Automatic Speech Recognition (ASR).
demo-tts - Demonstrates Text-to-Speech (TTS).

Config File

Each demo application requires a configuration file to be specified. An example config file with documentation about each parameter in the file is provided in examples/config.sample.toml. The same config file will work for all the demos.

Audio I/O

For demos that use audio, the audio I/O is handled exclusively by external applications such as aplay/arecord and sox. The specific application can be anything as long the following conditions are met:

The application supports the encodings, sample rate, bit-depth, etc. required by the underlying Cubic ASR and Luna TTS models.
For recording, the application must stream audio data to stdout.
For playback, the application must accept audio data from stdin.

The specific applications (and their args) should be specified in the configuration file.

demo-audio

# Run the audio-based dialog demo
./demo-audio -config <path/to/config.toml>

This application demonstrates how to set up an audio-based conversation with Diatheke. This demo requires that both recording and playback applications be specified in the config file. Audio recording is toggled by pressing the Enter key. When recording, a user may speak an appropriate phrase (defined by the Diatheke model being used), and Diatheke will respond with TTS generated audio.

demo-cli

# Run the text-based dialog demo
./demo-cli -config <path/to/config.toml>

This application demonstrates how to setup a text-based conversation with Diatheke. A user simply sends text to Diatheke at the prompt, and Diatheke will return a text response. The Diatheke model defines the specific text that a user is allowed to submit, as well as the replies that come back from Diatheke.

demo-asr

# Run the streaming ASR demo
./demo-asr -config <path/to/config.toml>

This application demonstrates how to run Automatic Speech Recognition (ASR) that is unrelated to any running conversations in Diatheke. This demo requires a recording application to be specified in the config file. Audio recording is toggled by pressing the Enter key. While recording, a user may speak any phrase that is supported by the underlying Cubic ASR model. Diatheke will forward the transcription from Cubic back to the client, which will then be printed to the terminal for the user to see.

demo-tts

# Run the streaming TTS demo
./demo-tts -config <path/to/config.toml>

This application demonstrates how to run Text-to-Speech (TTS) that is unrelated to any running conversations in Diatheke. This demo requires a playback application to be specified in the config file. While running, a user may submit any text at the prompt, which Diatheke will forward to the underlying Luna TTS engine. The generated speech will be returned to the client and played through the external playback application.

Tip: When the submitted text includes numbers, you can avoid unexpected results by spelling out the number. For example, the number 1804 could be spelled as "one thousand eight hundred and four", "eighteen hundred and four", "eighteen oh four", or even "one eight zero four", depending on context and preference.

Directories ¶

Path	Synopsis
cmd
demo-asr
demo-audio
demo-cli
demo-tts
internal
audio
config
textui

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL