command
module
Version:
v1.0.0
Published: Apr 8, 2023
License: MIT
Imports: 11
Imported by: 0
README
llamacpphtmld
A web interface and API for the LLaMA large language AI model, based on the llama.cpp runtime.
Features
- Live streaming responses
- Continuation-based UI
- Supports interrupt, modify, and resume
- Configure the maximum number of simultaneous users
- Works with any LLaMA model including Vicuna
- Bundled copy of llama.cpp, no separate compilation required
Usage
All configuration should be supplied as environment variables:
LCH_MODEL_PATH=/srv/llama/ggml-vicuna-13b-4bit-rev1.bin \
LCH_NET_BIND=:8090 \
LCH_SIMULTANEOUS_REQUESTS=1 \
./llamacpphtmld
API usage
The generate endpoint live-streams new tokens into an existing conversation until the LLM stops naturally.
- Usage:
curl -v -X POST -d '{"Content": "The quick brown fox"}' 'http://localhost:8090/api/v1/generate'
- You can optionally supply ConversationID and APIKey string parameters; however, these are not currently used by the server.
- You can optionally supply a MaxTokens integer parameter to cap the number of tokens generated by the LLM.
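The curl call above can also be issued from Go. This is a minimal sketch that builds the POST request for /api/v1/generate; the JSON field names (Content, ConversationID, APIKey, MaxTokens) follow the parameters documented above, but the struct itself is an assumption, not a type exported by the server:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// generateRequest mirrors the JSON body accepted by the generate
// endpoint, per the parameters documented in this README.
type generateRequest struct {
	Content        string
	ConversationID string `json:",omitempty"`
	APIKey         string `json:",omitempty"`
	MaxTokens      int    `json:",omitempty"`
}

// buildGenerateRequest marshals r and wraps it in a POST request
// against base + "/api/v1/generate".
func buildGenerateRequest(base string, r generateRequest) (*http.Request, error) {
	body, err := json.Marshal(r)
	if err != nil {
		return nil, err
	}
	return http.NewRequest("POST", base+"/api/v1/generate", bytes.NewReader(body))
}

func main() {
	req, err := buildGenerateRequest("http://localhost:8090",
		generateRequest{Content: "The quick brown fox", MaxTokens: 32})
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
}
```

Sending the request with http.DefaultClient.Do and reading the response body incrementally would consume the live token stream.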
License
MIT
Changelog
2023-04-08 v1.0.0
Documentation
There is no documentation for this package.
Source Files