distil-ingest
Dependencies
Requires the Go programming language binaries, with the GOPATH environment variable set and $GOPATH/bin in your PATH.
Installation
go get github.com/uncharted-distil/distil-ingest
Development
Clone the repository:
mkdir -p $GOPATH/src/github.com/uncharted-distil
cd $GOPATH/src/github.com/uncharted-distil
git clone git@github.com:uncharted-distil/distil-ingest.git
Install dependencies:
cd distil-ingest
make install
Build executable:
make build
Usage
The repository contains CLIs used to parse and ingest 3M OpenML datasets (those with a name beginning with o_) into Elasticsearch.
Merging training and target datasets:
Classifying merged datasets:
- Update the arguments in ./classify_all.sh and ensure they are correct
- Run ./classify_all.sh
Ingesting merged and classified datasets:
- Update the arguments in ./ingest_all.sh and ensure they are correct
- Run ./ingest_all.sh
Common Issues:
"EOF"
- The Elasticsearch instance does not have http.compression enabled.
- The mappings JSON argument is invalid, most likely missing a closing bracket.
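To rule out the second cause, the mappings JSON can be checked before it is passed to the CLI. The following is a minimal sketch, assuming python3 is installed; the is_valid_json helper and the sample mappings string are hypothetical and not part of this repository.

```shell
# is_valid_json reads JSON text from stdin and succeeds only if it parses.
# Assumes python3 is available; json.tool exits non-zero on invalid input.
is_valid_json() {
  python3 -m json.tool >/dev/null 2>&1
}

# Hypothetical mappings value that is missing its final closing brace.
mappings='{"mappings": {"properties": {"name": {"type": "text"}}}'

if printf '%s' "$mappings" | is_valid_json; then
  echo "mappings JSON is valid"
else
  echo "mappings JSON is invalid"
fi
```

Running the check on the exact string you intend to pass helps catch an unbalanced bracket before the ingest fails with "EOF".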
"No Elasticsearch node available"
- You are accessing an Elasticsearch instance that requires a VPN and it is not on.
- The Elasticsearch instance is temporarily down.
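Both causes can be checked quickly by requesting the root endpoint of the instance. This is a rough sketch, assuming curl is installed; the es_reachable helper, the ES_ENDPOINT variable, and the localhost default are hypothetical, so substitute the endpoint you pass to the ingest scripts.

```shell
# es_reachable URL: succeed only if URL returns a successful HTTP response
# within 5 seconds. Assumes curl is installed.
es_reachable() {
  curl --silent --fail --max-time 5 --output /dev/null "$1"
}

# Hypothetical default; replace with your own Elasticsearch endpoint.
ES_ENDPOINT="${ES_ENDPOINT:-http://localhost:9200}"

if es_reachable "$ES_ENDPOINT"; then
  echo "Elasticsearch responded at $ES_ENDPOINT"
else
  echo "No successful response from $ES_ENDPOINT (check VPN and instance status)"
fi
```

If the request fails with the VPN off and succeeds with it on, the instance itself is fine and only network access was missing.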
"dep: command not found"
- Cause: $GOPATH/bin has not been added to your $PATH.
- Solution: Add export PATH=$PATH:$GOPATH/bin to your .bash_profile or .bashrc.
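To confirm the fix took effect, you can check whether $GOPATH/bin is an entry in PATH. The sketch below uses a hypothetical path_contains helper. It also falls back to $HOME/go when GOPATH is unset, which is an assumption based on Go's default and may not match your setup.

```shell
# path_contains DIR LIST: succeed if DIR is an exact entry in the
# colon-separated LIST.
path_contains() {
  case ":$2:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Assumption: fall back to Go's default GOPATH when the variable is unset.
gobin="${GOPATH:-$HOME/go}/bin"

if path_contains "$gobin" "$PATH"; then
  echo "$gobin is on PATH"
else
  echo "$gobin is missing from PATH"
fi
```

Open a new shell, or source the file you edited, before running the check so that the updated PATH is in effect.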