distil
- AutoML Server: automated machine learning server component that implements the D3M API.
- Primitives: set of primitives created for use by Distil as steps in a D3M pipeline and included in the base D3M image.
- Primitives Addendum: set of primitives created for use by Distil as steps in a D3M pipeline but not included in the base D3M image.
Dependencies
- Git and Git LFS: versioning software.
- Go: programming language binaries, with the GOPATH environment variable set and $GOPATH/bin in your PATH.
- NodeJS: JavaScript runtime.
- Docker: container platform.
- Docker Compose (optional): for managing multi-container dev environments.
- GDAL v2.4.2 or later: for geospatial data access. Available as a package for most Linux distributions, and for macOS through Homebrew.
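To confirm the command-line dependencies above are in place before building, a small Go program can look each tool up on PATH and check that $GOPATH/bin is on PATH too. This is a minimal sketch, not a script shipped with Distil: the executable names (git, git-lfs, go, node, docker, docker-compose) are assumptions, and it assumes GOPATH holds a single directory. GDAL is left out because the Go build locates it through pkg-config rather than PATH.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
)

// pathContains reports whether dir is one of the entries in a PATH-style
// list, comparing cleaned paths so a trailing slash does not matter.
func pathContains(pathList, dir string) bool {
	want := filepath.Clean(dir)
	for _, entry := range filepath.SplitList(pathList) {
		if entry != "" && filepath.Clean(entry) == want {
			return true
		}
	}
	return false
}

func main() {
	// Assumed executable names for the dependencies listed above.
	required := []string{"git", "git-lfs", "go", "node", "docker"}
	optional := []string{"docker-compose"}

	for _, tool := range required {
		if _, err := exec.LookPath(tool); err != nil {
			fmt.Println("missing required tool:", tool)
		}
	}
	for _, tool := range optional {
		if _, err := exec.LookPath(tool); err != nil {
			fmt.Println("optional tool not found:", tool)
		}
	}

	// Assumes GOPATH is a single directory, not a list of directories.
	gopath := os.Getenv("GOPATH")
	if gopath == "" {
		fmt.Println("GOPATH is not set")
		return
	}
	if !pathContains(os.Getenv("PATH"), filepath.Join(gopath, "bin")) {
		fmt.Println("$GOPATH/bin is not in PATH")
	}
}
```

The program prints nothing when every required tool is found and $GOPATH/bin is on PATH.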
Development
Clone the repository:
mkdir -p $GOPATH/src/github.com/uncharted-distil
cd $GOPATH/src/github.com/uncharted-distil
git clone git@github.com:unchartedsoftware/distil.git
cd distil
Install dependencies:
make install
Install datasets:
Datasets are stored using Git LFS and can be pulled using the datasets.sh script:
./datasets.sh
To add or remove a dataset, modify the $datasets array in datasets.sh:
declare -a datasets=("185_baseball" "LL0_acled" "22_handgeometry")
Generate code (optional):
If the api/compute/result/complex_field.peg file is changed, regenerate the pandas dataframe parser by running:
make peg
Docker images:
The application requires:
- ElasticSearch
- PostgreSQL
- TA2 Pipeline Server Stub
Docker images for each are available at the following registry:
docker.uncharted.software
Log in to the Docker registry:
sudo docker login docker.uncharted.software
Update docker-compose.yml:
---
distil-auto-ml:
  image: docker.uncharted.software/distil-auto-ml
Pull Images:
Pull the Docker images via Docker Compose:
./update_services.sh
Running the app:
Use three separate terminals.
Terminal 1 - Launch the Docker services:
./run_services.sh
Terminal 2 - Build and watch webapp:
yarn watch
The app will be accessible at localhost:8080.
Terminal 3 - Build, watch, and run server:
make watch
Advanced Configuration
Set the D3MINPUTDIR environment variable to change the location of the dataset directory, and D3MOUTPUTDIR to change where the temporary data written during model building is stored.
If the Docker containers are not running on localhost, set their host IP address with DOCKER_HOST (e.g. export DOCKER_HOST=192.168.0.10 && make watch).
These are used by the other Distil services that are launched via the run_services.sh script, and are typically set as global environment variables in .bashrc or similar.
Linter Setup
VS Code
For the VS Code editor, download and install the ESLint extension.
Once it is installed, open the editor settings (⌘⇧P, then type "settings" and open the settings JSON file).
Add the following to your settings file:
"eslint.lintTask.enable": true, // contribute an ESLint task that lints the whole workspace
"eslint.validate": [
  "vue", // tell eslint to read vue files
  "html", // tell eslint to read html files
  "javascript", // tell eslint to read javascript files
  "typescript" // tell eslint to read typescript files
],
"eslint.workingDirectories": [{ "mode": "auto" }], // eslint will try its best to figure out the working directory of the project
Save your settings file and restart VS Code.
If the linter is still not working after restarting, check its output (⌃⇧`, then the OUTPUT tab, then select ESLint from the dropdown).
Common Issues:
- Cause: Dependencies are out of date or have not been installed
- Solution: Run make install to install the latest dependencies.
"# pkg-config --cflags -- gdal gdal gdal gdal gdal gdal Package gdal was not found in the pkg-config search path."
- Cause: GDAL has not been installed
- Solution: Install GDAL using a package for your environment or download and build from source.
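To check whether pkg-config can see GDAL, and that its version meets the v2.4.2 minimum listed under Dependencies, a Go sketch can run pkg-config --modversion gdal. It is not part of Distil, and the version comparison is deliberately simple: it compares only the first three numeric components and ignores suffixes such as "dev".

```go
package main

import (
	"fmt"
	"os/exec"
	"strconv"
	"strings"
)

// atLeast reports whether a dotted version such as "3.0.4" is greater than
// or equal to min. Missing components count as 0, and only the leading
// digits of each component are used (so "1dev" is read as 1).
func atLeast(version string, min [3]int) bool {
	parts := strings.Split(strings.TrimSpace(version), ".")
	for i := 0; i < 3; i++ {
		n := 0
		if i < len(parts) {
			end := 0
			for end < len(parts[i]) && parts[i][end] >= '0' && parts[i][end] <= '9' {
				end++
			}
			n, _ = strconv.Atoi(parts[i][:end]) // empty string yields 0
		}
		if n != min[i] {
			return n > min[i]
		}
	}
	return true
}

func main() {
	out, err := exec.Command("pkg-config", "--modversion", "gdal").Output()
	if err != nil {
		fmt.Println("pkg-config cannot find gdal:", err)
		return
	}
	version := strings.TrimSpace(string(out))
	if !atLeast(version, [3]int{2, 4, 2}) {
		fmt.Printf("GDAL %s is older than the required 2.4.2\n", version)
		return
	}
	fmt.Printf("GDAL %s found\n", version)
}
```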
Mac
Runtime error while training: "joblib.externals.loky.process_executor.TerminatedWorkerError: A worker process managed by the executor was unexpectedly terminated. This could be caused by a segmentation fault while calling the function or by an excessive memory usage causing the Operating System to kill the worker."
- Cause: Not enough Docker resources
- Solution: Increase Docker's resources to the recommended values: CPUs: 10, Memory: 10 GB, Swap: 2.5 GB, Disk image size: 64 GB.
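To compare Docker's current limits with these recommendations, a Go sketch can read them from docker info. It assumes the docker CLI's --format template exposes the NCPU and MemTotal fields (as recent versions do), covers only CPU and memory (not swap or disk image size), and allows a 10% memory margin because the Docker VM usually reports slightly less memory than is configured; these choices are assumptions, not Distil requirements.

```go
package main

import (
	"fmt"
	"os/exec"
	"strconv"
	"strings"
)

const gib = 1 << 30 // bytes per GiB

// checkResources parses "NCPU MemTotal" (as printed by the docker info
// template in main) and returns a description of each value that falls
// below the recommendation. Memory gets a 10% margin.
func checkResources(info string, minCPU int, minMemGiB float64) ([]string, error) {
	fields := strings.Fields(info)
	if len(fields) != 2 {
		return nil, fmt.Errorf("unexpected docker info output: %q", info)
	}
	cpus, err := strconv.Atoi(fields[0])
	if err != nil {
		return nil, err
	}
	memBytes, err := strconv.ParseInt(fields[1], 10, 64)
	if err != nil {
		return nil, err
	}

	var problems []string
	if cpus < minCPU {
		problems = append(problems, fmt.Sprintf("CPUs: %d, recommended %d", cpus, minCPU))
	}
	if float64(memBytes) < 0.9*minMemGiB*gib {
		problems = append(problems, fmt.Sprintf("memory: %.1f GiB, recommended %.0f GiB",
			float64(memBytes)/gib, minMemGiB))
	}
	return problems, nil
}

func main() {
	out, err := exec.Command("docker", "info", "--format", "{{.NCPU}} {{.MemTotal}}").Output()
	if err != nil {
		fmt.Println("docker info failed:", err)
		return
	}
	problems, err := checkResources(string(out), 10, 10)
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, p := range problems {
		fmt.Println("below recommendation:", p)
	}
	if len(problems) == 0 {
		fmt.Println("CPU and memory meet the recommendation")
	}
}
```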