CIFAR-10 Convolutional Neural Network Parallel Training on Google Cloud
This application demonstrates parallel training of a convolutional neural network (CNN) on the CIFAR-10 dataset on Google Cloud. It uses wfl together with the Google Cloud Batch API to train multiple model instances concurrently and stores the output in a Google Cloud Storage bucket. It runs on Spot instances to reduce cloud costs.
Prerequisites
- A Google Cloud account with billing and APIs enabled
- A Google Cloud Storage bucket
- Google Cloud SDK installed and configured on your local machine
- Go installed on your local machine
Setup
- Clone this repository:
git clone https://github.com/dgruber/wfl.git
cd wfl/examples/convolutionalnn_googlebatch
- Set the required environment variables:
export GOOGLE_PROJECT="your-google-project-id"
export GOOGLE_BUCKET="your-google-bucket-name"
Replace your-google-project-id and your-google-bucket-name with your Google Cloud project ID and Google Cloud Storage bucket name, respectively.
- Build the container image and push it to Google Container Registry:
make build
make push
Run
Execute the wfl application:
make run
The application will perform the following steps:
- Create a Google Batch context for running training jobs using your specified Google Cloud project and bucket.
- Run a data preparation job that splits the CIFAR-10 dataset into multiple parts for parallel training.
- Submit multiple parallel training jobs using different parts of the dataset.
- Wait for all training jobs to complete.
- Print the accuracy and runtime of each training job.
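The fan-out/wait pattern behind the submit, wait, and report steps above can be sketched in plain Go. Goroutines stand in here for the Google Batch jobs that wfl submits, and trainPart is a hypothetical placeholder for one training job; the real orchestration lives in cifar.go:

```go
package main

import (
	"fmt"
	"sync"
)

// trainPart is a hypothetical stand-in for one training job that
// processes a single part of the split CIFAR-10 dataset.
func trainPart(part int) string {
	return fmt.Sprintf("part %d trained", part)
}

func main() {
	const parts = 4 // number of parallel training jobs

	results := make([]string, parts)
	var wg sync.WaitGroup

	// Submit one job per dataset part.
	for i := 0; i < parts; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			results[i] = trainPart(i)
		}(i)
	}

	// Wait for all training jobs to complete.
	wg.Wait()

	// Report the result of each job.
	for _, r := range results {
		fmt.Println(r)
	}
}
```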
Output
The application prints the progress, status, and results of each job to the console. The trained model files, accuracy results, and logs are stored in the specified Google Cloud Storage bucket.
Customization
You can customize the number of parallel training jobs, machine types, and other job parameters by modifying the cifar.go file and rebuilding the application.
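As an illustration of the kind of knobs involved, such parameters might be grouped as constants along these lines (the names and values are hypothetical; check cifar.go for the actual ones):

```go
package main

import "fmt"

// Hypothetical tuning parameters; the real definitions live in cifar.go.
const (
	parallelJobs = 4               // number of concurrent training jobs
	machineType  = "e2-standard-4" // Google Cloud machine type per job
)

func main() {
	fmt.Printf("running %d jobs on %s machines\n", parallelJobs, machineType)
}
```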