Installation¶
Make sure to read Prerequisites before installing mlbench.
All guides assume you have checked out the mlbench GitHub repository and have a terminal open in the checked-out mlbench
directory.
Helm Chart values¶
Since every Kubernetes cluster is different, there are no reasonable defaults for some values, so the following properties have to be set. You can save them in a YAML file of your choosing. This guide assumes you saved them in myvalues.yaml. For a reference of all configurable values, you can copy the charts/mlbench/values.yaml file to myvalues.yaml.
limits:
  workers:
  cpu:
  bandwidth:
  gpu:
gcePersistentDisk:
  enabled:
  pdName:
limits.workers
is the maximum number of worker nodes available to mlbench. This sets the maximum number of nodes that can be chosen for an experiment in the UI. By default mlbench starts 2 workers on startup.
limits.cpu
is the maximum number of CPUs (cores) available on each worker node. Uses Kubernetes notation (8 or 8000m for 8 CPUs/cores). This is also the maximum number of cores that can be selected for an experiment in the UI.
limits.bandwidth
is the maximum network bandwidth available between workers, in Mbit per second. This is the default bandwidth used and the maximum value selectable in the UI.
limits.gpu
is the number of GPUs requested by each worker pod.
gcePersistentDisk.enabled
controls whether the resources related to the NFS persistentVolume and persistentVolumeClaim are created.
gcePersistentDisk.pdName
is the name of an existing persistent disk in GKE.
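As a rough sketch of what a filled-in file could look like, the following writes a myvalues.yaml with purely illustrative numbers. Every value here is an assumption made for the example (including reusing my-pd-name as the disk name) and has to be adapted to what your cluster actually offers.

```shell
# Write an example myvalues.yaml into the current directory.
# NOTE: this overwrites an existing myvalues.yaml; all values are
# illustrative assumptions, not recommended settings.
cat > myvalues.yaml <<'EOF'
limits:
  workers: 2
  cpu: 4000m
  bandwidth: 1000
  gpu: 0
gcePersistentDisk:
  enabled: false
  pdName: my-pd-name
EOF

# Show the result for a quick visual check.
cat myvalues.yaml
```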
Caution
If you set workers, cpu or gpu higher than what is available in your cluster, Kubernetes will not be able to allocate nodes to mlbench and the deployment will hang indefinitely without reporting an error.
Kubernetes will simply wait until nodes that fit the requirements become available, so make sure your cluster actually has the resources you requested.
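One way to guard against this is to compare the requested CPU value with what a node reports as allocatable before installing. The helpers below are a minimal sketch: cpu_to_millicores and cpu_fits are hypothetical names introduced here, and the 3920m figure is a made-up example of an allocatable value.

```shell
# Convert a Kubernetes CPU quantity ("8", "0.5" or "8000m") to millicores.
cpu_to_millicores() {
  case "$1" in
    *m) echo "${1%m}" ;;
    *)  awk -v c="$1" 'BEGIN { printf "%d\n", c * 1000 + 0.5 }' ;;
  esac
}

# Succeed only if the requested CPU quantity ($1) fits into the
# allocatable quantity ($2).
cpu_fits() {
  [ "$(cpu_to_millicores "$1")" -le "$(cpu_to_millicores "$2")" ]
}

# Example with assumed numbers: 4 cores requested, 3920m allocatable.
cpu_fits 4 3920m || echo "4 cores requested, but only 3920m allocatable"
```

The allocatable CPU of each node can be listed with kubectl get nodes -o custom-columns=NAME:.metadata.name,CPU:.status.allocatable.cpu and fed into cpu_fits as the second argument.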
Note
To use gpu in the cluster, the nvidia device plugin should be installed. See Plugins for details.
Note
Use a command like gcloud compute disks create --size=10G --zone=europe-west1-b my-pd-name to create a persistent disk.
Note
The GCE persistent disk will be mounted at the /datasets/ directory on each worker.
Basic Install¶
Set the Helm Chart values
Use helm to install the mlbench chart (replace ${RELEASE_NAME} with a name of your choice):
$ helm upgrade --wait --recreate-pods -f myvalues.yaml --timeout 900 --install ${RELEASE_NAME} charts/mlbench
Follow the instructions at the end of the helm install output to get the dashboard URL. E.g.:
$ helm upgrade --wait --recreate-pods -f myvalues.yaml --timeout 900 --install rel charts/mlbench
[...]
NOTES:
1. Get the application URL by running these commands:
export NODE_PORT=$(kubectl get --namespace default -o jsonpath="{.spec.ports[0].nodePort}" services rel-mlbench-master)
export NODE_IP=$(kubectl get nodes --namespace default -o jsonpath="{.items[0].status.addresses[0].address}")
echo http://$NODE_IP:$NODE_PORT
This outputs the URL the Dashboard is accessible at.
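Since the same install command recurs throughout this guide, it may be convenient to wrap it in a small shell function. This is only a sketch: install_mlbench and the HELM override variable are hypothetical names introduced here, and the values file is assumed to default to myvalues.yaml.

```shell
# Hypothetical wrapper around the install command used in this guide.
# Usage: install_mlbench RELEASE_NAME [VALUES_FILE]
# Setting HELM=echo prints the helm arguments instead of running helm,
# which is useful as a dry run.
install_mlbench() {
  release_name="$1"
  values_file="${2:-myvalues.yaml}"
  if [ -z "$release_name" ]; then
    echo "usage: install_mlbench RELEASE_NAME [VALUES_FILE]" >&2
    return 1
  fi
  "${HELM:-helm}" upgrade --wait --recreate-pods -f "$values_file" \
    --timeout 900 --install "$release_name" charts/mlbench
}
```

Running HELM=echo install_mlbench rel shows the arguments that would be passed to helm, while install_mlbench rel performs the actual installation from the mlbench checkout.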
Plugins¶
In values.yaml, one can optionally install Kubernetes plugins by turning the following flags on or off:
weave.enabled: If true, install the weave network plugin.
nvidiaDevicePlugin.enabled: If true, install the nvidia device plugin.
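As an illustration, the two flags could be kept in a separate values file. The file name plugins.yaml is an assumption made for this sketch, and whether you want both plugins enabled depends on your cluster.

```shell
# Write the plugin flags to a separate values file.
# "plugins.yaml" is a hypothetical file name chosen for this example.
cat > plugins.yaml <<'EOF'
weave:
  enabled: true
nvidiaDevicePlugin:
  enabled: true
EOF

cat plugins.yaml
```

Helm accepts several -f flags, so such a file can be passed in addition to myvalues.yaml, with later files taking precedence over earlier ones.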
Google Cloud / Google Kubernetes Engine¶
Set the Helm Chart values
Important
Make sure to read the prerequisites for Google Cloud.
Please make sure that kubectl is configured correctly.
Caution
Google installs several pods on each node by default, limiting the available CPU. This can take up to 0.5 CPU cores per node. So make sure to provision VMs that have at least 1 more core than the number of cores you want to use for your mlbench experiment. See the Google Kubernetes Engine documentation for further details on node limits.
Install mlbench (replace ${RELEASE_NAME} with a name of your choice):
$ helm upgrade --wait --recreate-pods -f myvalues.yaml --timeout 900 --install ${RELEASE_NAME} charts/mlbench
To access mlbench, run these commands and open the URL that is returned (note: the default instructions printed by helm on the command line return the internal cluster IP only):
$ export NODE_PORT=$(kubectl get --namespace default -o jsonpath="{.spec.ports[0].nodePort}" services ${RELEASE_NAME}-mlbench-master)
$ export NODE_IP=$(gcloud compute instances list|grep $(kubectl get nodes --namespace default -o jsonpath="{.items[0].status.addresses[0].address}") |awk '{print $5}')
$ gcloud compute firewall-rules create --quiet mlbench --allow tcp:$NODE_PORT
$ echo http://$NODE_IP:$NODE_PORT
Danger
The gcloud compute firewall-rules create command above opens up a firewall rule in Google Cloud. Make sure to delete the rule once it is no longer needed:
$ gcloud compute firewall-rules delete --quiet mlbench
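As a possible alternative to grepping the gcloud instance list, the external address can be read from the node status directly. The function below is a sketch under the assumption that the first node reports an address of type ExternalIP. dashboard_url is a hypothetical helper name, and it does not create the firewall rule.

```shell
# Hypothetical helper: print the dashboard URL for a given release name.
# Assumes the first node in the cluster reports an ExternalIP address.
dashboard_url() {
  release_name="$1"
  node_port=$(kubectl get --namespace default \
    -o jsonpath="{.spec.ports[0].nodePort}" \
    services "${release_name}-mlbench-master") || return 1
  node_ip=$(kubectl get nodes --namespace default \
    -o jsonpath='{.items[0].status.addresses[?(@.type=="ExternalIP")].address}') || return 1
  echo "http://${node_ip}:${node_port}"
}
```

Calling dashboard_url with your release name prints the URL. If the printed host part is empty, the node has no ExternalIP entry and the gcloud-based lookup shown earlier is the safer choice.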
Hint
If you want to build the docker images yourself and host them in the GC registry, follow these steps:
Authenticate with GC registry:
$ gcloud auth configure-docker
Build docker images (Replace <gcloud project name> with the name of your project):
$ make publish-docker component=master docker_registry=gcr.io/<gcloud project name>
$ make publish-docker component=worker docker_registry=gcr.io/<gcloud project name>
Use the following settings for your myvalues.yaml file when installing with helm:
master:
  image:
    repository: gcr.io/<gcloud project name>/mlbench_master
    tag: latest
    pullPolicy: Always
worker:
  image:
    repository: gcr.io/<gcloud project name>/mlbench_worker
    tag: latest
    pullPolicy: Always
Minikube¶
Minikube allows running a single-node Kubernetes cluster inside a VM on your laptop, for users looking to try out Kubernetes or to develop with it.
To install mlbench on minikube:
Set the Helm Chart values
First build the docker images and push them to the private registry at localhost:5000.
$ make publish-docker component=master docker_registry=localhost:5000
$ make publish-docker component=worker docker_registry=localhost:5000
Then start the minikube cluster:
$ minikube start
Use tcp-proxy to forward port 5000 of the node to port 5000 of the host, so that images can be pulled from the local registry.
$ minikube ssh
$ docker run --name registry-proxy -d -e LISTEN=':5000' -e TALK="$(/sbin/ip route|awk '/default/ { print $3 }'):5000" -p 5000:5000 tecnativa/tcp-proxy
Now we can pull images from the private registry inside the cluster. To check, run docker pull localhost:5000/mlbench_master:latest.
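The TALK value in the proxy command above is built from the default gateway of the minikube VM, which is the address under which the host is reachable. The snippet below shows how the awk expression picks that address, using a made-up sample of ip route output rather than real data.

```shell
# Sample `ip route` output (illustrative only; your addresses will differ).
sample_routes='default via 10.0.2.2 dev eth0 proto dhcp
10.0.2.0/24 dev eth0 proto kernel scope link src 10.0.2.15'

# The third field of the "default" line is the gateway address.
gateway=$(printf '%s\n' "$sample_routes" | awk '/default/ { print $3 }')

# This is the value that ends up in TALK, with the registry port appended.
echo "${gateway}:5000"
```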
Next, install or upgrade the helm chart with the desired configuration under the name ${RELEASE_NAME}:
$ helm init --kube-context minikube --wait
$ helm upgrade --wait --recreate-pods -f myvalues.yaml --timeout 900 --install ${RELEASE_NAME} charts/mlbench
Note
Minikube runs a single-node Kubernetes cluster inside a VM, so replicaCount has to be set to 1 in values.yaml.
Once the installation is finished, one can obtain the URL:
$ export NODE_PORT=$(kubectl get --namespace default -o jsonpath="{.spec.ports[0].nodePort}" services ${RELEASE_NAME}-mlbench-master)
$ export NODE_IP=$(kubectl get nodes --namespace default -o jsonpath="{.items[0].status.addresses[0].address}")
$ echo http://$NODE_IP:$NODE_PORT
Now the mlbench dashboard should be available at http://${NODE_IP}:${NODE_PORT}.
Note
To access http://$NODE_IP:$NODE_PORT from outside minikube, run the following command on the host:
$ ssh -i ${MINIKUBE_HOME}/.minikube/machines/minikube/id_rsa -N -f -L localhost:${NODE_PORT}:${NODE_IP}:${NODE_PORT} docker@$(minikube ip)
where $MINIKUBE_HOME defaults to $HOME. The mlbench dashboard can then be viewed at http://localhost:${NODE_PORT}.
Docker-in-Docker (DIND)¶
Docker-in-Docker allows simulating multiple nodes locally on a single machine. This is useful for development.
Hint
For development purposes, it makes sense to use a local docker registry as well with DIND.
Describing how to set up a local registry would be too long for this guide, so here are some pointers:
Download the kubeadm-dind-cluster script.
$ wget https://cdn.rawgit.com/kubernetes-sigs/kubeadm-dind-cluster/master/fixed/dind-cluster-v1.11.sh
$ chmod +x dind-cluster-v1.11.sh
For networking to work in DIND, we need to set a CNI plugin. In our experience, weave works well with DIND.
$ export CNI_PLUGIN=weave
Now we can start the local cluster with
$ ./dind-cluster-v1.11.sh up
This might take a couple of minutes.
Hint
If you’re using a local docker registry, run dind-proxy.sh after the previous step.
Install helm (see Prerequisites) and set the Helm Chart values.
Hint
For a local registry, build and push the master and worker images:
$ make publish-docker component=master docker_registry=localhost:5000
$ make publish-docker component=worker docker_registry=localhost:5000
Also, make sure you have an imagePullSecret added to the Kubernetes serviceaccount and set the repository and secret in the values.yaml file (regcred in this example):
master:
  imagePullSecret: regcred
  image:
    repository: localhost:5000/mlbench_master
    tag: latest
    pullPolicy: Always
worker:
  imagePullSecret: regcred
  image:
    repository: localhost:5000/mlbench_worker
    tag: latest
    pullPolicy: Always
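Creating the secret and attaching it to a service account could look roughly like the sketch below. add_pull_secret is a hypothetical helper, the credentials are placeholders, and patching the default service account is an assumption, since this guide does not say which service account the pods run under.

```shell
# Hypothetical helper: create a docker-registry secret and attach it to
# the "default" service account (an assumption; adjust if your pods use
# a different service account).
# Usage: add_pull_secret SECRET_NAME REGISTRY_SERVER USERNAME PASSWORD
add_pull_secret() {
  secret_name="$1"
  server="$2"
  user="$3"
  password="$4"
  kubectl create secret docker-registry "$secret_name" \
    --docker-server="$server" \
    --docker-username="$user" \
    --docker-password="$password" || return 1
  kubectl patch serviceaccount default \
    -p "{\"imagePullSecrets\": [{\"name\": \"$secret_name\"}]}"
}
```

For the example above this would be called as add_pull_secret regcred localhost:5000 followed by your registry user name and password.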
Install mlbench (the example uses rel as the release name; replace it with a name of your choice):
$ helm upgrade --wait --recreate-pods -f myvalues.yaml --timeout 900 --install rel charts/mlbench
[...]
NOTES:
1. Get the application URL by running these commands:
export NODE_PORT=$(kubectl get --namespace default -o jsonpath="{.spec.ports[0].nodePort}" services rel-mlbench-master)
export NODE_IP=$(kubectl get nodes --namespace default -o jsonpath="{.items[0].status.addresses[0].address}")
echo http://$NODE_IP:$NODE_PORT
Run the three commands printed in the NOTES section of the helm output. This prints the URL at which the dashboard is accessible.