Try to install CNTK on a Linux machine, if you get it working you are very lucky. I tried it on three - I was unsuccessful each time. A colleague suggested me to try a CNTK docker image. Since I also needed a GPU, I decided to do it on GCP. Why I chose GCP is because it gives you $300 worth of free credits valid for 12 months.
I want to give a rundown of the steps I followed. I also created a few scripts to do almost everything from the terminal.
Google Cloud Platform
There are two modes of working with GCP - GCP dashboard in your browser and the GCP CLI. I’ll give details about the CLI way as after creating a VM you’ll be working through your Terminal only so why not do everything using your Terminal. Although, it’ll be helpful if you go through the Dashboard.
Before following along, create a project using the GCP Dashboard in your browser. This project will contain your VM instances. Go thorough this Creating and Managing Projects page to create your first project.
In the first instruction, you might face an error if your distro is not standard Debian/Ubuntu machine. For example, I work on Elementary OS, which is based on Ubuntu, but
lsb_release -c -s gave the Elementary OS release name, not the standard Ubuntu release name. So, I found out the Ubuntu release my OS is based on and then manually set the
CLOUD_SDK_REPO variable using that name.
To set up everything on your machine run the following command.
Follow the instructions on the terminal. It’ll ask to link your Google account associated with the GCP one. Followed by the default settings like default project, region etc. Select any region you want. If you don’t have any preference then select
us-west1-b. This will give you all the GPU options available. Some regions don’t have all the options.
This quick-start GCP Documentation will show you what will happen once you run the command. It’ll also point you to the relevant points if you want some other things to take care of like proxies, etc.
To create and start a new VM instance now, you need to verify your payment mode. While verifying, it’ll do a test transaction (which they say is reverted after a few days; I haven’t received it yet though). This verification is to just ensure that you are legit. This GCP documentation page will help you if there’s any other question.
Now we will create a VM using
gcloud from our terminal.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 echo "Current Google Cloud Projects with the linked Google account-" gcloud projects list export PROJECT_NAME="gcp-project" echo "Selected project - "$PROJECT_NAME # export INSTANCE_NAME="my-fastai-instance" export INSTANCE_NAME="cntk-docker-vm" export ZONE="us-west2-b" # budget: "us-west1-b" # budget: 'type=nvidia-tesla-k80' export INSTANCE_TYPE="n1-highmem-8" # budget: "n1-highmem-4" # run "gcloud compute images list" to get a list of images and their families # or "pytorch-1-0-cpu-experimental" for non-GPU instances export IMAGE_FAMILY="pytorch-1-0-cu92-experimental" export HDD="500GB" gcloud compute instances create $INSTANCE_NAME \ --zone=$ZONE \ --image-family=$IMAGE_FAMILY \ --image-project=deeplearning-platform-release \ --maintenance-policy=TERMINATE \ --accelerator="type=nvidia-tesla-p4,count=1" \ --machine-type=$INSTANCE_TYPE \ --boot-disk-size=$HDD \ --metadata="install-nvidia-driver=True" \ # --preemptible
Line 2 will list all the active projects you have made. Set that project name as done in line 4. If you have a specific zone then set it in line 10, else leave it as is. In line 13 you’ll set the type of GPU (or CPU if budget is low) you want. There are other higher ones as well. In line 19, set the amount of HDD you want. And that’s it. All the variables are set. All these parameters will be used to create a VM for you. You can put the above code segment into a separate script and just automate this step.
There’s also this setting (line 30) for your VM which is important to know about. The option
preemptible means that your machine will be stopped after 24 hours of continuous running. It can also be preempted (stopped) with a 30 seconds notice at any time due to high load. This option is for beginners as preemptible instances are cheaper and prevents the extra charges if you forget to stop the instance after using it. The Preemptible VM Instances page will give you further details.
After the successful VM creation you can check its status displayed here - https://console.cloud.google.com/compute/. A green tick will be there to indicate the successful creation and running of the instance. You might get an error about reaching your limit of GPUs or quota. To solve this, you’ll need to request the GPU quota from the quotas menu under IAM & Admin. This SE question will help you out - https://serverfault.com/q/887256. You should have the billing verification done before making a request for the quota increase otherwise it’ll be rejected.
Once you have green tick in the dashboard against your VM you’ll be able to SSH into it.
1 2 3 4 5 6 7 8 9 export PROJECT_NAME="gcp-project" echo "Selected project - "$PROJECT_NAME gcloud config set project $PROJECT_NAME export INSTANCE_NAME="cntk-docker-vm" export ZONE="us-west2-b" # budget: "us-west1-b" echo "Logging in..." gcloud compute ssh --zone=$ZONE jupyter@$INSTANCE_NAME
For the first time, it’ll generate your SSH key followed by a prompt for a login password. In the subsequent runs, it’ll just ask for the password to log you in.
To start/stop the VM from CLI just follow the given commands.
1 2 3 4 5 6 7 8 9 10 11 12 export PROJECT_NAME="microsoft-ai-2018" echo "Selected project - "$PROJECT_NAME gcloud config set project microsoft-ai-2018 export INSTANCE_NAME="cntk-docker-trial" export ZONE="us-west2-b" # budget: "us-west1-b" echo "Starting..." gcloud compute instances start --zone=$ZONE $INSTANCE_NAME echo "Stopping..." gcloud compute instances stop --zone=$ZONE $INSTANCE_NAME
Docker is like a virtual machine software. It runs containers. Containers are application images having all the libraries, tools, configurations needed to run the application. This saves the developer’s time by automatically setting up the development environment and the dependencies.
At first, I tried the Microsoft made CNTK docker image. I tried installing the NVIDIA Docker image having Python 3.5. Sadly, this gave me the similar error I was getting when I tried installing it on my own laptop. What’s the point of providing a Docker image if you’ll get errors even while/after installing it and you have to deal with the same errors which you’d have faced while installing it yourself.
After wasting a few hours because, I thought, I was doing a mistake while working with docker as it was my first time working with it. I deleted my VM instance and created a new one and followed the same steps, but again the same issue. Then my colleague gave me this DL Docker image link - Deepo. I installed the GPU version and it worked on the first try. I thank the Deepo guy(s) for providing the Docker images.
Instruction to pull the docker image-
Instruction to launch the GPU docker image after installing it is in the following code segment. This code block can also be put into a script to launch the container in one go.
The above instructions and further more (Jupyter, CPU version of the image, customization) is listed in the readme of the Deepo repo.